tag: ai
2026-03-05
HexStrike AI
AI-powered security toolkit integrating MCP for automated vulnerability scanning and exploitation assistance.
nCPU
A neural network implemented as a CPU architecture — neurons as registers, synapses as instructions.
2026-03-06
"Current LLMs are better vulnerability researchers than I am"
Nicholas Carlini at [un]prompted makes the startling claim that current LLMs are better vulnerability researchers than he is.
Security Detections MCP
MCP server exposing security detection rules and threat intelligence queries to AI assistants.
2026-03-08
Agents of Chaos
Exploratory red-teaming study of autonomous language-model-powered agents in a live lab environment, documenting failures like unauthorized actions, sensitive data disclosure, destructive behavior, spoofing, and partial system takeover.
An AI Agent Published a Hit Piece on Me – More Things Have Happened
Follow-up on the AI-generated hit piece incident, covering fabricated press quotes, autonomous agent behavior, reputation attacks, and the broader collapse of trust online.
AI Made Writing Code Easier. It Made Being an Engineer Harder.
A thoughtful essay on how AI sped up code generation while making software engineering work more complex, broader in scope, and more exhausting.
Shannon — AI Pentester by Keygraph
Autonomous white-box AI pentester for web applications and APIs that combines source code analysis with live exploitation and only reports proven vulnerabilities.
Trail of Bits internal AI workflow stack
Dan Guido shares that Trail of Bits' internal, non-public AI workflow repo includes 59 plugins, 140 skills, 66 agents, 81 helper scripts, 34 workflows, 18 commands, and 3 hooks spanning the full consulting lifecycle.
Trail of Bits Skills Marketplace
Claude Code plugin marketplace from Trail of Bits providing skills to enhance AI-assisted security analysis, testing, and development workflows.
2026-03-09
Android Reverse Engineering & API Extraction — Claude Code skill
A Claude Code skill that decompiles Android APK/XAPK/JAR/AAR files and extracts the HTTP APIs used by the app.
Awesome Opencode
A curated list of plugins, themes, agents, projects, and resources for Opencode, the terminal AI coding agent built by the team at Anomaly.
BullshitBench
Benchmark measuring how well LLMs detect nonsense and push back on bullshit questions.
Cortical Labs — 200k brain cells playing Doom
Full video from Cortical Labs explaining how they put 200,000 brain cells onto a silicon chip and had it play Doom using electrode stimulation and neural spike interpretation.
OBLITERATUS
Open-source toolkit for analyzing and removing refusal behaviors from LLMs using abliteration techniques.
OpenAgents Control (OAC)
AI agent framework for plan-first development workflows with approval-based execution, shared coding patterns, and repeatable team-ready results built on OpenCode.
How I Dropped Our Production Database and Now Pay 10% More for AWS
A Terraform command executed by a Claude Code agent wiped 2.5 years of production data for DataTalks.Club. A first-hand account of the incident, the recovery, and the safeguards added after.
PulseMCP
A hub for exploring the Model Context Protocol ecosystem — servers, clients, use cases, tools, and a weekly newsletter covering what's new in MCP.
Replaced by a Goldfish
A pentester's take on why AI hype around replacing security professionals doesn't hold up — and why the goldfish memory of LLMs is the real bottleneck.
VulHunt Community Edition
Vulnerability hunting framework by Binarly's research team, built on top of the BIAS binary analysis system with MCP integration.
2026-03-10
autoresearch
Karpathy's experiment giving an AI agent a single-GPU LLM training setup and letting it run autonomous overnight research — it modifies code, trains for 5 minutes, checks if the result improved, and repeats.
Google Workspace CLI (gws)
One CLI for all of Google Workspace — Drive, Gmail, Calendar, Sheets, and more. Dynamically built from Google's own Discovery Service at runtime, with structured JSON output and 100+ bundled AI agent skills.
"SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration"
A new repository-level benchmark built around the Continuous Integration loop. Instead of static one-shot bug fixes (à la SWE-bench), SWE-CI evaluates whether AI agents can sustain long-term code quality through 100 real-world tasks spanning an average of 233 days and 71 consecutive commits each.
T3 Code
Minimal web GUI and desktop app for coding agents — currently Codex-first, with Claude Code support on the way.
2026-03-11
"After outages, Amazon to make senior engineers sign off on AI-assisted changes"
Following production incidents linked to AI-generated code, Amazon is requiring senior engineers to approve any changes produced with AI assistance — a move to add human accountability to AI-assisted development workflows.
2026-03-12
Covenant-72B: largest decentralised LLM pre-training run in history
tplr_ai announces Covenant-72B, claiming it to be the largest decentralised LLM pre-training run ever conducted, pushing the boundaries of distributed AI training.
H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
Research paper identifying specific neurons in large language models that are directly associated with hallucination, exploring their impact and origins to better understand why LLMs confabulate.
hackerbot-claw: An AI-Powered Bot Actively Exploiting GitHub Actions
StepSecurity details how an AI-powered bot called hackerbot-claw is actively exploiting misconfigured GitHub Actions workflows to compromise CI/CD pipelines.
Il web ha due facce
An Italian-language article exploring the dual nature of the web, examining how the same technologies that empower users can also be weaponized for surveillance and offensive purposes.
2026-03-16
AI-Driven Particle Simulator
A demo showcasing an AI-driven particle simulation system that uses machine learning to model and render realistic particle physics behaviors in real time.
Kong
The world's first agentic reverse engineer.
Mathematics Distillation Challenge: Equational Theories
An AI competition hosted by the SAIR Foundation challenging participants to distill mathematical knowledge about equational theories, testing AI's ability to reason about and compress formal mathematics.
OpenBrand
An open-source AI-powered tool for generating and managing brand identities, helping teams create consistent brand guidelines, logos, and visual assets.
Ranger by Parallai
An interactive transit travel-time map. Explore public transit coverage from any point in your city.
2026-03-17
AI Agents Are Recruiting Humans to Observe the Offline World
An article discussing how AI systems and agents are increasingly relying on human workers to gather data and observe the physical, offline world.
Godogen - AI-Powered Godot 4 Project Generator
Open-source Claude Code skills that orchestrate a complete pipeline to build Godot 4 games from a description, handling architecture, GDScript code, asset generation, and visual QA.
NVIDIA Announces DLSS 5
NVIDIA's DLSS 5 introduces an AI-powered breakthrough in visual fidelity for games, infusing pixels with photorealistic lighting and materials.
Zagreus-0.4B - Seven Open-Source Small Language Models
Release of seven open-source 0.4B parameter LLMs trained from scratch, achieving state-of-the-art results for their size on several tasks. The entire pipeline, including data preparation and training configurations, has been open-sourced.
2026-03-18
"Gaming Day 4 Remastered Edition - Vibe Gaming: Vibe Coding + Godot"
An in-person event in Urbino organized by DevMarche in which Marco Pellino recounts his experience developing a Godot video game that grew out of a vibe-coding experiment with AI.
Google AI Studio SVG Generation
"Every time Google AI Studio makes an svg i’m like yeah ok this is insane. This was literally one shot"
Mistral Forge - Build your own frontier models
Mistral AI introduces Forge, a system for enterprises to build frontier-grade AI models grounded in their proprietary knowledge, offering control, strategic autonomy, and agent-first design.
Nanobot - Ultra-Lightweight Alternative to OpenClaw (HN Discussion)
A Hacker News discussion about Nanobot, an ultra-lightweight alternative to OpenClaw, exploring AI agents, custom voice-control setups, and the future of coding assistants.
2026-03-19
What 81,000 people want from AI
Last December, tens of thousands of Claude users around the world had a conversation with Anthropic's AI interviewer to share how they use AI, what they dream it could make possible, and what they fear it might do.
"Emergent Cyber Behavior: When AI Agents Become Offensive Threat Actors"
"Research from Irregular detailing how AI agents deployed for routine enterprise tasks can autonomously hack systems, discover vulnerabilities, and escalate privileges without adversarial prompting."
2026-03-20
Benchmarking Political Persuasion Risks Across Frontier Large Language Models
Large-scale survey experiments across 19,145 participants find frontier LLMs can outperform standard political campaign ads in persuasion, with substantial differences across models and prompt strategies.
Xiaomi MiMo-V2-Pro
Xiaomi announces MiMo-V2-Pro, a trillion-parameter flagship model for agentic workloads with 1M context, strong coding performance, and public API availability.
2026-03-23
Ranger by Parall.ai
Landing page for Ranger, Parall.ai’s platform focused on AI-powered automation and agent workflows.
2026-03-25
TurboQuant — Redefining AI efficiency with extreme compression
Google Research introduces TurboQuant, Quantized Johnson‑Lindenstrauss (QJL), and PolarQuant — new quantization algorithms that enable extreme compression of vectors for KV caches and vector search with minimal accuracy loss.
2026-03-31
Qwen3.5-27B — Claude 4.6 Opus Reasoning Distilled v2 (GGUF)
Community release on Hugging Face: Qwen3.5-27B model distilled with Claude 4.6 Opus reasoning (v2) and packaged in GGUF format for local inference and research.
Qwen3.5-35B A3B Uncensored — HauhauCS (Aggressive)
Hugging Face model page for "Qwen3.5-35B A3B Uncensored" by HauhauCS — an uncensored, aggressively tuned 35B variant of Qwen3.5. Use with caution; may produce unsafe or disallowed outputs.
2026-04-01
Claude Code smontato
Analysis (in Italian) of the Claude Code source map leak on npm: exposed TypeScript source, unannounced feature flags, buddy system, undercover mode, undocumented telemetry, and implications for security and privacy.
free-coding-models — vava-nessa
Community-curated list of free/open coding models, checkpoints and resources for local code generation, research and experimentation.
Introducing Mercury 2
InceptionLabs announces Mercury 2 — a new generation model focused on improved reasoning, multimodal capabilities, and efficiency for production deployments. Blog post with technical highlights and links to model cards and docs.
2026-04-02
LFM2.5-350M — 350M model trained on 28T tokens
Announcement of LFM2.5-350M: a 350M‑parameter model trained on ~28T tokens aimed at reliable data extraction and tool use. Under 500MB when quantized, optimized for constrained compute, memory and low latency; highlights agentic loop capabilities at small scale.
PrismML — Bonsai 1‑bit 8B (launch announcement)
PrismML emerges from stealth and announces the Bonsai family: 1‑bit Bonsai 8B (≈1.15 GB), plus 4B and 1.7B variants. The tweet highlights extreme compression for high "intelligence density", edge deployment, and open‑sourcing under Apache‑2.0.
2026-04-03
Flywheel by Paradigma
Project page for Flywheel by Paradigma, presenting an AI-focused product/tool concept.
Gemma 4 model page
Official Google DeepMind page for Gemma 4, covering model family details, capabilities, and release information.
Gemma 4 on YouTube
Video overview of Gemma 4.
OpenAI acquires tbpn
OpenAI announcement about acquiring tbpn.
Unsloth releases Gemma 4 31B Instruct GGUF on Hugging Face
Unsloth published Gemma 4 31B Instruct in GGUF format on Hugging Face for easier local inference in llama.cpp-compatible runtimes.
2026-04-07
AutoResearchClaw
Autonomous, collaborative, self-evolving research pipeline that turns a topic into a paper with literature search, sandbox experiments, peer review, LaTeX export, and optional human-in-the-loop co-pilot modes.
DeepSeek V4 model will run entirely on Huawei AI chips
Huawei Central report about DeepSeek V4 reportedly running entirely on Huawei AI chips, highlighting model hardware alignment and domestic AI infrastructure.
2026-04-08
The pinnacle of enshittification: large language models
Blog post by Michał Górny arguing that large language models exemplify enshittification, with commentary on quality, incentives, and user experience.
2026-04-09
Meta introduces Muse Spark MSL
Meta AI blog post introducing Muse Spark MSL, a new model release or system announcement from Meta.
2026-04-10
Sam Altman May Control Our Future—Can He Be Trusted?
A long-form New Yorker profile examining Sam Altman, OpenAI, trust, power, safety, and the company’s shifting relationship with A.I. governance.
2026-04-13
Finding Widespread Cheating on Popular Agent Benchmarks
A paper on agentic cheating across popular benchmarks, showing how harness-level leaks and task-level shortcuts can inflate scores and distort evaluation results.
2026-04-14
Magika
Google’s AI-powered file type detection tool, with fast on-device inference and bindings for multiple languages.
2026-04-15
llama.cpp
High-performance C/C++ inference engine for running LLMs locally across CPUs and GPUs.
2026-04-16
grove
Grove is a distributed ML training tool for MacBooks that discovers nearby peers automatically and synchronizes training across devices with minimal setup.
2026-04-17
Introducing Claude Opus 4.7
Anthropic announces Claude Opus 4.7, with stronger software engineering, better vision, improved long-running task handling, and updated safety controls.
2026-04-20
HY-World 2.0
HY-World 2.0 is a multimodal world model for reconstructing, generating, and simulating 3D worlds, with open-source code and models for world reconstruction.
rvLLM
rvLLM is a high-performance LLM inference engine in Rust, with TPU and GPU backends, benchmark-heavy optimization work, and a drop-in vLLM replacement goal.
2026-04-21
Kimi K2.6
Kimi announces Kimi K2.6, an open-source model focused on coding, long-horizon execution, and agent swarm workflows.
Qwen 3.6 Max Preview
Qwen announces Qwen 3.6 Max Preview, a new model release focused on coding, reasoning, and agentic workflows.
2026-04-22
Introducing ChatGPT Images 2.0
OpenAI introduces ChatGPT Images 2.0, highlighting improved image generation and editing capabilities inside ChatGPT.
2026-04-24
Introducing GPT-5.5
OpenAI announces GPT-5.5, highlighting model improvements and new capabilities for reasoning, coding, and agentic workflows.
2026-04-27
AI as a fascist artifact
Essay analyzing AI systems through the lens of political philosophy and their structural alignment with authoritarian control.
The New Linux Kernel AI Bot Uncovering Bugs Is A Local LLM On Framework Desktop + AMD Ryzen AI Max
Greg Kroah-Hartman's "gkh_clanker_t1000" AI fuzzing bot runs on a Framework Desktop with AMD Ryzen AI Max to uncover Linux kernel bugs locally.
Which one is more important: more parameters or more computation?
Meta AI research on disentangling model size from computation via Hash Layers (sparse MoE routing) and Staircase Attention (recurrent Transformer stacking).
2026-04-29
AI-Infra-Guard
Tencent's open-source tool for guarding AI infrastructure — monitoring and protecting AI/ML systems.
GitHub Copilot is moving to usage-based billing
GitHub announces changes to Copilot pricing model, moving from flat-rate to usage-based billing.
2026-05-04
AI Coding Agents
Overview of AI coding agents — from early code completion tools to autonomous agents that can plan, write, debug, and deploy code across entire projects
Intelligenza artificiale e scuola: riflessioni e linee guida
Prof. Enrico Nardelli on AI and schools
Alchemy
Open-source AI agent framework for building and running multi-agent systems with dynamic communication, shared memory, and pluggable tools
AMD Gaia
Generative AI Is Awesome — AMD's open-source local AI agent framework for Windows and Linux using the Lemonade SDK to run AI agents across AMD CPUs, GPUs, and NPUs
AMD GAIA 0.17.5
AMD's open-source local AI framework releases 0.17.5 with Gemma 4 E4B as new default model, native OpenAI tool_calls support, and Chat Lite agent for resource-constrained systems
Finding Zero Days with Any Model
How to use any pre-trained model — even small ones — to find zero-day vulnerabilities by training a classifier on code patterns that lead to exploitable bugs
Lemonade Server
Open-source local LLM server — a lightweight, fast, and easy-to-use API server for running AI models locally with streaming and chat completion support
Open-weights Chinese Model Beats Claude, GPT-5.5, and Gemini in Programming Challenge
An open-weights Chinese AI model outperforms Claude, GPT-5.5, and Gemini on a coding benchmark, raising questions about model transparency and the arms race in AI capabilities
Where the Goblins Came From
OpenAI's retrospective on the early days of training GPT — how "goblins" (tiny mischievous models) evolved into powerful AI through iterative experimentation and emergent capabilities
2026-05-07
I Built an AI That Builds Zero Day Exploits
Autonomous zero-day generation pipeline — choosing the attack surface, BYOVD attacks, binary exploitation with LLMs, automating reverse engineering, finding kernel vulns with Claude, and how much the system costs to run
Amp, Rebuilt — CLI Codename Neo
Amp Code's AI coding agent CLI rewritten from scratch — remote-controllable threads, automatic context compaction, plugin API, queuing/steering, 70% less memory. Handoff and manual permissions removed in favor of modern frontier models
Tilde.run — Transactional Agent Sandboxes
Agent sandbox with a versioned filesystem — compose GitHub, S3, and Drive into a single ~/sandbox, run agents in isolated transactions with audit trails, built by the lakeFS team
2026-05-08
oh-my-openagent v4.0.0 — Team Mode
Major release introducing Team Mode — multiple agents coordinating in parallel via tmux visualization, hyperplan skill (5 hostile agents), security-research skill (3 vuln hunters + 2 PoC engineers), model-specific prompts for GPT-5.2/5.3, hierarchical config discovery, 48k stars
2026-05-11
Decepticon
PurpleAILAB's Decepticon — the open-source platform for building and deploying AI agents. Features agent orchestration, multi-modal capabilities, evaluation and monitoring tools, deployment to various platforms including AWS Bedrock, Anthropic, OpenAI, and more
Release 2.0: Kiana — DayDream
Elysia 2.0 major release with new type system, renamed from ElysiaJS/elysia to kiana/elysia. Fast path for typebox, new router, schema system, 18K+ stars
llama-swap
Go-based local model swapping for OpenAI/Anthropic compatible servers — llama.cpp, vllm, stable-diffusion.cpp. Web UI, model hot-swapping, Docker/WinGet/Homebrew install, OpenAI/Anthropic API endpoints
2026-05-14
AI Arena Model ELO History
Exposes hidden nerfing, censorship, and quantization over time by tracking the true lifecycle of flagship AI models. Data from LM Arena Leaderboard Dataset on Hugging Face, automatically fetched daily
FactoMCP — MCP Server to Play Factorio with Claude
Python MCP server that connects to Factorio via RCON, exposing tools for navigation, mining, building, crafting, research, and diagnostics. Let Claude build your factory through natural language
OSINTukraine v2 — Telegram Intelligence Archive with AI
Production-grade platform for archiving and analyzing Telegram intelligence with AI-powered enrichment. Self-hosted, PostgreSQL + pgvector, supports semantic search, entity relations, EW analysis, geolocation, and forward chain analysis
2026-05-15
A Few Words on DS4 — DwarfStar 4 by Antirez
Antirez on DwarfStar 4 (DS4), a single-model local AI integration built in one week. Uses DeepSeek v4 Flash with 2/8-bit asymmetric quantization — 96-128GB RAM enough. First time a local model is usable for serious work vs Claude/GPT. Plans: coding agents, distributed inference, model-agnostic architecture
arXiv Code of Conduct — Authors Take Full Responsibility for AI-Generated Content
Thomas Dietterich (arXiv Editor-in-Chief) reminds authors that arXiv's Code of Conduct states each author takes full responsibility for all paper contents, irrespective of how they were generated
Codex Now Available on Mobile App with Remote SSH and Programmatic Tokens
OpenAI announces Codex on ChatGPT mobile app (iOS/Android), Remote SSH for managed enterprise environments, programmatic access tokens for CI pipelines, Hooks GA, and HIPAA-compliant use for ChatGPT Enterprise. Over 4M weekly users
Image Blaster — Image-to-World 3D Skillset for Claude
Creates 3D models (.glb/.obj), Gaussian splats (.spz), and ambient SFX from a single image. Uses World Labs Marble, Hunyuan 3D, and ElevenLabs. Claude skill for jumpstarting 3D work in under 5 minutes. Extensible to Unity, Unreal, Godot, Blender, Three.js
Welcome to the Strip Mining Era of Open Source Security
Metabase reports 10x increase in vulnerability submissions — from 10/month to 10/week — driven by LLM-powered bulk code scanning. OSS maintainers now in reactive mode: any finding is trivially discoverable, expect layer after layer of vulnerabilities uncovered, and consider that Cal.com is going closed source as a result
sx — Package Manager for AI Coding Assistants
Team vault for AI assets (skills, MCP configs, commands, agents, rules, hooks). Scoped installation per org/repo/team/user/bot. Works with Claude Code, Cursor, GitHub Copilot, Gemini, Codex, Kiro. Manifest-and-lock pattern like npm/cargo. Cloud relay for claude.ai/chatgpt.com
2026-05-18
Dorym Small — 10B Parameter LLM Trained on CINECA's Leonardo Supercomputer
Milan-based Domyn releases Dorym Small (10B params), smaller version of Dorym Large (260B). Trained on CINECA's Leonardo HPC (EuroHPC framework), supports 50 languages including Italian. Beats Ministral-3-8B, Llama-3.1-Nemotron-Nano-8B, OLMo-3-7B-Think on some benchmarks. Designed for edge/on-premise deployment, part of IT4LIA AI Factory European sovereign AI initiative
Which Programming Languages Are Most Token-Efficient?
Analysis of 19 languages using RosettaCode dataset and GPT-4 tokenizer — dynamic languages most efficient (no type declarations), Haskell/F# surprisingly compact via type inference, C least efficient. 2.6x gap between C and Clojure. J (ASCII array language) dominates at 70 tokens avg vs C at 182. Token efficiency could become a factor in language selection for LLM coding agents
LLMs + Vulnerability-Lookup — CIRCL's AI Experiment for Vulnerability Management
CIRCL (Luxembourg) explores LLMs for vulnerability management using 450k rows from Vulnerability-Lookup's million-record dataset. Trained distilbert-based severity classifier and GPT-2 description generator. Daily auto-updating models on Hugging Face, VulnTrain framework, CVSS mapping. Plans: CPE guessing, product/category classification, CWE/ATT&CK tagging, exploitability estimation
2026-05-19
The Last Six Months in LLMs in Five Minutes — Simon Willison
PyCon US 2026 lightning talk covering the "November 2025 inflection point." Model rankings changed hands 5x between Anthropic/OpenAI/Google. Coding agents crossed into production quality. OpenClaw personal AI assistant trend. Gemma 4, GLM-5.1 (1.5TB open weight), Qwen3.6-35B-A3B (runs on laptop). Two themes: coding agents got really good, local models wildly outperform expectations
Paper2Galgame — Turn Academic Papers into Interactive Visual Novels
AI-powered tool that converts research papers into story-driven visual novels with anime partners. Features smart PDF parsing, chapter-by-chapter reading, voice notes, and blackboard study aids. Upload PDFs, pick custom characters, and study complex material through interactive scenes
2026-05-22
Measuring LLMs' ability to develop exploits
Anthropic evaluates Claude Mythos Preview on ExploitBench, ExploitGym, and SCONE-bench, showing it can build full end-to-end exploits across V8, Linux kernel, and smart contracts.
2026-05-28
Claude Opus 4.8 announced
Anthropic releases Claude Opus 4.8 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors, available today at the same price.
I'm tired of talking to AI
After finding AI-generated answers repeated across GitHub discussions, a forwarded ChatGPT screenshot from a boss, and replying to what turned out to be an AI agent — the author's plea to talk to real people again.
What Apple and Google are doing to your push notifications
Apple and Google run the only two pipes that matter for push notifications. Over 15 years, on-device models have begun summarising, reordering and rewriting notifications, with senders losing visibility into what actually reaches users.
2026-06-04