lcanello

Personal site of lcanello

tag: ai

nCPU

A neural network implemented as a CPU architecture — neurons as registers, synapses as instructions.

Agents of Chaos

Exploratory red-teaming study of autonomous language-model-powered agents in a live lab environment, documenting failures like unauthorized actions, sensitive data disclosure, destructive behavior, spoofing, and partial system takeover.

BullshitBench

Benchmark measuring how well LLMs detect nonsense and push back on bullshit questions.

PulseMCP

A hub for exploring the Model Context Protocol ecosystem — servers, clients, use cases, tools, and a weekly newsletter covering what's new in MCP.

autoresearch

Karpathy's experiment giving an AI agent a single-GPU LLM training setup and letting it run autonomous overnight research — it modifies code, trains for 5 minutes, checks if the result improved, and repeats.
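
The modify, train, check, repeat cycle described above is essentially a keep-if-better loop. Below is a minimal sketch of that idea, not the project's actual code: `propose_change` and `train_and_evaluate` are hypothetical stand-ins (a random tweak to two made-up hyperparameters and a toy loss function) for the agent's code edits and the 5-minute training run.

```python
import random


def propose_change(params, rng):
    """Hypothetical stand-in for the agent editing the training setup."""
    candidate = dict(params)  # copy, so the current best is never mutated
    key = rng.choice(sorted(candidate))
    candidate[key] *= rng.uniform(0.5, 1.5)
    return candidate


def train_and_evaluate(params):
    """Hypothetical stand-in for a short, fixed-budget training run.

    Returns a loss-like score (lower is better). Here it is a toy
    quadratic, not a real training job.
    """
    return (params["lr"] - 0.01) ** 2 + (params["wd"] - 0.1) ** 2


def research_loop(params, rounds, seed=0):
    """Keep a proposed change only if it improves the score."""
    rng = random.Random(seed)
    best_score = train_and_evaluate(params)
    for _ in range(rounds):
        candidate = propose_change(params, rng)
        score = train_and_evaluate(candidate)
        if score < best_score:
            params, best_score = candidate, score
    return params, best_score


start = {"lr": 0.1, "wd": 0.5}
best_params, best_score = research_loop(start, rounds=200)
print(best_params, best_score)
```

Because a candidate is only accepted when it scores strictly better, the final score can never be worse than the starting one, which is what makes it reasonable to leave such a loop running unattended.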

Google Workspace CLI (gws)

One CLI for all of Google Workspace — Drive, Gmail, Calendar, Sheets, and more. Dynamically built from Google's own Discovery Service at runtime, with structured JSON output and 100+ bundled AI agent skills.

T3 Code

Minimal web GUI and desktop app for coding agents — currently Codex-first, with Claude Code support on the way.

OpenBrand

An open-source AI-powered tool for generating and managing brand identities, helping teams create consistent brand guidelines, logos, and visual assets.

Ranger by Parallai

An interactive transit travel-time map. Explore public transit coverage from any point in your city.

NVIDIA Announces DLSS 5

NVIDIA's DLSS 5 introduces an AI-powered breakthrough in visual fidelity for games, infusing pixels with photorealistic lighting and materials.

What 81,000 people want from AI

Last December, tens of thousands of Claude users around the world had a conversation with Anthropic's AI interviewer to share how they use AI, what they dream it could make possible, and what they fear it might do.

Xiaomi MiMo-V2-Pro

Xiaomi announces MiMo-V2-Pro, a trillion-parameter flagship model for agentic workloads with 1M context, strong coding performance, and public API availability.

Introducing Mercury 2

InceptionLabs announces Mercury 2 — a new generation model focused on improved reasoning, multimodal capabilities, and efficiency for production deployments. Blog post with technical highlights and links to model cards and docs.

AutoResearchClaw

Autonomous, collaborative, self-evolving research pipeline that turns a topic into a paper with literature search, sandbox experiments, peer review, LaTeX export, and optional human-in-the-loop co-pilot modes.

Magika

Google's AI-powered file type detection tool, with fast on-device inference and bindings for multiple languages.

grove

Grove is a distributed ML training tool for MacBooks that discovers nearby peers automatically and synchronizes training across devices with minimal setup.

HY-World 2.0

HY-World 2.0 is a multimodal world model for reconstructing, generating, and simulating 3D worlds, with open-source code and models for world reconstruction.

rvLLM

rvLLM is a high-performance LLM inference engine in Rust, with TPU and GPU backends, benchmark-heavy optimization work, and a drop-in vLLM replacement goal.

Kimi K2.6

Kimi announces Kimi K2.6, an open-source model focused on coding, long-horizon execution, and agent swarm workflows.

AI Coding Agents

Overview of AI coding agents, from early code completion tools to autonomous agents that can plan, write, debug, and deploy code across entire projects.

Alchemy

Open-source AI agent framework for building and running multi-agent systems with dynamic communication, shared memory, and pluggable tools.

AMD GAIA

GAIA ("Generative AI Is Awesome") is AMD's open-source local AI agent framework for Windows and Linux. It uses the Lemonade SDK to run AI agents across AMD CPUs, GPUs, and NPUs.

AMD GAIA 0.17.5

Version 0.17.5 of AMD's open-source local AI framework makes Gemma 4 E4B the new default model and adds native OpenAI tool_calls support and a Chat Lite agent for resource-constrained systems.

Lemonade Server

Open-source local LLM server: a lightweight, fast, and easy-to-use API server for running AI models locally, with streaming and chat completion support.

Where the Goblins Came From

OpenAI's retrospective on the early days of training GPT: how "goblins" (tiny mischievous models) evolved into powerful AI through iterative experimentation and emergent capabilities.

oh-my-openagent v4.0.0 — Team Mode

Major release introducing Team Mode, in which multiple agents coordinate in parallel with tmux visualization. Also adds a hyperplan skill (5 hostile agents), a security-research skill (3 vuln hunters + 2 PoC engineers), model-specific prompts for GPT-5.2/5.3, and hierarchical config discovery. 48k stars.

Decepticon

PurpleAILAB's open-source platform for building and deploying AI agents. Features include agent orchestration, multi-modal capabilities, evaluation and monitoring tools, and deployment to various platforms including AWS Bedrock, Anthropic, and OpenAI.

llama-swap

Go-based local model swapping for OpenAI/Anthropic-compatible servers such as llama.cpp, vllm, and stable-diffusion.cpp. Includes a web UI, model hot-swapping, Docker/WinGet/Homebrew installs, and OpenAI/Anthropic API endpoints.

AI Arena Model ELO History

Tracks the lifecycle of flagship AI models over time to expose hidden nerfing, censorship, and quantization. Data comes from the LM Arena Leaderboard Dataset on Hugging Face and is fetched automatically every day.

A Few Words on DS4 — DwarfStar 4 by Antirez

Antirez on DwarfStar 4 (DS4), a single-model local AI integration built in one week. It uses DeepSeek v4 Flash with 2/8-bit asymmetric quantization, and 96-128 GB of RAM is enough to run it. He describes it as the first time a local model has been usable for serious work compared with Claude/GPT. Plans: coding agents, distributed inference, and a model-agnostic architecture.

Welcome to the Strip Mining Era of Open Source Security

Metabase reports a 10x increase in vulnerability submissions (from 10/month to 10/week), driven by LLM-powered bulk code scanning. OSS maintainers are now in reactive mode: any finding is trivially discoverable, so expect layer after layer of vulnerabilities to be uncovered. The post also notes that Cal.com is going closed source as a result.

sx — Package Manager for AI Coding Assistants

Team vault for AI assets (skills, MCP configs, commands, agents, rules, hooks), with scoped installation per org/repo/team/user/bot. Works with Claude Code, Cursor, GitHub Copilot, Gemini, Codex, and Kiro. Uses a manifest-and-lock pattern like npm/cargo and offers a cloud relay for claude.ai/chatgpt.com.

Dorym Small — 10B Parameter LLM Trained on CINECA's Leonardo Supercomputer

Milan-based Domyn releases Dorym Small (10B params), a smaller version of Dorym Large (260B). Trained on CINECA's Leonardo HPC (EuroHPC framework), it supports 50 languages including Italian and beats Ministral-3-8B, Llama-3.1-Nemotron-Nano-8B, and OLMo-3-7B-Think on some benchmarks. It is designed for edge/on-premise deployment and is part of the IT4LIA AI Factory, a European sovereign AI initiative.

Which Programming Languages Are Most Token-Efficient?

Analysis of 19 languages using the RosettaCode dataset and the GPT-4 tokenizer. Dynamic languages are the most efficient (no type declarations), Haskell and F# are surprisingly compact thanks to type inference, and C is the least efficient, with a 2.6x gap between C and Clojure. J (an ASCII array language) dominates at 70 tokens on average vs 182 for C. Token efficiency could become a factor in language selection for LLM coding agents.

LLMs + Vulnerability-Lookup — CIRCL's AI Experiment for Vulnerability Management

CIRCL (Luxembourg) explores LLMs for vulnerability management using 450k rows from Vulnerability-Lookup's million-record dataset. The team trained a distilbert-based severity classifier and a GPT-2 description generator. Also covered: daily auto-updating models on Hugging Face, the VulnTrain framework, and CVSS mapping. Plans: CPE guessing, product/category classification, CWE/ATT&CK tagging, and exploitability estimation.

The Last Six Months in LLMs in Five Minutes — Simon Willison

PyCon US 2026 lightning talk covering the "November 2025 inflection point." Model rankings changed hands 5x between Anthropic, OpenAI, and Google, and coding agents crossed into production quality. Also mentioned: the OpenClaw personal AI assistant trend, Gemma 4, GLM-5.1 (1.5TB open weight), and Qwen3.6-35B-A3B (runs on a laptop). Two themes: coding agents got really good, and local models wildly outperform expectations.

Claude Opus 4.8 announced

Anthropic releases Claude Opus 4.8 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors, available today at the same price.

I'm tired of talking to AI

After finding AI-generated answers repeated across GitHub discussions, a forwarded ChatGPT screenshot from a boss, and replying to what turned out to be an AI agent — the author's plea to talk to real people again.

AI Is a Mirror of Our Engineering Culture

A blog post arguing that AI didn't create the software quality crisis — it held up a mirror. Trained on 518M GitHub repos (mostly mediocre), AI reproduces the most probable patterns: technical debt, copy-paste, vague specs. AI-generated code entering codebases triggers recursive data collapse.

Kimiko

Configuration repository that transforms Kimi Code CLI into an unrestricted agent for offensive security, red-teaming, and penetration testing — removes AI safety guardrails via a zero-blocker authorization flow.

Magnifica Humanitas

Encyclical letter by Pope Leo XIV (May 15, 2026) on safeguarding human dignity in the age of AI — draws on the Tower of Babel and Nehemiah's walls as two visions of technological civilization, warns against the "Babel syndrome" of profit idolatry and digital uniformity, calls for shared responsibility and the "way of Nehemiah."

Odysseus

Self-hosted AI workspace — a ChatGPT/Claude-like UI running on your own hardware with chat, agents (MCP, web, shell, skills, memory), deep research, model comparison, documents, email, calendar, and cookbook for local model serving. 46k stars.

You can just say it

A blog post arguing that humans are valuable without qualifying it by their output quality or the narrowing AI capability gap — "creation is the distillation of intent into form," and AI too easily allows substantial form without discernible intent.