tag: research-paper

2026-03-08

Agents of Chaos

Exploratory red-teaming study of autonomous language-model-powered agents in a live lab environment, documenting failures like unauthorized actions, sensitive data disclosure, destructive behavior, spoofing, and partial system takeover.

ai security offensive-security research-paper

2026-03-10

autoresearch

Karpathy's experiment that gives an AI agent a single-GPU LLM training setup and lets it run autonomous research overnight: it modifies the code, trains for 5 minutes, checks whether the result improved, and repeats.

ai open-source research-paper
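
The loop described above amounts to greedy hill climbing: keep a change only if the evaluation metric improves. A minimal Python sketch, assuming a lower-is-better metric; `propose` stands in for the agent editing the training code and `evaluate` for the fixed 5-minute training run. The toy stand-ins (`toy_propose`, `toy_evaluate` and their `lr`/`warmup` config keys) are hypothetical, chosen only so the sketch runs without a GPU, and are not taken from the autoresearch repository.

```python
import math
import random
from typing import Callable


def autoresearch_loop(
    config: dict[str, float],
    propose: Callable[[dict[str, float]], dict[str, float]],
    evaluate: Callable[[dict[str, float]], float],
    iterations: int,
) -> tuple[dict[str, float], float]:
    """Greedy keep-if-better loop: propose an edit, run a short job, keep it only if it helps."""
    best_cfg = dict(config)
    best_loss = evaluate(best_cfg)          # baseline run
    for _ in range(iterations):
        candidate = propose(best_cfg)       # stands in for the agent editing the code
        loss = evaluate(candidate)          # stands in for a fixed-budget training run
        if loss < best_loss:                # keep strict improvements only
            best_cfg, best_loss = candidate, loss
        # otherwise the candidate is dropped (a real setup would revert the code change)
    return best_cfg, best_loss


# Hypothetical toy stand-ins so the sketch runs anywhere.
rng = random.Random(0)


def toy_propose(cfg: dict[str, float]) -> dict[str, float]:
    new = dict(cfg)
    key = rng.choice(list(new))
    new[key] *= rng.choice([0.5, 0.8, 1.25, 2.0])
    return new


def toy_evaluate(cfg: dict[str, float]) -> float:
    # Pretend validation loss with its optimum at lr=3e-4, warmup=200
    return math.log10(cfg["lr"] / 3e-4) ** 2 + math.log10(cfg["warmup"] / 200) ** 2 + 1.0


start = {"lr": 1e-2, "warmup": 10.0}
best, loss = autoresearch_loop(start, toy_propose, toy_evaluate, iterations=200)
print(best, loss)
```

Because a candidate is kept only when it beats the current best, the reported metric can never get worse than the baseline run.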

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

A new repository-level benchmark built around the Continuous Integration loop. Instead of static, one-shot bug fixes (à la SWE-bench), SWE-CI evaluates whether AI agents can sustain code quality over the long term, using 100 real-world tasks that each span an average of 233 days and 71 consecutive commits.

ai software-engineering research-paper

2026-03-12

H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs

Research paper identifying specific neurons in large language models that are directly associated with hallucination, exploring their impact and origins to better understand why LLMs confabulate.

ai research-paper

2026-03-17

Texel Splatting - Perspective-Stable 3D Pixel Art

A paper with open-source code introducing a perspective-stable 3D pixel art technique that solves screen-grid snapping for perspective cameras.

open-source research-paper

2026-03-19

Emergent Cyber Behavior: When AI Agents Become Offensive Threat Actors

Research from Irregular detailing how AI agents deployed for routine enterprise tasks can autonomously hack systems, discover vulnerabilities, and escalate privileges without adversarial prompting.

ai security offensive-security research-paper

2026-03-20

Benchmarking Political Persuasion Risks Across Frontier Large Language Models

Large-scale survey experiments with 19,145 participants find that frontier LLMs can outperform standard political campaign ads at persuasion, with substantial differences across models and prompt strategies.

ai research-paper

2026-03-25

TurboQuant — Redefining AI efficiency with extreme compression

Google Research introduces TurboQuant, Quantized Johnson‑Lindenstrauss (QJL), and PolarQuant — new quantization algorithms that enable extreme compression of vectors for KV caches and vector search with minimal accuracy loss.

ai research-paper
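
TurboQuant itself is not sketched here. As a hedged illustration of only the QJL component, following the QJL paper's estimator as we understand it: each key is multiplied by a random Gaussian matrix S (m rows) and only the sign bits plus the key's norm are stored, while the query is projected but left unquantized; scaling by √(π/2)/m gives an unbiased estimate of the inner product. The helper names, dimensions and seed below are illustrative assumptions, not Google's code.

```python
import math
import random

Vector = list[float]


def make_projection(m: int, d: int, seed: int = 0) -> list[Vector]:
    # Random JL matrix with i.i.d. standard normal entries
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def qjl_quantize_key(S: list[Vector], k: Vector) -> tuple[list[int], float]:
    """Keep one sign bit per projected coordinate, plus the key's L2 norm."""
    bits = [1 if dot(row, k) >= 0 else -1 for row in S]
    return bits, math.sqrt(dot(k, k))


def qjl_inner_product(S: list[Vector], q: Vector, bits: list[int], k_norm: float) -> float:
    """Unbiased estimate of <q, k>: the query is projected but not quantized."""
    m = len(S)
    acc = sum(dot(row, q) * b for row, b in zip(S, bits))
    return math.sqrt(math.pi / 2) / m * k_norm * acc


S = make_projection(m=4000, d=16, seed=1)
q = [1.0] * 8 + [0.0] * 8
k = [1.0] * 4 + [0.0] * 12
bits, k_norm = qjl_quantize_key(S, k)
print(dot(q, k), qjl_inner_product(S, q, bits, k_norm))  # exact value vs. estimate
```

The estimator works because, for a Gaussian row s, the expectation of ⟨s, q⟩·sign(⟨s, k⟩) is √(2/π)·⟨q, k⟩/‖k‖; the stored norm and the √(π/2) factor undo that scaling.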

2026-03-31

Qwen3.5-27B — Claude 4.6 Opus Reasoning Distilled v2 (GGUF)

Community release on Hugging Face: Qwen3.5-27B model distilled with Claude 4.6 Opus reasoning (v2) and packaged in GGUF format for local inference and research.

ai models research-paper

2026-04-01

Redis — HyperLogLog (antirez)

antirez's classic post introducing the HyperLogLog data structure in Redis: algorithm overview, implementation notes, API (PFADD / PFCOUNT / PFMERGE), and performance/precision tradeoffs.

systems open-source research-paper
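
A minimal Python sketch of the HyperLogLog idea described in the post, assuming 16384 registers (the count Redis uses) and a SHA-1-derived 64-bit hash in place of Redis's own hash function. `pfadd`, `pfcount` and `pfmerge` are hypothetical helpers that only mirror the Redis command names; this is not Redis's implementation, and it uses the classic raw estimator with a linear-counting correction for small sets rather than any refinements Redis applies.

```python
import hashlib
import math

P = 14                              # index bits: 2**14 = 16384 registers, as in Redis
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)    # standard HyperLogLog bias constant for m >= 128


def _hash64(item: str) -> int:
    # 64-bit hash taken from SHA-1 (an assumption; Redis uses its own hash function)
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")


def new_hll() -> list[int]:
    return [0] * M


def pfadd(regs: list[int], *items: str) -> None:
    for item in items:
        h = _hash64(item)
        idx = h >> (64 - P)                      # top P bits select a register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        regs[idx] = max(regs[idx], rank)         # each register keeps the max rank seen


def pfcount(regs: list[int]) -> int:
    z = sum(2.0 ** -r for r in regs)
    estimate = ALPHA * M * M / z                 # harmonic-mean estimator
    zeros = regs.count(0)
    if estimate <= 2.5 * M and zeros:
        estimate = M * math.log(M / zeros)       # linear counting for small cardinalities
    return round(estimate)


def pfmerge(*hlls: list[int]) -> list[int]:
    # Union of sets = register-wise maximum
    return [max(vals) for vals in zip(*hlls)]


visitors = new_hll()
pfadd(visitors, "alice", "bob", "alice", "carol")
print(pfcount(visitors))
```

Merging is just a register-wise maximum, so merging two sketches yields exactly the sketch of the union of their elements. With 16384 registers the standard error is about 1.04/√16384 ≈ 0.81%.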

RF Studio — Arena Physica publication

Publication and project page from Arena Physica describing RF Studio, a toolkit and research effort for radio-frequency experimentation, measurement workflows, and reproducible RF system design.

systems research-paper

Safeguarding cryptocurrency by disclosing quantum vulnerabilities responsibly

Google Research outlines responsible disclosure practices and mitigation strategies for quantum vulnerabilities affecting cryptocurrency systems, with recommendations for coordinated disclosure, defensive upgrades, and community preparedness.

security blockchain research-paper

2026-04-02

LFM2.5-350M — 350M model trained on 28T tokens

Announcement of LFM2.5-350M, a 350M-parameter model trained on ~28T tokens and aimed at reliable data extraction and tool use. It is under 500MB when quantized, is optimized for constrained compute and memory and for low latency, and highlights agentic-loop capabilities at small scale.

ai models research-paper

2026-04-03

NIST SRM 4351 Certificate (PDF)

Official NIST certificate PDF for Standard Reference Material (SRM) 4351.

public-admin research-paper

2026-04-23

Driving into the Unknown: Investigating and Addressing Security Breaches in Vehicle Infotainment Systems

Research paper analyzing security vulnerabilities and breach patterns in modern vehicle infotainment systems.

research-paper security systems

2026-04-27

Which one is more important: more parameters or more computation?

Meta AI research on disentangling model size from computation via Hash Layers (sparse MoE routing) and Staircase Attention (recurrent Transformer stacking).

ai research-paper blog
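
Hash Layers replace a learned mixture-of-experts router with a fixed hash of the token id, so adding experts adds parameters without adding per-token computation. A minimal Python sketch of random-hash routing, assuming toy scalar-multiply "experts" in place of feed-forward blocks; `make_random_hash`, `make_expert` and `hash_layer` are hypothetical names, and the other hash variants explored in the research are not shown.

```python
import random
from typing import Callable

Vector = list[float]


def make_random_hash(vocab_size: int, num_experts: int, seed: int = 0) -> list[int]:
    """Fixed token-id -> expert table, drawn once at init and never trained."""
    rng = random.Random(seed)
    return [rng.randrange(num_experts) for _ in range(vocab_size)]


def make_expert(scale: float) -> Callable[[Vector], Vector]:
    # Toy stand-in for an expert feed-forward block
    return lambda h: [scale * x for x in h]


def hash_layer(
    token_ids: list[int],
    hidden: list[Vector],
    table: list[int],
    experts: list[Callable[[Vector], Vector]],
) -> list[Vector]:
    """Route each token's hidden state through exactly one expert chosen by its id,
    so per-token compute is that of a single expert however many experts exist."""
    return [experts[table[t]](h) for t, h in zip(token_ids, hidden)]


experts = [make_expert(float(i + 1)) for i in range(4)]
table = make_random_hash(vocab_size=100, num_experts=4)
out = hash_layer([7, 7, 42], [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]], table, experts)
print(out)
```

Since the routing table is fixed, the same token id always reaches the same expert, and there is no router to train or load-balance.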

2026-05-07

The Art of Finding Cyber-Dinosaur Skeletons

Kaspersky GReAT explains its APT research methodology, comparing threat hunting to paleontology and using the Regin operation as a case study: why it took 2 years to publish, how the fragments were collected, and how the full "monster" was reconstructed.

security research-paper threat-intelligence

2026-05-22

Measuring LLMs' ability to develop exploits

Anthropic evaluates Claude Mythos Preview on ExploitBench, ExploitGym, and SCONE-bench, showing it can build full end-to-end exploits across V8, Linux kernel, and smart contracts.

ai security research-paper blog

2026-06-04

A blueprint for formal verification of Apple corecrypto

Apple Security Engineering publishes its formal verification approach for ML-KEM and ML-DSA in corecrypto, combining Isabelle, SAW, and Cryptol to prove the functional correctness of C and ARM64 assembly implementations against the FIPS 203/204 specifications, with 50,000+ proof steps.

security systems research-paper