
Notes

Apr 2026

Combining Cost-Constrained Runtime Monitors for AI Safety

26 Apr 2026 · 2 min read

Formal Framing of AI Control

24 Apr 2026 · 3 min read

Ctrl-Z - Controlling AI Agents via Resampling

15 Apr 2026 · 1 min read

MALT: A Dataset of Natural and Prompted Behaviors That Threaten Eval Integrity

12 Apr 2026 · 2 min read

Models of Safety Evaluations of AI Deployment Protocols

12 Apr 2026 · 3 min read

Evaluating Language-Model Agents on Realistic Autonomous Tasks

10 Apr 2026 · 2 min read

Mar 2026

AI Control

24 Mar 2026 · 3 min read

DOPE Algorithm

14 Mar 2026 · 3 min read

Early work on monitorability evaluations

13 Mar 2026 · 1 min read

Dreamcoder

05 Mar 2026 · 8 min read

Archive

  • 2026 16
    Apr 2026 6
    • Combining Cost-Constrained Runtime Monitors for AI Safety
    • Formal Framing of AI Control
    • Ctrl-Z - Controlling AI Agents via Resampling
    • MALT: A Dataset of Natural and Prompted Behaviors That Threaten Eval Integrity
    • Models of Safety Evaluations of AI Deployment Protocols
    • Evaluating Language-Model Agents on Realistic Autonomous Tasks
    Mar 2026 5
    • AI Control
    • DOPE Algorithm
    • Early work on monitorability evaluations
    • Dreamcoder
    • Provable Safe Reinforcement Learning with Binary Feedback
    Feb 2026 1
    • Safe Exploration in Reinforcement Learning
    Jan 2026 4
    • Rethinking Lipschitz Neural Networks
    • DoomArena Notes
    • Certified Adversarial Robustness via Randomized Smoothing
    • Towards a scale-free theory of intelligent agency
  • 2025 5
    Dec 2025 4
    • Decision Transformer Paper Summary
    • RL Paper Summaries
    • Notes on Mountain Ranges
    • Review of Elementary Topology
    Jan 2025 1
    • Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
