
Notes

Apr 2026

Models of Safety Evaluations of AI Deployment Protocols

12 Apr 2026 · 3 min read

Evaluating Language-Model Agents on Realistic Autonomous Tasks

10 Apr 2026 · 2 min read

Mar 2026

AI Control

24 Mar 2026 · 3 min read

DOPE Algorithm

14 Mar 2026 · 3 min read

Early work on monitorability evaluations

13 Mar 2026 · 1 min read

Dreamcoder

05 Mar 2026 · 8 min read

Provably Safe Reinforcement Learning with Binary Feedback

02 Mar 2026 · 4 min read

Feb 2026

Safe Exploration in Reinforcement Learning

17 Feb 2026 · 2 min read

Jan 2026

Rethinking Lipschitz Neural Networks

16 Jan 2026 · 1 min read

DoomArena Notes

14 Jan 2026 · 2 min read

Archive

  • 2026 12
    Apr 2026 2
    • Models of Safety Evaluations of AI Deployment Protocols
    • Evaluating Language-Model Agents on Realistic Autonomous Tasks
    Mar 2026 5
    • AI Control
    • DOPE Algorithm
    • Early work on monitorability evaluations
    • Dreamcoder
    • Provably Safe Reinforcement Learning with Binary Feedback
    Feb 2026 1
    • Safe Exploration in Reinforcement Learning
    Jan 2026 4
    • Rethinking Lipschitz Neural Networks
    • DoomArena Notes
    • Certified Adversarial Robustness via Randomized Smoothing
    • Towards a scale-free theory of intelligent agency
  • 2025 5
    Dec 2025 4
    • Decision Transformer Paper Summary
    • RL Paper Summaries
    • Notes on Mountain Ranges
    • Review of Elementary Topology
    Jan 2025 1
    • Chain-of-Thought Reasoning In The Wild Is Not Always Faithful