Notes

Jan 2025

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

03 Jan 2025 · 2 min read

Archive

2026 16
Apr 2026 6
Combining Cost-Constrained Runtime Monitors for AI Safety
Formal Framing of AI Control
Ctrl-Z - Controlling AI Agents via Resampling
MALT: A Dataset of Natural and Prompted Behaviors That Threaten Eval Integrity
Models of Safety Evaluations of AI Deployment Protocols
Evaluating Language-Model Agents on Realistic Autonomous Tasks
Mar 2026 5
AI Control
DOPE Algorithm
Early work on monitorability evaluations
Dreamcoder
Provable Safe Reinforcement Learning with Binary Feedback
Feb 2026 1
Safe Exploration in Reinforcement Learning
Jan 2026 4
Rethinking Lipschitz Neural Networks
DoomArena Notes
Certified Adversarial Robustness via Randomized Smoothing
Towards a scale-free theory of intelligent agency
2025 5
Dec 2025 4
Decision Transformer Paper Summary
RL Paper Summaries
Notes on Mountain Ranges
Review of Elementary Topology
Jan 2025 1
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
