Notes

Jan 2025

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

03 Jan 2025 · 2 min read
