CHAI, UC Berkeley
Research Intern
Apr 2025 - Aug 2025 · Berkeley / remote
- Worked with Michael Cohen in Stuart Russell's group on preventing reward hacking during RLHF post-training of large language models.
- Designed a co-training scheme for a reward model and policy LLM to encourage pessimistic reward estimates on model-generated outputs.
- Carried the project from proposal through implementation, including reward model and LLM training; the work continues as part of my PhD research.