Research and Projects
A Unified Bias-Variance Analysis of GRPO-Style Operators
Manuscript in review, target submission: ICML Workshop 2026
Develops a unified theoretical framework for GRPO-style operators in policy optimization and connects practical post-training heuristics to bias-variance tradeoffs.
LLM Evaluation and Policy Optimization Experiments
ElmWater AI Lab, 2025 - Present
Built reproducible evaluation infrastructure spanning 12 benchmarks and ran 100+ controlled experiments comparing 5 policy optimization methods on 8 core tasks.
Comparisons of RL Algorithms in a Nonstationary Bandit Setting
Empirically compares reinforcement learning algorithms under nonstationary reward dynamics.