Research and Projects

A Unified Bias-Variance Analysis of GRPO-Style Operators

Manuscript in review; target submission: ICML Workshop 2026

Develops a unified theoretical framework for GRPO-style operators in policy optimization and connects practical post-training heuristics to bias-variance tradeoffs.

Policy Optimization, GRPO, RL, Post-Training

LLM Evaluation and Policy Optimization Experiments

ElmWater AI Lab, 2025 - Present

Built reproducible evaluation infrastructure across 12 benchmarks and ran 100+ controlled experiments comparing 5 policy optimization methods across 8 core tasks.

LLM Evaluation, Experiment Tracking, Finance Domains

Comparisons of RL Algorithms in a Nonstationary Bandit Setting

Empirically compares reinforcement learning algorithms under nonstationary reward dynamics.

JAX, Reinforcement Learning