Avalisa Zhang
Independent Researcher / ElmWater AI
AI researcher focused on reinforcement learning, policy optimization, LLM post-training, and evaluation.
Research Focus
Policy Optimization, RL Post-Training, LLM Evaluation, Multi-Agent Systems
Current Interests
Large Language Models, Multimodality, World Models, Large-Scale Learning
I work across theory, controlled empirical study, and production-scale learning systems, with a focus on reinforcement learning algorithms, large model experimentation, and multi-agent deployment.
My current research studies GRPO-style operators and the bias-variance tradeoffs introduced by filtering, token weighting, and advantage normalization in nonconvex policy optimization.
Previously, I built production multi-agent LLM systems for document analysis and task processing, fine-tuned large language models on private domain data, and co-founded NearMeNow, a distributed real-time location platform.
Selected Research
A Unified Bias Variance Analysis of GRPO Style Operators
Independent AI Researcher, 2025 - Present
Manuscript in review, target submission: ICML Workshop 2026.
- Develops a unified theoretical framework for GRPO-style operators in policy optimization.
- Characterizes how filtering, token weighting, and advantage normalization alter nonconvex policy optimization through explicit bias and covariance channels.
- Frames practical post-training heuristics as bias-variance tradeoffs linked to predictable optimization behavior.
Research and Engineering
Research Collaborator, ElmWater AI Lab
Jun 2025 - Present
- Conducted RL post-training on 32B to 70B language models in simulated decision environments.
- Designed a reproducible LLM evaluation system spanning 12 benchmarks, with experiment tracking and systematic comparisons across training and inference settings.
- Ran 100+ controlled experiments on finance-domain problems, comparing 5 policy optimization methods across 8 core tasks and multiple random seeds.
- Translated recent RL and LLM training papers into ablation plans and empirical tests to separate robust gains from configuration-sensitive improvements.
AI Engineer, Travelers Insurance Company
Hartford, CT · Jul 2023 - Mar 2025
- Architected a distributed multi-agent LLM system for document analysis and task processing, scaling to 1,000+ complex cases daily with 85% agreement with expert decisions.
- Fine-tuned large language models on private domain data, improving task-specific performance and enabling production-grade internal use.
Co-Founder and Lead Software Engineer, NearMeNow
Washington, DC · Jan 2022 - May 2023
- Co-founded a consumer technology startup building a distributed real-time location platform for discovering nearby activities and events.
Technical Skills
Languages
Python, C++, Java
Tools
PyTorch, JAX, XLA, HuggingFace Transformers, TRL, NumPy, experiment tracking tools