Avalisa Zhang

Independent Researcher / ElmWater AI

AI researcher focused on reinforcement learning, policy optimization, LLM post-training, and evaluation.

avalisa@elmwaterai.com · San Francisco, CA

Research Focus

Policy Optimization · RL Post-Training · LLM Evaluation · Multi-Agent Systems

Current Interests

Large Language Models · Multimodality · World Models · Large-Scale Learning

I work across theory, controlled empirical study, and production-scale learning systems, with a focus on reinforcement learning algorithms, large model experimentation, and multi-agent deployment.

My current research studies GRPO-style operators and the bias-variance tradeoffs introduced by filtering, token weighting, and advantage normalization in nonconvex policy optimization.

Previously, I built production multi-agent LLM systems for document analysis and task processing, fine-tuned large language models on private domain data, and co-founded NearMeNow, a distributed real-time location platform.


Selected Research


A Unified Bias-Variance Analysis of GRPO-Style Operators

Independent AI Researcher, 2025 - Present

Manuscript in review, target submission: ICML Workshop 2026.

Develops a unified theoretical framework for GRPO-style operators in policy optimization.
Characterizes how filtering, token weighting, and advantage normalization alter nonconvex policy optimization through explicit bias and covariance channels.
Frames practical post-training heuristics as bias-variance tradeoffs linked to predictable optimization behavior.

Research and Engineering

Research Collaborator, ElmWater AI Lab

Jun 2025 - Present

Conducted RL post-training on 32B to 70B language models in simulated decision environments.
Designed a reproducible LLM evaluation system covering 12 benchmarks, with experiment tracking and systematic comparisons across training and inference settings.
Ran 100+ controlled experiments on finance-domain problems, comparing 5 policy optimization methods across 8 core tasks and multiple random seeds.
Translated recent RL and LLM training papers into ablation plans and empirical tests to separate robust gains from configuration-sensitive improvements.

AI Engineer, Travelers Insurance Company

Hartford, CT · Jul 2023 - Mar 2025

Architected a distributed multi-agent LLM system for document analysis and task processing, scaling to 1,000+ complex cases daily and reaching 85% agreement with expert decisions.
Fine-tuned large language models on private domain data, improving task-specific performance and enabling production-grade internal use.

Co-Founder and Lead Software Engineer, NearMeNow

Washington, DC · Jan 2022 - May 2023

Co-founded a consumer technology startup building a distributed real-time location platform for discovering nearby activities and events.

Technical Skills

Languages

Python, C++, Java

Tools

PyTorch, JAX, XLA, Hugging Face Transformers, TRL, NumPy, experiment tracking tools