About

Hello :) I am Erica, currently a Ph.D. candidate in Operations Research in the Department of Management Science & Engineering (MS&E) at Stanford University, co-advised by Prof. Jose H. Blanchet (MS&E) and Prof. Mert Pilanci (Electrical Engineering). My work has been shaped by close collaborations with Prof. James Zou and Prof. Robert Tibshirani. I am grateful to be supported by the Stanford Graduate Fellowship (SGF) in Sciences and Engineering via the Koret Foundation and a PhD Fellowship from Jump Trading in the AI/ML track. At Stanford, I am affiliated with the Stanford Center for AI Safety and the Advanced Financial Technologies Laboratory (AFTLab).

As part of my Ph.D., I completed a co-enrolled M.S. degree in Electrical Engineering, specializing in Control & Optimization. Prior to Stanford, I earned a dual B.A. in Mathematics and Statistics from Columbia College, Columbia University, graduating summa cum laude with honors from both departments.

This past summer (2025), I interned as an Applied Scientist working on Agentic AI with Amazon Science at the Bellevue office. I am excited to share that I will be joining Two Sigma this summer (2026) as a Quantitative Research Intern in AI and ML at the New York headquarters.

News

Excited to share that I will be giving a talk at Google Ventures <> Google DeepMind on benchmarking for verifier-hard domains at the Google Ventures headquarters in San Francisco on May 20th!

Check out our latest work: TERMS-Bench and leaderboard! TERMS-Bench introduces a new way to evaluate agentic capabilities in non- or semi-verifiable domains, where structure is loose and no native verifier exists: constructing the environment itself as the verifier. We focus on agentic negotiation, including commercial extensions such as stateful agentic procurement chains, and evaluate the most capable high-reasoning models from major providers as of May 2026.

Excited to share that I received the Jump Trading Fellowship in the AI/ML track (2026), supporting my research on reliable modern learning and agentic AI systems.

Check out our latest work: Statsformer! Statsformer bridges LLM guidance with statistical rigor, delivering provable safety guarantees that mitigate performance degradation from LLM hallucinations while consistently outperforming strong AutoML baselines (e.g., AutoGluon, LLM-Agent–style systems).

Research

Beginning as a theorist and statistician, I draw on classical tools from optimization and probability to tackle modern challenges in machine learning, particularly in large-scale, high-dimensional settings where sample complexity and statistical rigor matter most.

Currently, my research bridges foundation models and agentic systems with statistical rigor and safety. I develop LLM-integrated learning and decision-making agentic systems with formal guarantees that mitigate failure modes such as hallucination. My goal is to move beyond heuristic LLM augmentation toward systems that are robust, interpretable, and provably safe.

Philosophically, I’m inspired by mathematician Hans Hahn’s view of mathematics as a precise, elegantly constructed conceptual framework: one that enables us to abstract information and perform tautological transformations to uncover fundamental laws governing our world [1]. As I continue my journey as a researcher, I hope to uncover more of these hidden structures within learning systems through the lenses of optimization and statistical theory, and to push the frontiers of what we can rigorously understand and design in machine learning.

Feel free to reach out if you’re interested in my work 🙂

Selected Industry Projects

Negotiation Agent for the Amazon Marketplace
Patent pending · Amazon Science, Bellevue · 2025
Designed and built an end-to-end agentic AI system for strategic price negotiation in a real-world marketplace environment under business constraints.
Position: Applied Scientist Intern · Role: Research and system development lead · Status: Pilot testing
[Post]

Open-Source Projects

TERMS-Bench: A Diagnostic Benchmark for LLM Negotiation Agents
Open-source · Evaluation benchmark · HAI <> Stanford Engineering · 2026
Creator; development and engineering lead.
[Leaderboard] · [PDF] · [Codes - Coming soon]
OpenThoughts-Agent: Data Recipe for Agents
Open-source · Benchmark · Stanford, Daytona, Oumi, Bespoke Labs, and Laude Institute · 2026
Contributor; data creation and curation.
[Leaderboard] · [Codes]

Scholarly Works

TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate
Erica Zhang, Fangzhao Zhang, Aneesh Pappu, Batu El, Jose Blanchet, Susan Athey, Jiashuo Liu, James Zou
arXiv Preprint (2026)
[PDF] · [arXiv] · [Codes - Coming soon] · [Leaderboard]
Optimizer-Induced Mode Connectivity: From AdamW to Muon
Fangzhao Zhang^*, Sungyoon Kim^*, Erica Zhang, Yiqi Jiang, Mert Pilanci
arXiv Preprint (2026)
[PDF] · [arXiv] · [Codes]
OpenThoughts-Agent: Data Recipes for Agentic Models
Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Kumar Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, …, Erica Zhang, …, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt
arXiv Preprint (2026)
[PDF - Coming soon] · [arXiv - Coming soon] · [Codes]
Learning When to Trust LLM Priors: A Validated Framework for Semantic Prior Integration
Erica Zhang^*, Naomi Sagan^*, Danny Tse, Fangzhao Zhang, Mert Pilanci, Jose Blanchet
arXiv Preprint (2026)
[PDF] · [arXiv] · [Codes]
Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes
Erica Zhang^*, Fangzhao Zhang^*, Mert Pilanci
International Conference on Machine Learning (ICML), 2025
[PDF] · [arXiv] · [Codes]
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization
Erica Zhang^*, Naomi Sagan^*, Ryan Goto^*, Jurik Mutter, Nick Phillips, Ash Alizadeh, Kangwook Lee, Jose Blanchet, Mert Pilanci, Robert Tibshirani
arXiv Preprint (2025)
[PDF] · [arXiv] · [Codes]
Empirical martingale projections via the adapted Wasserstein distance
Jose Blanchet, Johannes Wiesel, Erica Zhang, Zhenyuan Zhang
Annals of Applied Probability, AAP2239, 2025
[PDF] · [arXiv] · [Codes]
HieroLM: Egyptian Hieroglyph Recovery with Next Word Prediction Language Model
Xuheng Cai, Erica Zhang
LaTeCH-CLfL 2025 @ NAACL 2025
[PDF] · [arXiv] · [Codes]
An optimal transport-based characterization of convex order
Johannes Wiesel, Erica Zhang
Dependence Modeling 11 (1)
[PDF] · [arXiv] · [Codes]
Convex Order and Arbitrage
Erica Zhang
arXiv Preprint
[PDF] · [arXiv]

Invited Talks

Learning When to Trust LLM Priors: Reliable Prediction with Statistical Guarantees
Jump AI Symposium, Jump Trading, New York City · May 28, 2026
Benchmarking and Evaluation for Semi-Verifiable Domains: A Case Study of Agentic Negotiation
Frontier Research Series, Google Ventures, San Francisco · May 20, 2026
TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate
Open Model Benchmarks, Google DeepMind, San Francisco · May 6, 2026 · [Slides]

Academic Service

Co-chair & Organizer, Optimization under Uncertainty cluster at the INFORMS Annual Meeting, San Francisco, November 1–4, 2026.
Referee, NeurIPS, 2026.
Referee, Management Science, 2025.

Honors

Jump Trading Fellowship in AI/ML, Jump Trading (2026)
Stanford Graduate Fellowship (SGF) in Sciences and Engineering, Stanford University (2023)
Phi Beta Kappa at Columbia College Delta Chapter, Columbia University (2023)
Departmental Honors in Mathematics, Columbia University (2023)
Departmental Honors in Statistics, Columbia University (2023)
Nexus Fellowship, The D. E. Shaw Group (2021)

References

[1] Hahn, Hans (1933). Logic, Mathematics and Knowledge of Nature (Logik, Mathematik und Naturerkenntnis). In B. McGuinness (Ed.), Unified Science: The Vienna Circle Monograph Series (pp. 24–45). Dordrecht: Springer. Springer Link.

Erica Zhang