Steve Yadlowsky

Research Scientist, Google DeepMind
Language Modeling, Foundation Models, Statistics


Currently, I am working on Google DeepMind's Gemini project. My research focuses on understanding how the data used for model training and evaluation affects models' capabilities. Recently, I have concentrated on evaluating and tracking progress in these capabilities, especially in factuality, mathematics, and reasoning. This work has taught me a lot about the core challenges involved and about how high-quality data can improve model performance.

Before this, I worked on statistics and machine learning challenges in causal inference. I focused particularly on high-dimensional problems and on understanding the interplay between machine learning models and the statistics of causal questions. I applied many of the approaches I developed to recommender systems, A/B testing frameworks, and healthcare applications.

I finished my PhD at Stanford University in June 2020. There, I was fortunate to be advised by Dr. Sanjay Basu and Lu Tian, collaborate with John C. Duchi and his group on topics related to machine learning and stochastic optimization, and collaborate with Nigam Shah and his group on the use of causal methods in clinical informatics.


I have started a blog, Treatments and Observations, to explain some valuable technical insights from my statistical research in a more pedagogical fashion than statistics journals typically allow.


Ph.D., Dept. of Electrical Engineering, Stanford University
M.S., Dept. of Statistics, Stanford University
B.S., Dept. of Electrical Engineering and Computer Sciences, UC Berkeley


Submitted / Preprints

S. Yadlowsky*, L. Doshi*, N. Tripuraneni*. Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models. [arXiv].

T. Cai, H. Namkoong, S. Yadlowsky. Diagnosing Model Performance Under Distribution Shift. [arXiv].

N. Tripuraneni, L. Richardson, A. D'Amour, J. Soriano, S. Yadlowsky. Choosing a Proxy Metric from Past Experiments. [arXiv].

S. Yadlowsky. Explaining Practical Differences Between Treatment Effect Estimators with High Dimensional Asymptotics. [arXiv].

S. Yadlowsky*, S. Fleming*, N. Shah, E. Brunskill, S. Wager. Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects. [arXiv].

M. Oberst, A. D'Amour, M. Chen, Y. Wang, D. Sontag, S. Yadlowsky. Bias-robust Integration of Observational and Experimental Estimators. [arXiv].


Publications

D. Bruns-Smith, A. D'Amour, A. Feller, S. Yadlowsky. Tailored Overlap for Learning Under Distribution Shift. DistShift Workshop at NeurIPS, 2022. [OpenReview].

T. Sellam*, S. Yadlowsky*, I. Tenney*, J. Wei, N. Saphra, A. D'Amour, T. Linzen, J. Bastings, I. Turc, J. Eisenstein, D. Das, E. Pavlick. The MultiBERTs: BERT Reproductions for Robustness Analysis. International Conference on Learning Representations (ICLR), 2022. [arXiv] [OpenReview].

A. D'Amour, K. Heller, D. Moldovan, et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning. Journal of Machine Learning Research (JMLR), 2022. [arXiv].

E. Steinberg, N. Ignatiadis, S. Yadlowsky, Y. Xu, N.H. Shah. Using public clinical trial reports to evaluate observational study methods. BMC Medical Research Methodology, 2023. [arXiv].

S. Yadlowsky, H. Namkoong, S. Basu, J.C. Duchi, L. Tian. Bounds on the Conditional and Average Treatment Effect with Unobserved Confounding Factors. Annals of Statistics, 2022. [arXiv].

S. Yadlowsky, T. Yun, C. McLean, A. D'Amour. SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression. Neural Information Processing Systems (NeurIPS), 2021. [arXiv] [code].

V. Veitch, A. D'Amour, S. Yadlowsky, J. Eisenstein. Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests. Neural Information Processing Systems (NeurIPS), 2021. [arXiv].

C. Nagpal, S. Yadlowsky, N. Rostamzadeh, K. Heller. Deep Cox Mixtures for Survival Regression. Machine Learning for Healthcare (MLHC), 2021. [arXiv]

H. Namkoong*, R. Keramati*, S. Yadlowsky*, E. Brunskill. Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding. Neural Information Processing Systems (NeurIPS), 2020. [arXiv] [code]

S. Yadlowsky, F. Pellegrini, F. Lionetto, S. Braune, L. Tian. Estimation and Validation of Ratio-based Conditional Average Treatment Effects Using Observational Data. Journal of the American Statistical Association. 28 May 2020. doi: 10.1080/01621459.2020.1772080. [online] [arXiv]

S. Yadlowsky, S. Basu, L. Tian. A calibration metric for risk scores with survival data. Machine Learning for Healthcare (MLHC), 2019. [code] [pdf].

S. Kashyap, S. Gombar, S. Yadlowsky, A. Callahan, J. Fries, B.A. Pinsky, N.H. Shah. Measure what matters: Counts of hospitalized patients are a better metric for health system capacity planning for a reopening. Journal of the American Medical Informatics Association, 17 June 2020. doi: 10.1093/jamia/ocaa076. [medrxiv] [online].

S. Yadlowsky, R.A. Hayward, J.B. Sussman, R.L. McClelland, Y. Min, S. Basu. Clinical Implications of Revised Pooled Cohort Equations for Estimating Atherosclerotic Cardiovascular Disease Risk. Annals of Internal Medicine, 5 June 2018. doi: 10.7326/M17-3011.

T. Hashimoto, S. Yadlowsky, and J. Duchi. Reducing optimization to repeated classification. 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.

H. Namkoong, A. Sinha, S. Yadlowsky, and J. Duchi. Adaptive Sampling Probabilities for Non-Smooth Optimization. 34th International Conference on Machine Learning (ICML), 2017. [code] [abstract] [pdf].

S. Yadlowsky, J. Thai, C. Wu, A. Pozdnukhov, and A. Bayen. Link Density Inference from Cellular Infrastructure. Transportation Research Board (TRB) 94th Annual Meeting, 2015.

S. Yadlowsky, P. Nakkiran, J. Wang, R. Sharma, and L. El Ghaoui. Iterative Hard Thresholding for Keyword Extraction from Large Text Corpora. 14th International Conference on Machine Learning and Applications (ICMLA), 2014. [code] [abstract] [pdf].

C. Wu, J. Thai, S. Yadlowsky, A. Pozdnukhov, and A. Bayen. Cellpath: fusion of cellular and traffic sensor data for route flow estimation via convex optimization. 21st International Symposium on Transportation and Traffic Theory, 2014.

J. Thai, C. Wu, S. Yadlowsky, A. Pozdnukhov, and A. Bayen. Solving simplex-constrained programs with efficient projections via isotonic regression. Poster presented at Bay Area Machine Learning Symposium, 2014.

* Asterisk indicates co-first authorship.