
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian
Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan,
Ce Zhang, Christian Alexander Cosgrove, Christopher D Manning, Christopher Re, Diana Acosta-Navas,
Drew Arad Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao,
Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan
Kim, Neel Guha, Niladri S. Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Andrew Chi,
Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang,
Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, and Yuta Koreeda. Holistic
evaluation of language models. Transactions on Machine Learning Research, 2023. ISSN 2835-8856. URL
https://openreview.net/forum?id=iO4LZibEqW. Featured Certification, Expert Certification.
Zachary C. Lipton and Jacob Steinhardt. Troubling trends in machine learning scholarship: Some ML papers
suffer from flaws that could mislead the public and stymie future research. Queue, 17(1):45–77, Feb. 2019.
ISSN 1542-7730. doi: 10.1145/3317287.3328534. URL https://doi.org/10.1145/3317287.3328534.
Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon,
Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, et al. The data provenance initiative:
A large scale audit of dataset licensing & attribution in AI. arXiv preprint arXiv:2310.16787, 2023a.
Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, Barret Zoph, Denny Zhou,
Jason Wei, Kevin Robinson, David Mimno, et al. A pretrainer’s guide to training data: Measuring the effects
of data age, domain coverage, quality, & toxicity. arXiv preprint arXiv:2305.13169, 2023b.
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin,
Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, et al. A safe harbor for AI
evaluation and red teaming. arXiv preprint arXiv:2403.04893, 2024.
Shayne Longpre, Kevin Klyman, Ruth E. Appel, Sayash Kapoor, Rishi Bommasani, Michelle Sahar, Sean
McGregor, Avijit Ghosh, Borhane Blili-Hamelin, Nathan Butters, Alondra Nelson, Amit Elazari, Andrew
Sellars, Casey John Ellis, Dane Sherrets, Dawn Song, Harley Geiger, Ilona Cohen, Lauren McIlvenny,
Madhulika Srikumar, Mark M. Jaycox, Markus Anderljung, Nadine Farid Johnson, Nicholas Carlini, Nicolas
Miailhe, Nik Marda, Peter Henderson, Rebecca S. Portnoff, Rebecca Weiss, Victoria Westerhoff, Yacine Jernite,
Rumman Chowdhury, Percy Liang, and Arvind Narayanan. In-house evaluation is not enough: Towards
robust third-party flaw disclosure for general-purpose AI, 2025a. URL https://arxiv.org/abs/2503.16861.
Shayne Longpre, Sneha Kudugunta, Niklas Muennighoff, I-Hung Hsu, Isaac Caswell, Alex Pentland, Sercan
Arik, Chen-Yu Lee, and Sayna Ebrahimi. Atlas: Adaptive transfer scaling laws for multilingual pretraining,
finetuning, and decoding the curse of multilinguality, 2025b. URL https://arxiv.org/abs/2510.22037.
Alexandra Sasha Luccioni and Alex Hernández-García. Counting carbon: A survey of factors influencing the
emissions of machine learning. arXiv preprint arXiv:2302.08476, 2023.
Sasha Luccioni and Theo Alves da Costa. What kind of environmental impacts are AI companies disclosing?
(and can we compare them?). In Hugging Face Blog, 2025. URL
https://huggingface.co/blog/sasha/environmental-impact-disclosures.
Sasha Luccioni, Boris Gamazaychikov, Sara Hooker, Régis Pierrard, Emma Strubell, Yacine Jernite, and
Carole-Jean Wu. Light bulbs have energy ratings—so why can’t AI chatbots? Nature, 632(8026):736–738,
2024.
Nestor Maslej, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa Parli, Njenga Kariuki, Emily
Capstick, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika,
Juan Carlos Niebles, Yoav Shoham, Russell Wald, Toby Walsh, Armin Hamrah, Lapo Santarlasci, Julia
Betts Lotufo, Alexandra Rome, Andrew Shi, and Sukrut Oak. The AI index 2025 annual report, April 2025.
Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter,
and Luca Righetti. STREAM (ChemBio): A standard for transparently reporting evaluations in AI model reports,
2025. URL https://arxiv.org/abs/2508.09853.