
ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation

Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles I Stern, Tom Beucler, Bryce Harrop, Helge Heuer, Benjamin R Hillman, Andrea Jenney, Nana Liu, Alistair White, Tian Zheng, Zhiming Kuang, Fiaz Ahmed, Elizabeth Barnes, Noah D. Brenowitz, Christopher Bretherton, Veronika Eyring, Savannah Ferretti, Nicholas Lutsko, Pierre Gentine, Stephan Mandt, J. David Neelin, Rose Yu, Laure Zanna, Nathan M. Urban, Janni Yuval, Ryan Abernathey, Pierre Baldi, Wayne Chuang, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Po-Lun Ma, Sara Shamekh, Guang Zhang, Michael Pritchard; 26(142):1−85, 2025.

Abstract

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher-fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid physics-ML simulations require domain-specific data and workflows that have been inaccessible to many ML experts. This paper is an extended version of our NeurIPS award-winning ClimSim dataset paper. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors spanning ten years at high temporal resolution, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. In this extended version, we introduce a significant new contribution in Section 5, which provides a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various baselines of ML models and hybrid simulators to highlight the ML challenges of building stable, skillful emulators. The data (high-resolution version at https://huggingface.co/datasets/LEAP/ClimSim_high-res, low-resolution version at https://huggingface.co/datasets/LEAP/ClimSim_low-res, and aquaplanet version at https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid physics-ML and high-fidelity climate simulations.

© JMLR 2025.
