Can AI models reliably forecast extreme weather events?

Improvements in weather forecasting rank high among science’s success stories of the twentieth century¹. In the 1970s, four tropical cyclones killed tens of thousands or even hundreds of thousands of people, whereas today such storms rarely cause more than a few dozen deaths.

The 1970s also marked a turning point: meteorological agencies around the world started adopting physics-based numerical weather-prediction models. These simulate the atmosphere by feeding worldwide observational data into equations grounded in the fundamental laws of motion and thermodynamics. The resulting improvements in forecast accuracy enabled timely evacuation and adequate preparation before a storm hit.

But this well-established system is now being disrupted by the arrival of weather models based on artificial intelligence, which promise to speed up forecasts. Unlike conventional models, which solve complex physical equations step by step across millions of grid points, AI models map current weather conditions directly to a likely future state, using algorithms that have been trained on past weather data. Most of the heavy computing happens during the training, so generating an AI-based forecast mainly involves passing the observational data through layers of simple arithmetic operations — such as multiplication and addition — which modern computers can perform quickly.
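
To make the "layers of simple arithmetic" concrete, here is a minimal sketch of a single forecast step. It is a toy, not any agency's model: the two-variable state, the layer sizes and every weight are made-up values, and the names `dense_layer`, `relu` and `forecast_step` are hypothetical. A real system would use learned weights and a far larger state.

```python
# Toy illustration only: the weights are invented, not taken from a trained model.

def dense_layer(inputs, weights, biases):
    """One layer: each output is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Simple nonlinearity applied between layers: negatives become zero."""
    return [max(0.0, v) for v in values]


def forecast_step(state, layers):
    """Map the current state to a predicted next state through all layers."""
    hidden = state
    for i, (weights, biases) in enumerate(layers):
        hidden = dense_layer(hidden, weights, biases)
        if i < len(layers) - 1:  # no nonlinearity after the final layer
            hidden = relu(hidden)
    return hidden


# Hypothetical two-variable state; real models carry millions of values.
current_state = [20.0, -5.0]

# Two layers of (weights, biases); in practice these come from training.
toy_layers = [
    ([[0.5, 0.1], [0.0, 1.0], [-0.2, 0.3]], [0.0, 1.0, 0.5]),
    ([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]], [0.0, 0.0]),
]

next_state = forecast_step(current_state, toy_layers)
print(next_state)
```

The point of the sketch is the cost profile: once the weights are fixed, producing a forecast involves only multiplications, additions and a comparison, with no equations to solve iteratively.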

As a result, a 14-day global AI weather forecast can be produced two hours earlier than one from a physics-based system, a potentially crucial margin when organizing evacuations. That speed advantage might tempt forecasters to vote with their feet and rely mainly on AI guidance. But there is a catch: as yet, scientists do not know how reliable AI-based predictions are when it comes to rare, extreme weather events.

Physics-based forecasts should remain valid even as the climate changes; AI systems, by contrast, are trained on historical data and could falter when confronted with events that differ radically from anything they have seen previously.

Establishing the accuracy and reliability of AI-based models is becoming more urgent because several agencies, including the European Centre for Medium-Range Weather Forecasts based in Reading, UK, have already begun integrating AI into their operational forecasting systems. Here, we highlight concerns over adopting AI in meteorology, and call on the weather and climate community to set clear standards, starting with agreed data sets, for testing out-of-sample extreme-event predictions objectively.

The dilemma

National meteorological services around the world face a dilemma: AI forecasting systems are cheaper to run, but there is no agreed method for systematically evaluating how well they fare against their physics-based counterparts.

Researchers urgently need a benchmarking standard to assess the ability of AI models. Several studies have examined their performance on specific hazards. For example, although leading AI models forecast the tracks and, to some extent, the intensity of typical tropical cyclones well, their skill drops for storms with no precedent in the training set². As for temperature extremes, some AI and hybrid models can broadly reproduce the frequency and spatial patterns of historical heatwaves and cold spells that occurred outside the period on which they were trained, albeit with regional biases³. But AI systems also tend to underestimate the intensity and frequency of record-breaking heat, cold and wind events compared with a leading physics-based model⁴.

Taken together, these results indicate that conclusions about AI performance in weather forecasting remain highly sensitive to how extremes are defined, which hazards are considered and where the extreme events occur. This underscores the need for consensus-driven, standardized evaluation protocols.
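
The sensitivity to definitions can be illustrated with a small sketch. The observed and forecast values below are invented for illustration (a forecast that tracks moderate values but underestimates the largest ones), and the function name `hit_rate` is hypothetical. The same forecast looks skilful or poor depending on where the 'extreme' threshold is set.

```python
# Made-up numbers for illustration; not real observations or model output.
observed = [12, 15, 18, 21, 25, 28, 31, 34, 38, 42]
forecast = [13, 14, 19, 20, 24, 27, 29, 31, 33, 35]


def hit_rate(obs, fcst, threshold):
    """Fraction of observed exceedances that the forecast also exceeded.

    Returns None when no observation reaches the threshold.
    """
    events = [(o, f) for o, f in zip(obs, fcst) if o >= threshold]
    if not events:
        return None
    hits = sum(1 for _, f in events if f >= threshold)
    return hits / len(events)


# The same forecast, scored under four different definitions of "extreme".
for threshold in (25, 30, 35, 40):
    print(threshold, hit_rate(observed, forecast, threshold))
```

With these toy values the hit rate falls steadily as the threshold rises, which is why two studies using different thresholds could reach different conclusions about the same model.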

Before weather agencies adopt AI models, the models' predictive skill across a range of hazardous events (from heatwaves and heavy rainfall to major storms) must meet a defined minimum standard. We therefore propose a framework for training all future AI systems, one that deliberately withholds a designated set of ‘iconic’ extreme events, which are reserved solely for testing.

This AI Retraining Without Iconic Events (AIRWIE) protocol would require the meteorological community to agree on which high-impact events constitute a rigorous benchmark, ensuring that any model is evaluated against the same out-of-sample extremes before being deployed operationally by a public forecasting agency.
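
A minimal sketch of how such a withholding step might look in a data pipeline follows. It is not a specification of AIRWIE. The event names and dates are placeholders rather than an agreed list, and the seven-day buffer around each event (samples dropped from both sets to limit leakage from the days just before and after) is our own assumption, not something the protocol prescribes.

```python
from datetime import date, timedelta

# Placeholder events: names and dates are hypothetical, not a community-agreed list.
ICONIC_EVENTS = {
    "placeholder_cyclone": (date(2001, 9, 10), date(2001, 9, 14)),
    "placeholder_heatwave": (date(2005, 7, 1), date(2005, 7, 8)),
}

# Assumed safety margin around each event; the right length is an open question.
BUFFER = timedelta(days=7)


def classify(sample_date, events=ICONIC_EVENTS, buffer=BUFFER):
    """Return 'test', 'discard' or 'train' for one sample date."""
    in_buffer = False
    for start, end in events.values():
        if start <= sample_date <= end:
            return "test"  # inside a withheld event: reserved for testing
        if start - buffer <= sample_date <= end + buffer:
            in_buffer = True  # near an event: keep out of training
    return "discard" if in_buffer else "train"


def split_samples(sample_dates):
    """Group sample dates into training, testing and discarded sets."""
    split = {"train": [], "test": [], "discard": []}
    for d in sample_dates:
        split[classify(d)].append(d)
    return split


# Usage: thirty daily samples spanning the placeholder cyclone.
samples = [date(2001, 9, 1) + timedelta(days=i) for i in range(30)]
split = split_samples(samples)
print({name: len(dates) for name, dates in split.items()})
```

Keeping the event list in one shared table is the design choice that matters here: every model is then trained and tested against exactly the same withheld periods.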
