
Machine learning to derive surface concentrations of nitrogen dioxide

Near-surface nitrogen dioxide (NO2) is of great concern because of its impact on air quality and human health. Machine learning (ML) offers a way to establish a nonlinear mapping between surface NO2 distributions and geophysical predictors at high resolution and accuracy. However, it remains challenging to apply ML to produce operational surface NO2 products with realistic spatial patterns and uncertainty quantification. We are exploring a systematic scheme for delivering a stable ML-based surface NO2 product.

Inferring near-surface NO2 concentrations from atmospheric columns observed by satellites is essential for assessing air quality and health risks. This requires building a model from satellite observations, ground observations, and other ancillary data sets. The work is usually done with physical models (which assimilate observations) or empirical statistical models, but both approaches have to trade off computational efficiency, resolution, and accuracy.

Machine learning (ML) mitigates this problem because it is well suited to constructing complex non-linear mappings from drivers to targets. ML has also been widely studied across disciplines, helped by the rapid development of computing power and big data.

Many studies have demonstrated that ML can estimate the spatiotemporal distribution of surface NO2 at high resolution. However, using ML to produce surface NO2 products remains challenging because of unstable predictions, a lack of uncertainty assessment, and weak physical constraints.

Towards a stable ML-based surface NO2 product

Our study aims to address these challenges and to explore a systematic and practical scheme for producing a stable ML-based surface NO2 product, within the framework of the Terrascope project. This work is ongoing; the research scheme is outlined below:

  • Identify influential predictors and explore appropriate data processing methods.
  • Investigate the behavior and performance of different tree-based and neural-network-based ML models, and develop an ML algorithm for surface NO2 estimation by designing its structure and loss function.
  • Develop uncertainty quantification methods for the ML models and provide prediction intervals for their estimates.
  • Examine the reliability of the model results and proceed with model interpretation.
  • Conduct a health impact assessment based on the model predictions.
  • Publish the ML-generated surface NO2 product for public access.
  • Test the algorithms and schemes on the Belgian domain and extend the study to other European countries.
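
To make the uncertainty quantification step more concrete, here is a minimal sketch of how a prediction interval can be attached to a regression model's output. It uses split conformal prediction, chosen purely for illustration: the article does not say which uncertainty method the study uses. The data are synthetic, the linear fit stands in for any tree-based or neural-network model, and every name, unit, and parameter value below is an assumption.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for any ML regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def conformal_half_width(abs_residuals, alpha):
    """Split-conformal quantile of the absolute calibration residuals."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank with finite-sample correction
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(abs_residuals)[rank - 1]


def make_synthetic(n, rng):
    """Hypothetical data: one predictor (e.g. a column amount) -> surface NO2."""
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [2.0 + 3.0 * x + rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_synthetic(500, rng)  # used to fit the model
cal_x, cal_y = make_synthetic(500, rng)      # held out to calibrate the interval
test_x, test_y = make_synthetic(1000, rng)   # used to check coverage

intercept, slope = fit_line(train_x, train_y)


def predict(x):
    return intercept + slope * x


# Absolute errors on the calibration set define the interval half-width.
cal_residuals = [abs(y - predict(x)) for x, y in zip(cal_x, cal_y)]
half_width = conformal_half_width(cal_residuals, alpha=0.1)  # nominal 90% interval

intervals = [(predict(x) - half_width, predict(x) + half_width) for x in test_x]
coverage = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, test_y)) / len(test_y)
print(f"half-width: {half_width:.2f}, empirical coverage: {coverage:.3f}")
```

On data like these the empirical coverage should land close to the nominal 90%. This simple variant gives the same interval width everywhere. An operational product would more likely need intervals that vary in space and time, for example through quantile regression or model ensembles.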
     

Overall, this study explores how ML models can improve the prediction of surface NO2 and deliver corresponding products, offering a perspective on practical applications of ML methods in atmospheric science. We also expect that the methodology could be extended to the prediction of other atmospheric components.

Figure 2: The workflow for mapping surface NO2 distributions using machine learning methods. The process includes data preparation, model training and testing, uncertainty quantification, model interpretation, and mapping of surface NO2 with corresponding prediction intervals.