Research on the methods to retrieve continuous spatial distribution of PM2.5/10 based on machine learning and satellite imagery
2019-11-09
2019-12-19
2019-12-23

Abstract & Keywords
Abstract: Background, aim, and scope. Over the past decades, rapid industrialization and urbanization have increased the environmental burden in China. PM2.5 and PM10 (hereafter PM2.5/10), which have significant impacts on human health, have become the primary pollutants affecting ambient air quality in most large cities of North China. How to obtain the temporal and spatial distribution of PM2.5/10 concentrations efficiently and accurately has become a popular research topic. Because ground monitoring stations are sparsely and unevenly distributed, it is difficult to obtain accurate continuous distributions of PM2.5/10 concentrations over a whole research area by spatial interpolation and/or numerical simulation based on the discrete station values. Owing to its timeliness, wide coverage, high data capacity and robustness, satellite imagery has become an increasingly valuable and popular basis for retrieving PM2.5/10 concentrations. In the literature published so far, aerosol optical depth (AOD) data are often used to retrieve PM2.5/10 concentrations through linear and/or nonlinear regression analysis. However, calculating the AOD requires special parameters that are hard to determine and may differ between research areas, so a generic and robust method for retrieving PM2.5/10 concentrations from satellite images is still lacking. To address these issues, this article develops a generic model that obtains the continuous spatial distribution of PM2.5/10 concentrations from satellite imagery using a multilayer back propagation neural network (BPN).
Materials and methods. In this research, the Google Earth Engine (GEE) platform was used to acquire a large amount of sample data: (a) Landsat 8 OLI remote sensing images with a spatial resolution of 30 m; (b) spatial parameters such as latitude, longitude and altitude; (c) meteorological parameters including barometric pressure, relative humidity, temperature, wind direction and wind speed; and (d) PM2.5/10 concentrations from ground monitoring stations. Data (a), (b) and (c) serve as the 'input', while data (d) serve as the 'output'. To find the combination of input parameters that gives the best retrieval performance, the candidate input parameters were categorized into groups along two dimensions, 'influence factor' and 'retroactive time', and the groups were fed stepwise into the proposed model for neural network training. Results. Taking Beijing as the research area, the retrieval accuracy measured by R2 reached 0.814 for PM2.5 and 0.796 for PM10, with RMSE values of 19.21 μg/m3 and 28.31 μg/m3, respectively. The retrieval results were compared with those obtained by Kriging interpolation. Their general spatial distribution characteristics are consistent with each other: higher PM2.5/10 concentrations occur in the south of Beijing and lower values in the north. Discussion. The retrieval accuracy of the proposed model is satisfactory and higher than that of many established AOD-based models. For validation, the proposed model was used to generate continuous spatial distribution maps of PM2.5/10 concentrations in Beijing, which are markedly better than those from Kriging interpolation in terms of resolution and clarity.
Furthermore, because Landsat 8 OLI images are taken as the basis for the retrievals, the distribution maps of PM2.5/10 concentrations generated in this article have a much higher spatial resolution than those in many other works. Conclusions. The proposed model based on the multilayer back propagation neural network achieves good PM2.5/10 retrieval performance in terms of accuracy and reliability. Moreover, because all the data used by the model cover the whole of mainland China and are open to the public, the model is also robust and generic. Recommendations and perspectives. The proposed model provides a new and generic method for retrieving PM2.5/10 concentrations from Landsat 8 OLI images. Its high retrieval performance and generic nature indicate that it can be widely applied to calculate high-resolution continuous spatial distributions of PM2.5/10 in various areas. Beyond serving as a research prototype, the proposed model is capable of being widely used in real-world applications.
Keywords: machine learning; satellite imagery; PM2.5/10 retrieval; continuous spatial distribution

1   Data collection and preprocessing
1.1   Data collection

(1) Landsat 8 OLI satellite remote sensing imagery

Fig. 1 Distribution of CEMC and CMA stations in Beijing
(2) Air quality monitoring data from CEMC

(3) Meteorological monitoring data for mainland China released by CMA via NOAA

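1.2   Data preprocessing
(1) Select the Landsat 8 OLI remote sensing images covering the study area and apply orthorectification and spatial registration; extract the reflectance of each band, apply a reflection compensation correction to the atmospheric solar radiation values, and calculate the vegetation index NDVI from the relevant bands.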
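As a minimal sketch of the NDVI step, the standard definition NDVI = (NIR - Red) / (NIR + Red) can be computed per pixel from the red and near-infrared reflectance (bands 4 and 5 on Landsat 8 OLI). The function name and the sample reflectance values below are illustrative assumptions and are not taken from the article; the corrections described above are assumed to have been applied already.

```python
import numpy as np

def ndvi(red, nir):
    """Elementwise NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red is zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical surface reflectance values for a 2 x 2 pixel patch.
red_band = np.array([[0.10, 0.20], [0.05, 0.00]])
nir_band = np.array([[0.30, 0.20], [0.45, 0.00]])
ndvi_values = ndvi(red_band, nir_band)
```

Masking the zero denominator keeps no-data pixels from raising warnings or producing infinities when the index is later averaged over an area.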
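(3) Because the CMA stations and the CEMC stations are located at different positions, this study uses the Near Analysis Algorithm (NAA) to automatically match the meteorological parameter data with the PM2.5/10 monitoring data. The procedure is as follows: a buffer is built around each CEMC monitoring station, and within the buffer the reflectance of each Landsat 8 OLI band is extracted and NDVI is calculated from the relevant bands; the mean reflectance of each band and the mean NDVI within the buffer are computed and assigned to the corresponding CEMC station; the satellite overpass time is then used to obtain the CEMC and CMA ground observations that coincide with the Landsat 8 overpass. Together these form the main training dataset of the PM2.5/10 retrieval model. In addition, the elevation of each spatial point is also taken into account in the retrieval model.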
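The station and time matching can be sketched as follows. This is only an illustration under stated assumptions: it treats the near analysis as a nearest-station search by great-circle distance and picks the ground record closest in time to the overpass, within a one-hour tolerance. The station identifiers, coordinates, weather values and the tolerance are all hypothetical, and the buffer averaging of reflectance is not covered here.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_station(target, candidates):
    """Return the candidate station closest to the target station."""
    return min(
        candidates,
        key=lambda s: haversine_km(target["lat"], target["lon"], s["lat"], s["lon"]),
    )

def record_at_overpass(records, overpass, max_gap=timedelta(hours=1)):
    """Return the record closest in time to the overpass, or None if none is within max_gap."""
    if not records:
        return None
    best = min(records, key=lambda r: abs(r["time"] - overpass))
    return best if abs(best["time"] - overpass) <= max_gap else None

# Hypothetical stations and observations (not from the article's dataset).
cemc_site = {"id": "cemc_demo", "lat": 39.93, "lon": 116.42}
cma_sites = [
    {"id": "cma_a", "lat": 39.80, "lon": 116.47},
    {"id": "cma_b", "lat": 40.37, "lon": 116.63},
]
overpass = datetime(2016, 12, 30, 2, 53)  # UTC overpass time quoted in Fig. 6
weather = [
    {"time": datetime(2016, 12, 30, 2, 0), "temp_c": -1.0},
    {"time": datetime(2016, 12, 30, 3, 0), "temp_c": 0.5},
]

matched_site = nearest_station(cemc_site, cma_sites)
matched_weather = record_at_overpass(weather, overpass)
```

A planar distance would also work over a city-sized area; the haversine form is used here only so that the sketch takes latitude and longitude directly.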
2   Establishing the retrieval model

Fig. 2 The workflow of the PM2.5/10 retrieval model
2.1   Building the basic PM2.5/10 retrieval model on a multilayer BPN network

Fig. 3 Schematic diagram of the PM2.5/10 retrieval model based on Back Propagation Neural-network

（1）
（2）
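To illustrate how a back propagation network of this kind is trained, the sketch below fits a small feedforward network by gradient descent on the mean squared error. It is not the article's model: the single hidden layer, tanh activation, layer width, learning rate, epoch count and synthetic data are all assumptions chosen to keep the example short.

```python
import numpy as np

def train_bpn(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network with back propagation; return (params, losses)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    n = len(X)
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output.
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the MSE gradient layer by layer.
        g_pred = 2.0 * err / n
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        g_W1 = X.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Run the forward pass with trained parameters."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic stand-in for (input features, concentration) samples.
data_rng = np.random.default_rng(1)
X_demo = data_rng.uniform(-1.0, 1.0, (200, 2))
y_demo = (np.sin(2.0 * X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2).reshape(-1, 1)
params, losses = train_bpn(X_demo, y_demo)
```

In practice the inputs would be the scaled reflectance, spatial and meteorological features, and the target would be the station concentration at the overpass time.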

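2.2   Searching for the optimal combination of input parameters for the PM2.5/10 retrieval model

2.2.1   Optimization along the influence-factor dimension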

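(a) Reflectance of the bands in the remote sensing imagery that are strongly correlated with PM2.5/10, such as the blue and red bands of Landsat 8 OLI imagery, abbreviated as 'strongly correlated bands' in Fig. 4. This group of data is the basis for retrieving PM2.5/10 concentrations from remote sensing imagery; it is considered first and is retained throughout the optimization process.
(b) Reflectance of the other bands in the imagery, together with information derived from the imagery such as the vegetation index NDVI and the land cover type, abbreviated as 'other bands' in Fig. 4.
(c) Spatial feature parameters, including longitude, latitude and elevation, abbreviated as 'spatial parameters' in Fig. 4.
(d) Meteorological parameters, including temperature, barometric pressure, humidity, wind speed, wind direction and precipitation, abbreviated as 'meteorological parameters' in Fig. 4.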

Fig. 4 Different combinations of the input parameters of the proposed PM2.5/10 retrieval model
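One way to generate candidate input sets from the four groups is sketched below: the strongly correlated bands are always included and every subset of the remaining three groups is added in turn. Exhaustive enumeration is an assumption made for illustration; the combinations actually tested are those shown in Fig. 4, and the group names in the code are hypothetical labels.

```python
from itertools import combinations

BASE_GROUP = "strong_bands"  # always included
OPTIONAL_GROUPS = ["other_bands", "spatial", "meteorological"]

def candidate_input_sets(base=BASE_GROUP, optional=OPTIONAL_GROUPS):
    """List every input set made of the base group plus a subset of the optional groups."""
    sets = []
    for k in range(len(optional) + 1):
        for combo in combinations(optional, k):
            sets.append((base, *combo))
    return sets

input_sets = candidate_input_sets()
# Each tuple names the groups whose features would be fed to one training run.
```

Each candidate set would then be used to train the network once, and the set with the best validation score kept.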
2.2.2   Optimization along the retroactive-time dimension

3   Results and analysis

（3）
（4）
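The two accuracy measures reported in the abstract, R2 and RMSE, can be computed as sketched below. The article's own equations are not reproduced here, so the code assumes the common definitions: R2 as the coefficient of determination, 1 - SS_res / SS_tot, and RMSE as the square root of the mean squared error. The sample values are made up for the example.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical monitored and retrieved concentrations in ug/m3.
monitored = [50.0, 100.0, 150.0, 200.0]
retrieved = [60.0, 90.0, 160.0, 190.0]
demo_rmse = rmse(monitored, retrieved)      # 10.0
demo_r2 = r_squared(monitored, retrieved)   # 0.968
```

If R2 is instead defined as the squared Pearson correlation between retrieved and monitored values, the two measures coincide only for an unbiased linear fit, so the definition in use should be checked before comparing scores across studies.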

Fig. 5 Scatter plots of the retrieved versus monitored values: (a) PM2.5 concentrations; (b) PM10 concentrations

Fig. 6 Spatial distribution of the PM2.5/10 concentration (μg/m3) in Beijing at UTC 2:53 on December 30, 2016: (a) retrieval result of PM2.5; (b) PM2.5 distribution calculated by Kriging interpolation; (c) retrieval result of PM10; (d) PM10 distribution calculated by Kriging interpolation


Category                              Min/(μg/m3)   Max/(μg/m3)   Mean/(μg/m3)   R2 between (a) and (b)
(a) Kriging interpolation of PM2.5    44.31         418.68        200.99         0.804
(b) Retrieval results of PM2.5        30.88         443.55        230.44
(a) Kriging interpolation of PM10     48.77         451.34        230.11         0.789
(b) Retrieval results of PM10         22.34         482.57        250.13

4   Conclusions


ZHANG Meng

ZHANG Bo

National Natural Science Foundation of China (41871315)

Journal of Earth Environment