Abstract: Background, aim, and scope Over the past decades, rapid industrialization and urbanization have increased the environmental burden in China. PM2.5 and PM10 (hereafter PM2.5/10), which have significant impacts on human health, have become the primary pollutants affecting ambient air quality in most large cities of North China. How to obtain the temporal and spatial distribution of PM2.5/10 concentrations efficiently and accurately has become a popular research topic. Because ground monitoring stations are unevenly and sparsely distributed, it is difficult to obtain accurate, continuous distributions of PM2.5/10 concentrations over a whole research area by spatial interpolation and/or numerical simulation based on discrete station values. Owing to its timeliness, wide coverage, high load capacity and robustness, the retrieval of PM2.5/10 concentrations from satellite imagery has become increasingly valuable and popular. In the literature published so far, aerosol optical depth (AOD) data are often used to retrieve PM2.5/10 concentrations by linear and/or nonlinear regression analysis. However, calculating the AOD requires special parameters that are hard to determine and may differ between research areas; that is, generic and robust methods/models for retrieving PM2.5/10 concentrations from satellite images are still lacking. To address these issues, this article is dedicated to developing a generic model that obtains the continuous spatial distribution of PM2.5/10 concentrations from satellite imagery using a multilayer back-propagation neural network.
Materials and methods In this research, the Google Earth Engine (GEE) platform was used to acquire a large amount of sample data: (a) Landsat 8 OLI remote sensing images with a spatial resolution of 30 m, (b) spatial parameters such as latitude, longitude and altitude, (c) meteorological parameters such as barometric pressure, relative humidity, temperature, wind direction and wind speed, and (d) PM2.5/10 concentrations from ground monitoring stations. Here, the data in (a), (b) and (c) are treated as the ‘input’, while the data in (d) are the ‘output’. To find the optimal combination of input parameters for the best retrieval performance, the candidate input parameters were categorized into groups according to two aspects, ‘influence factor’ and ‘retroactive time’, and these groups were fed stepwise into the proposed model for neural-network training. Results Taking Beijing as the research area, the retrieval precision, measured by R2, reached 0.814 for PM2.5 and 0.796 for PM10, with RMSEs of 19.21 μg/m3 and 28.31 μg/m3, respectively. The retrieval results of the proposed model were compared with the results obtained by Kriging interpolation. Their general spatial distribution characteristics are consistent with each other: higher PM2.5/10 concentrations occur in the south of Beijing, China, and lower values in the north. Discussion The retrieval accuracy of the proposed model is satisfactory and higher than that of many established AOD-based models. For validation, the proposed model was used to produce continuous spatial distribution maps of PM2.5/10 concentrations in Beijing, which are significantly better than those from Kriging interpolation in terms of resolution and clarity.
Furthermore, because Landsat 8 OLI images are taken as the ‘basis’ for the retrievals, the distribution maps of PM2.5/10 concentrations generated in this article have a much higher spatial resolution than those in many other works. Conclusions The proposed model, based on a multilayer back-propagation neural network, has yielded considerable PM2.5/10 retrieval performance in terms of accuracy and reliability. Meanwhile, because all the data employed by the proposed model cover the whole of mainland China and are open to the public, the proposed model is also robust and generic. Recommendations and perspectives The proposed model provides a new and generic method to retrieve PM2.5/10 concentrations from Landsat 8 OLI images. Its high retrieval performance and generic nature indicate that it can be widely applied to calculate the continuous spatial distribution of PM2.5/10 at high resolution in various areas. Rather than remaining a mere research prototype, the proposed model is capable of being widely used in real-world applications.
Keywords: machine learning; satellite imagery; PM2.5/10 retrieval; continuous spatial distribution
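
The abstract reports retrieval accuracy as R2 and RMSE but does not spell out how they are computed. As an illustration only, the following minimal Python sketch shows one common way to compute both from paired station observations and model retrievals. It assumes that R2 denotes the coefficient of determination (1 minus the ratio of residual to total sum of squares), which the abstract does not confirm; the function names and the sample concentrations are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same unit as the inputs (e.g. ug/m3)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the definition behind the abstract's "R2";
    the squared Pearson correlation is another convention in use.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical station observations and model retrievals (ug/m3),
# made up for illustration; they are not data from the study.
station_pm25 = [35.0, 80.0, 120.0, 60.0]
retrieved_pm25 = [40.0, 75.0, 110.0, 65.0]

print("RMSE:", rmse(station_pm25, retrieved_pm25))
print("R2:", r_squared(station_pm25, retrieved_pm25))
```

Under this definition, a retrieval that always predicts the mean of the observations scores R2 = 0, and a perfect retrieval scores R2 = 1 with an RMSE of 0.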