Obradovic, Zoran; Vucetic, Slobodan; Latecki, Longin; Mennis, Jeremy (Temple University. Libraries, 2011)
In recent years, many remote sensing instruments with various properties have been employed in an attempt to better characterize important geophysical phenomena. Satellite instruments provide an exceptional opportunity for global long-term observations of the land, the biosphere, the atmosphere, and the oceans. The collected data are used to estimate and better understand geophysical parameters such as land cover type, atmospheric properties, or ocean temperature. Achieving accurate estimates of such parameters is an important requirement for the development of models able to predict global climate changes. One of the most challenging climate research problems is estimation of the global composition, load, and variability of aerosols, small airborne particles that reflect and absorb incoming solar radiation. The existing algorithm for aerosol prediction from satellite observations is deterministic and manually tuned by domain scientists. In contrast to the domain-driven method, we show that aerosol prediction is achievable by completely data-driven approaches. These statistical methods consist of learning nonlinear regression models to predict aerosol load using the satellite observations as inputs. Measurements from ground-based sites unevenly distributed over the world are used as a proxy for ground-truth outputs. Although the statistical methods achieve better accuracy than the deterministic method, this setup is appropriate only when data are independently and identically distributed (IID). The IID assumption is often violated in remote sensing, where data exhibit temporal, spatial, or spatio-temporal dependencies. In such cases, traditional supervised learning approaches could result in a model with degraded accuracy. Conditional random fields (CRF) are widely used for predicting output variables that have some internal structure. Most of the CRF research has been done on structured classification, where the outputs are discrete.
We propose a CRF model for continuous outputs that uses multiple unstructured predictors to form its features and at the same time exploits structure among outputs. By constraining the feature functions to quadratic functions of outputs, we show that the CRF model can be conveniently represented in a Gaussian canonical form. The appeal of the proposed Gaussian Conditional Random Fields (GCRF) model lies in its conceptual simplicity and in the computational efficiency of learning and inference through the use of sparse matrix computations. Experimental results provide strong evidence that the GCRF achieves better accuracy than non-structured models. We improve the representational power of the GCRF model by 1) introducing an adaptive feature function that can learn nonlinear relationships between inputs and outputs and 2) allowing the weights of feature functions to depend on inputs. The GCRF is also readily applicable to other regression applications where there is a need for knowledge integration, data fusion, and exploitation of correlation among output variables.
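
To illustrate why quadratic feature functions lead to a Gaussian canonical form with sparse inference, the following is a minimal sketch under assumptions that the abstract does not state. It assumes one common choice of feature functions: an association term alpha_k * (y_i - R_k(x_i))^2 for each unstructured predictor R_k, and an interaction term beta * S_ij * (y_i - y_j)^2 for each pair of neighboring outputs with similarity S_ij. Under that assumption the conditional density is Gaussian with a sparse precision matrix Q and a linear term b, and the prediction is the mean obtained by solving Q mu = b. The function name `gcrf_predict` and the parameter names are hypothetical, and the weights are taken as given rather than learned.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def gcrf_predict(R, S, alpha, beta):
    """Mean of an assumed Gaussian CRF (illustrative sketch, not the thesis code).

    R     : (n, K) array, predictions of K unstructured predictors for n outputs
    S     : (n, n) symmetric non-negative similarity matrix (dense or sparse)
    alpha : (K,) positive weights of the association terms (assumed given)
    beta  : non-negative weight of the interaction term (assumed given)
    """
    R = np.asarray(R, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    S = sp.csr_matrix(S)
    n = R.shape[0]

    # Graph Laplacian L = D - S, so that sum_{i<j} S_ij (y_i - y_j)^2 = y^T L y.
    degrees = np.asarray(S.sum(axis=1)).ravel()
    laplacian = sp.diags(degrees) - S

    # Canonical form: P(y | x) is proportional to exp(-0.5 * y^T Q y + b^T y).
    Q = 2.0 * (alpha.sum() * sp.identity(n, format="csr") + beta * laplacian)
    b = 2.0 * (R @ alpha)

    # Inference is a single sparse linear solve: Q mu = b.
    return spsolve(Q.tocsc(), b)


# Three outputs on a chain graph, two unstructured predictors.
R = np.array([[1.0, 2.0], [2.0, 2.0], [6.0, 4.0]])
S = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
mu = gcrf_predict(R, S, alpha=[1.0, 3.0], beta=2.0)
print(mu)
```

With beta set to zero the sketch reduces to a weighted average of the unstructured predictors; a positive beta pulls neighboring outputs toward each other, which is how structure among outputs enters the prediction in this assumed formulation.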