Challenges
To date, an abundance of scientific data for many environmental and energy-related phenomena is available from different sensors, both in situ and remote. Some datasets are purely spatial (e.g., mineral deposits, topographic variables, aquifer permeability), while others evolve dynamically (e.g., meteorological variables, ecological and environmental pollution indicators, groundwater levels). Data volume is often large, and in certain cases data velocity is also high, e.g., in sequences of satellite images. Volume and velocity are common bottlenecks in data analysis and modeling. Space-time (henceforth, ST) data are further complicated by (i) correlations between different spatial locations and time instants, (ii) sampling over irregular grids, which is typical of ground-based data, and (iii) multivariate dependence (cross-correlations). Models need to address these features effectively in order to extract useful information from the data.
Efficient processing algorithms are necessary to fill gaps in ST datasets (e.g., sparse hydrological measurements), to forecast future values of critical variables (e.g., spatial precipitation patterns, short-term wind energy potential), and to assess uncertainty in climate patterns and natural-resource distributions by means of simulated scenarios. The scalability of interpolation, forecasting, and simulation (IFS) algorithms with data size, the validity of the underlying model assumptions, and the complexity of multivariate dependence models are open research issues. Hence, new approaches with improved computational performance and fewer parametric assumptions are required, together with novel and flexible statistical tools that overcome the restrictions of simplified correlation models. The advances generated by SLIMNETS will address these issues.
Goals
The overall goal of SLIMNETS is to develop novel AI concepts, methods, algorithms, and software for modeling large geo-referenced ST data. Such data typically represent samples (on regular lattices or irregular ST grids) of continuous, real-valued processes that exhibit different patterns of space-time dependence and cross-correlations between different components. Given the ever-increasing availability and size of modern datasets, SLIMNETS aims to develop and implement methods whose computational requirements (memory and time) scale favorably with data size. ST data modeling attracts strong interest in the probability, statistics, and machine learning communities; SLIMNETS will focus on models for geo-referenced data indexed in space and/or in space-time. To this end, SLIMNETS proposes an ambitious line of research that exploits sparse models rooted in statistical physics. In statistics and machine learning, ST correlations are modeled by means of covariance kernels. In statistical physics, correlations are derived from local interactions, e.g., as in Gaussian field theory and the Ising model. In addition, effective theories replace many-body systems with simpler, single-body ones. The PI has expertise in both machine learning and statistical physics and will exploit these ideas to develop novel machine learning methods. Achieving the main goal involves fulfilling the following specific goals (SGs):
SG1: to formulate flexible and parametrically rich ST models for different data types (e.g., multivariate time series, planar and volumetric spatial data, and ST data on 3D and 4D grids) and sampling patterns by developing suitable interaction functions.
SG2: to develop computationally efficient AI algorithms for large, complex ST data that require minimal user interaction. Such algorithms can provide valuable tools for automated environmental monitoring and emergency early-warning systems.
SG3: to generate models of dependence (including nonlinear interactions) between different variables (covariates) and to incorporate non-Euclidean distance measures in the interaction functions.
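The contrast drawn above between covariance kernels and local interactions can be illustrated with a deliberately simple toy model. The sketch below is hypothetical and is not the SLIMNETS model: it assumes a 1D chain with nearest-neighbour coupling, a precision matrix Q = (kappa2 I + L)/eta0 with L the chain Laplacian, and illustrative names (`chain_precision`, `solve_tridiagonal`, `covariance`, `kappa2`, `eta0`). It shows that a sparse (tridiagonal) precision implies a dense covariance, i.e., purely local interactions generate correlations between all sites.

```python
# Hypothetical sketch (not the SLIMNETS model): nearest-neighbour interactions
# on a 1D chain give a sparse (tridiagonal) precision matrix Q, while the
# covariance Sigma = Q^{-1} is dense, i.e. every pair of sites is correlated.


def chain_precision(n, kappa2, eta0=1.0):
    """Return (diag, off) for Q = (kappa2 * I + L) / eta0.

    L is the graph Laplacian of a chain with n >= 2 nodes; off[i] holds
    Q[i, i+1] = Q[i+1, i]. Since L is positive semidefinite, Q is positive
    definite whenever kappa2 > 0 and eta0 > 0 (an easily checked condition).
    """
    if n < 2 or kappa2 <= 0 or eta0 <= 0:
        raise ValueError("requires n >= 2, kappa2 > 0 and eta0 > 0")
    diag = [(kappa2 + (1 if i in (0, n - 1) else 2)) / eta0 for i in range(n)]
    off = [-1.0 / eta0] * (n - 1)
    return diag, off


def solve_tridiagonal(diag, off, rhs):
    """Solve Q x = rhs for symmetric tridiagonal Q (Thomas algorithm), O(n)."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    for i in range(n):
        a_i = off[i - 1] if i > 0 else 0.0  # sub-diagonal entry Q[i, i-1]
        c_prev = c[i - 1] if i > 0 else 0.0
        d_prev = d[i - 1] if i > 0 else 0.0
        denom = diag[i] - a_i * c_prev
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - a_i * d_prev) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def covariance(diag, off):
    """Dense covariance Sigma = Q^{-1}, built column by column."""
    n = len(diag)
    cols = [solve_tridiagonal(diag, off, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


n = 6
diag, off = chain_precision(n, kappa2=0.5)
nnz_Q = len(diag) + 2 * sum(1 for v in off if v != 0.0)  # nonzeros of Q
Sigma = covariance(diag, off)
nnz_Sigma = sum(1 for row in Sigma for v in row if abs(v) > 1e-12)
print(nnz_Q, nnz_Sigma)  # prints: 16 36
```

In this toy setting the admissibility check reduces to kappa2 > 0 and eta0 > 0, because the Laplacian is positive semidefinite; the correlations in Sigma decay with distance along the chain but never vanish.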
Main Objective
SLIMNETS aims to develop rich, flexible, and computationally tractable space-time models and methods for processing large, complex, multivariate, geo-referenced ST data. Novel predictive analysis tools will advance current capabilities for interpolation, forecasting, and simulation of geo-referenced processes. To achieve the main objective, SLIMNETS will focus on the following specific technical objectives:
O1: Enhanced parametric flexibility of local interaction functions for modeling multi-output spatiotemporal correlations using sparse precision operators with easily verifiable admissibility constraints.
O2: Ability of sparse precision operators to handle different sampling patterns, including missing data (e.g., regular lattices, partially sampled grids, temporal gaps, and unstructured grids for multiple inputs).
O3: Multivariate (multi-output) covariance kernels for vector-valued processes with easily verifiable permissibility conditions and interpretable hyperparameters.
O4: Favorable scaling (linear or sub-linear) of the computational time and memory with data size by taking advantage of suitably selected sparse representations.
O5: Accurate and robust parameter estimation methods with data-informed initialization proposals.
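The gap filling targeted by O2 and the linear scaling targeted by O4 can be illustrated, under strong simplifying assumptions, in one dimension. The sketch below is hypothetical and is not the SLIMNETS method, which targets multivariate ST data on irregular grids: it assumes zero-mean data, the same toy chain precision Q = (kappa2 I + L)/eta0, and illustrative names (`fill_gaps`, `solve_tridiagonal`, `kappa2`, `eta0`). It computes the Gaussian conditional mean of the missing entries, mu_U = -Q_UU^{-1} Q_UO x_O; because Q_UU inherits the tridiagonal structure of Q, the solve costs O(n) time and memory.

```python
# Hypothetical sketch (not the SLIMNETS method): gap filling in a 1D series
# with the conditional mean of a zero-mean Gaussian chain whose precision Q
# is tridiagonal. For missing set U and observed set O the conditional mean
# solves Q_UU mu_U = -Q_UO x_O; Q_UU is again tridiagonal, so the cost is O(n).


def solve_tridiagonal(diag, off, rhs):
    """Solve Q x = rhs for symmetric tridiagonal Q (Thomas algorithm), O(n)."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    for i in range(n):
        a_i = off[i - 1] if i > 0 else 0.0  # sub-diagonal entry Q[i, i-1]
        c_prev = c[i - 1] if i > 0 else 0.0
        d_prev = d[i - 1] if i > 0 else 0.0
        denom = diag[i] - a_i * c_prev
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - a_i * d_prev) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def fill_gaps(values, kappa2=0.1, eta0=1.0):
    """Replace None entries of `values` by their conditional mean.

    Assumes Q = (kappa2 * I + L) / eta0 with L the chain Laplacian and
    zero-mean data (center the series first if needed).
    """
    n = len(values)
    if n < 2 or kappa2 <= 0 or eta0 <= 0:
        raise ValueError("requires n >= 2, kappa2 > 0 and eta0 > 0")
    q_off = -1.0 / eta0  # coupling Q[i, i+1] between neighbouring sites
    miss = [i for i, v in enumerate(values) if v is None]
    if not miss:
        return list(values)
    # Diagonal of Q restricted to the missing sites.
    diag = [(kappa2 + (1 if i in (0, n - 1) else 2)) / eta0 for i in miss]
    # Q_UU couples two missing sites only if they are adjacent on the chain.
    off = [q_off if miss[k + 1] == miss[k] + 1 else 0.0
           for k in range(len(miss) - 1)]
    # Right-hand side -Q_UO x_O: contributions of observed neighbours only.
    rhs = []
    for i in miss:
        s = 0.0
        for j in (i - 1, i + 1):
            if 0 <= j < n and values[j] is not None:
                s -= q_off * values[j]
        rhs.append(s)
    out = list(values)
    for i, m in zip(miss, solve_tridiagonal(diag, off, rhs)):
        out[i] = m
    return out


series = [0.0, None, None, 3.0, None, 1.0]
print(fill_gaps(series, kappa2=1e-9))
```

As kappa2 approaches zero, interior gaps are filled by approximately linear interpolation between the observed neighbours; larger kappa2 shortens the correlation length and shrinks the filled values toward the (zero) mean.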
SLIMNETS will address these objectives by exploiting local interactions (rooted in statistical field theories) and sparsity, and by reducing complexity by means of effective-field approaches. The expected scientific outcomes of SLIMNETS are powerful machine learning methods, computational algorithms, and software for ST data with favorable (at most linear) scaling with respect to the size of the data and of the ST prediction grid. The anticipated advances will generate computationally efficient tools for prediction and uncertainty quantification for large and diverse environmental, remote sensing, public health, natural resources, and climate data. Hence, SLIMNETS will have significant societal and economic impact by providing analytic tools for advanced decision support in agriculture, renewable energy, natural-resource exploitation, and public health protection.
