Identification of physical processes via data driven methods
Abstract
Extracting governing equations from data can be viewed as reverse engineering nature: using data to identify physical laws and models. This approach is crucial in fields where data are abundant (such as geophysical flows, finance, and neuroscience) but physical laws derived from first principles are not available. In recent years, machine learning (ML) methods have complemented the formulation of mathematical models by applying data analysis algorithms that estimate observed dynamics accurately by learning automatically from the given observations. Neural networks and symbolic regression (SR) are the most popular ML frameworks for learning the underlying physical process from observed data alone. While neural network approaches have shown great promise, their black-box nature makes the learned models difficult to interpret. Symbolic regression algorithms, on the other hand, can find an analytically tractable function in symbolic form. To address functional expressibility, a key limitation of black-box machine learning methods, this study explores symbolic regression approaches for identifying relations and operators that accurately represent the underlying physical processes. It demonstrates the use of an evolutionary algorithm, gene expression programming (GEP), and a sparse optimization algorithm, sequential threshold ridge regression (STRidge), in discovering physical models. The effectiveness of these algorithms is demonstrated on four applications: (1) partial differential equation (PDE) discovery, (2) truncation error analysis, (3) hidden physics discovery, and (4) discovery of subgrid-scale closure models. The study shows that the GEP and STRidge algorithms are able to distill various linear and nonlinear PDEs, truncation error terms, and unknown source terms of 1D and 2D PDEs.
Furthermore, the classical Smagorinsky model is identified for subgrid-scale (SGS) closure from an array of tailored features in solving the 2D Kraichnan turbulence problem. Our results demonstrate the considerable potential of these techniques for distilling complex nonlinear physics models from observed data alone. In addition, this study reveals the importance of feature selection and feature engineering, and of embedding prior knowledge about the unknown dynamical system, in the form of invariances, when identifying models.
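To make the sparse-regression idea behind STRidge concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in the thesis. It assumes a feature library `theta` whose columns are candidate terms and a target vector `ut` holding the time derivative; the function names, the default values of `lam` and `tol`, and the synthetic data (random columns standing in for `u`, `u_x`, `u_xx`, `u*u_x`) are all illustrative assumptions.

```python
import numpy as np


def ridge(a, b, lam):
    """Solve the ridge-regularized least-squares problem for a @ w ~ b."""
    return np.linalg.solve(a.T @ a + lam * np.eye(a.shape[1]), a.T @ b)


def stridge(theta, ut, lam=1e-5, tol=0.05, max_iter=10):
    """Sketch of sequential threshold ridge regression.

    Repeatedly fits ridge regression, zeroes coefficients whose magnitude
    is below `tol`, and refits on the surviving columns until the set of
    active terms stops changing. `lam` and `tol` are illustrative defaults.
    """
    w = ridge(theta, ut, lam)
    for _ in range(max_iter):
        keep = np.abs(w) >= tol          # terms that survive the threshold
        w_next = np.zeros_like(w)
        if keep.any():
            w_next[keep] = ridge(theta[:, keep], ut, lam)
        converged = np.array_equal(w_next != 0, w != 0)
        w = w_next
        if converged:
            break
    return w


# Synthetic test problem (an assumption for illustration only): random
# columns stand in for the candidate terms [u, u_x, u_xx, u*u_x], and the
# target follows a Burgers-like relation u_t = 0.1*u_xx - 1.0*u*u_x.
rng = np.random.default_rng(0)
theta = rng.standard_normal((200, 4))
true_w = np.array([0.0, 0.0, 0.1, -1.0])
ut = theta @ true_w + 1e-3 * rng.standard_normal(200)

w = stridge(theta, ut)
print(w)
```

In this sketch the threshold `tol` controls how aggressively terms are pruned, so it has to sit below the smallest coefficient one hopes to recover; in practice both `tol` and `lam` would need tuning for the problem at hand.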
Collections
- OSU Theses [15752]