# Surrogate modeling for subsurface flow: a new reduced-order model and error estimation procedures

## Advisors

Louis Durlofsky, primary advisor

Tapan Mukerji, advisor

Hamdi Tchelepi, advisor

## Abstract

Reservoir simulation is a critical technology for effective reservoir management. However, using reservoir simulation for many-query analyses, such as determining optimal well controls or locations or quantifying production uncertainty, can lead to large computational costs. These analyses may require thousands of function evaluations, each of which entails one or more flow simulations. Approximate surrogate models, such as reduced-order models and coarse-scale models, offer a promising way to accelerate these flow simulations. This thesis addresses the development and use of surrogate models for subsurface flow. Toward this goal, we develop a new reduced-order model and implement a machine-learning-based framework to estimate the error in surrogate-model predictions.

We first present a new reduced-order model (ROM) based on trajectory piecewise quadratic (TPWQ) approximations and proper orthogonal decomposition (POD). The method extends existing techniques based on trajectory piecewise linear (TPWL) approximations by incorporating second-derivative terms into the reduced-order treatment. Both the linear and the quadratic reduced-order methods, referred to as POD-TPWL and POD-TPWQ, represent new solutions as expansions around previously simulated high-fidelity (full-order) training solutions, combined with POD-based projection into a low-dimensional space. POD-TPWQ requires significantly more offline preprocessing than POD-TPWL because it entails generating and projecting several third-order (Hessian-type) terms. The POD-TPWQ method is implemented for two-dimensional systems. Extensive numerical results demonstrate that it consistently provides better accuracy than POD-TPWL, with speedups of about two orders of magnitude relative to high-fidelity simulation for the problems considered.

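
The difference between the linear and quadratic expansions can be illustrated on a toy problem. The sketch below is not POD-TPWQ itself. It uses a hypothetical two-variable residual g(x^{n+1}, x^n, u) = x^{n+1} - x^n + Δt((x^{n+1})^3 - u) and treats the variables as if they were already reduced, so there is no POD projection. It keeps only the second derivative with respect to x^{n+1}, because the other second derivatives vanish for this residual. Both expansions are taken about a single training point at which g = 0. The sketch also reports the norm of the TPWQ-minus-TPWL difference, which is the kind of quantity that lets the quadratic model act as an error estimator for the linear one.

```python
import numpy as np

DT = 0.1  # hypothetical time-step size in the toy residual


def residual(x1, x0, u):
    """Hypothetical one-step residual g(x^{n+1}, x^n, u) (a toy, not a flow model)."""
    return x1 - x0 + DT * (x1**3 - u)


def derivatives(x1):
    """First derivatives and the third-order Hessian-type tensor of g.

    For this toy residual, only the x^{n+1} terms depend on the state.
    """
    n = x1.size
    J = np.eye(n) + DT * np.diag(3.0 * x1**2)  # dg/dx^{n+1}
    A = -np.eye(n)                             # dg/dx^n
    B = -DT * np.eye(n)                        # dg/du
    H = np.zeros((n, n, n))                    # d2g/d(x^{n+1})^2; all other 2nd derivatives are zero here
    idx = np.arange(n)
    H[idx, idx, idx] = 6.0 * DT * x1
    return J, A, B, H


def solve_exact(x0, u, guess, iters=50):
    """Reference solution of g = 0 by Newton's method (plays the high-fidelity role)."""
    x1 = guess.astype(float).copy()
    for _ in range(iters):
        J, _, _, _ = derivatives(x1)
        x1 = x1 - np.linalg.solve(J, residual(x1, x0, u))
    return x1


def tpwl(x1_i, x0_i, u_i, x0, u):
    """First-order expansion about the training point (g vanishes there)."""
    J, A, B, _ = derivatives(x1_i)
    rhs = A @ (x0 - x0_i) + B @ (u - u_i)
    return x1_i - np.linalg.solve(J, rhs)


def tpwq(x1_i, x0_i, u_i, x0, u, iters=20):
    """Second-order expansion; the quadratic model is solved by Newton from the TPWL guess."""
    J, A, B, H = derivatives(x1_i)
    rhs = A @ (x0 - x0_i) + B @ (u - u_i)
    d = -np.linalg.solve(J, rhs)
    for _ in range(iters):
        q = J @ d + rhs + 0.5 * np.einsum("kij,i,j->k", H, d, d)
        dq = J + np.einsum("kij,j->ki", H, d)  # valid because H is symmetric in (i, j)
        d = d - np.linalg.solve(dq, q)
    return x1_i + d


# Training point ("previously simulated" solution) and a new, perturbed query.
x0_i, u_i = np.array([1.0, 0.5]), np.array([1.0, 2.0])
x1_i = solve_exact(x0_i, u_i, guess=x0_i)
x0, u = x0_i + np.array([0.3, -0.2]), u_i + np.array([0.5, 1.0])

ref = solve_exact(x0, u, guess=x1_i)
lin = tpwl(x1_i, x0_i, u_i, x0, u)
quad = tpwq(x1_i, x0_i, u_i, x0, u)
err_l = np.linalg.norm(lin - ref)   # true error of the linear expansion
err_q = np.linalg.norm(quad - ref)  # true error of the quadratic expansion
est = np.linalg.norm(quad - lin)    # TPWQ-based estimate of the TPWL error
print(f"TPWL error {err_l:.2e}, TPWQ error {err_q:.2e}, estimate {est:.2e}")
```

For this perturbation the quadratic expansion is the more accurate of the two, and `est` tracks the TPWL error closely. Both outcomes are specific to this toy setup.
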
We also demonstrate that POD-TPWQ can serve as an error estimator for POD-TPWL, which motivates the development of a trust-region-based optimization framework. This procedure uses POD-TPWL for fast function evaluations and a POD-TPWQ error estimator to determine when retraining, which entails a high-fidelity simulation, is required. Optimization results for an oil-water well control problem demonstrate the substantial speedups that can be achieved relative to optimization using only high-fidelity simulation.

In the second part of the thesis, we develop a new machine-learning-based framework for estimating the error introduced by surrogate models such as ROMs and coarse-scale (upscaled) models. The framework applies high-dimensional regression techniques, such as random forest, to map a large set of inexpensively computed quantities (error indicators) to errors in the output quantities of interest (QoI). To estimate the error in production and injection quantities, the framework corrects the solution on a well-by-well basis. In addition, when estimating the error in the flow rates for a given production well, the framework uses feature-space partitioning to categorize the production data into flow regimes, such as before, around, and after water breakthrough. In this way, we construct "local" error models for each well. The methodology requires a training dataset, which is constructed by performing flow simulations with both the high-fidelity model and the surrogate model for a relatively small number of instances of the control parameters.

We first demonstrate the machine-learning-based framework on two-dimensional oil-water models with varying well-control (bottom-hole pressure) parameters. The error indicators for POD-TPWL are mined from quantities computed while solving the underlying POD-TPWL equations.

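
A regime-partitioned ("local") error model for a single well can be sketched as follows. Several parts of this sketch are stand-ins rather than the thesis's implementation. Flow regimes are assigned by thresholding a hypothetical water-cut feature, in place of classification or clustering. Ordinary least squares replaces random forest. The indicators and errors are synthetic. The corrected prediction is the surrogate value plus the estimated error. In the well-by-well setting, one such set of models would be trained for each well.

```python
import numpy as np


def regime_of(water_cut, lo=0.01, hi=0.5):
    """Hypothetical partition: 0 = before, 1 = around, 2 = after water breakthrough."""
    return np.digitize(water_cut, [lo, hi])


def fit_local_models(X, water_cut, err):
    """Fit one regression per regime (least squares stands in for random forest)."""
    models = {}
    regimes = regime_of(water_cut)
    for r in np.unique(regimes):
        mask = regimes == r
        A = np.column_stack([np.ones(mask.sum()), X[mask]])  # intercept + indicators
        coef, *_ = np.linalg.lstsq(A, err[mask], rcond=None)
        models[int(r)] = coef
    return models


def predict_error(models, X, water_cut):
    """Route each sample to its regime's model; unseen regimes get zero correction."""
    regimes = regime_of(water_cut)
    A = np.column_stack([np.ones(len(X)), X])
    est = np.zeros(len(X))
    for r, coef in models.items():
        mask = regimes == r
        est[mask] = A[mask] @ coef
    return est


def synthetic_well(rng, n=100):
    """Synthetic indicators X, water cut, and surrogate error with regime-dependent behavior."""
    wc = np.concatenate([np.zeros(n), rng.uniform(0.01, 0.5, n), rng.uniform(0.5, 1.0, n)])
    X = rng.normal(size=(3 * n, 3))
    true_coef = np.array([[0.0, 0.1, 0.0], [2.0, -1.0, 0.5], [-1.0, 0.5, 0.0]])
    err = np.einsum("ij,ij->i", X, true_coef[regime_of(wc)]) + 0.01 * rng.normal(size=3 * n)
    return X, wc, err


rng = np.random.default_rng(0)
X_tr, wc_tr, err_tr = synthetic_well(rng)  # training cases
X_te, wc_te, err_te = synthetic_well(rng)  # test cases not used in training
models = fit_local_models(X_tr, wc_tr, err_tr)

q_sur = rng.uniform(100.0, 200.0, size=len(X_te))  # hypothetical surrogate rates
q_hf = q_sur + err_te                               # corresponding "high-fidelity" rates
q_corr = q_sur + predict_error(models, X_te, wc_te)
mae_before = np.mean(np.abs(q_hf - q_sur))
mae_after = np.mean(np.abs(q_hf - q_corr))
print(f"mean abs error: surrogate {mae_before:.3f}, corrected {mae_after:.3f}")
```

Partitioning pays off here because the synthetic error depends on the indicators differently in each regime, so a single global fit could not capture all three relationships.
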
We estimate the POD-TPWL error in the production and injection quantities for an ensemble of test cases (cases that are not used to construct the training dataset). Adding the estimated error back to the POD-TPWL predictions consistently improves QoI accuracy. We explore different algorithmic treatments within the framework, including adding memory to the features, using classification or clustering for feature-space partitioning, and using random forest or LASSO (least absolute shrinkage and selection operator) for regression.

We then apply the machine-learning-based framework to estimate the error in upscaled models for an oil-water problem in which the geological parameters (i.e., permeability realizations) serve as the control parameters. The upscaled models are characterized by properly computed coarse-scale transmissibilities, porosities, and well indices. To generate the error indicators for the upscaled oil-water models, we solve a fine-scale transport equation with a constant-in-time velocity field, which provides estimates of the neglected subgrid effects. These quantities are then correlated with the upscaling error in a training step (using random forest). We then use the upscaled oil-water models, along with the error estimates, to quantify production uncertainty in two- and three-dimensional systems. The corrected upscaled solutions are consistently more accurate than the uncorrected solutions, both in key statistical quantities (such as the P10, P50, and P90 results) and in realization-by-realization predictions.
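
Two of the algorithmic options listed above, memory in the features and LASSO regression, can be sketched together. Here "memory" is assumed to mean appending each indicator's value from the previous time step. The LASSO objective (1/2n)‖y − Xβ‖² + α‖β‖₁ is minimized by cyclic coordinate descent with soft thresholding, which is one standard solver for this problem. The lag construction, the function names, and the synthetic data are all hypothetical.

```python
import numpy as np


def add_memory(F, lags=1):
    """Append indicator values from the previous `lags` steps (first row repeated as padding)."""
    cols = [F]
    for k in range(1, lags + 1):
        cols.append(np.vstack([np.repeat(F[:1], k, axis=0), F[:-k]]))
    return np.hstack(cols)


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=500):
    """Minimize (1/2n)||y - b0 - X b||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean        # centering handles the unpenalized intercept
    col_sq = (Xc**2).sum(axis=0) / n
    beta = np.zeros(p)
    r = yc.copy()                          # running residual yc - Xc @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += Xc[:, j] * beta[j]        # remove feature j's current contribution
            rho = Xc[:, j] @ r / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            r -= Xc[:, j] * beta[j]        # add back the updated contribution
    intercept = y_mean - x_mean @ beta
    return intercept, beta


rng = np.random.default_rng(1)
F = rng.normal(size=(200, 4))          # synthetic indicators, one row per time step
X = add_memory(F, lags=1)              # columns 0-3: step t; columns 4-7: step t-1
y = 1.5 * X[:, 0] - 0.8 * X[:, 6] + 0.01 * rng.normal(size=200)  # synthetic error
b0, beta = lasso_cd(X, y, alpha=0.01)
print(np.round(beta, 3))
```

The ℓ1 penalty tends to set the coefficients of irrelevant indicators exactly to zero. This makes LASSO attractive when many inexpensive indicators are available but only a few of them, possibly lagged, correlate with the error.
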