Physics-Informed Neural Networks
Neural networks are largely black boxes when used to model data distributions. However, there is a class of highly interpretable models that epitomizes machine learning for scientific discovery. In this article, we discuss the main ideas and collect some useful references for exploring this exciting topic where A.I. and science converge.
There is a class of neural networks that can directly incorporate physical principles (formulated via differential equations) into the machine learning algorithm. The physics-informed neural network (PINN) is a prominent example of the wider class of A.I. models equipped for scientific learning. Many of its salient features were first proposed in the seminal work of Lagaris et al., ‘Artificial Neural Networks for Solving Ordinary and Partial Differential Equations’, while the more recent formulation appeared in a series of papers by Maziar Raissi et al. in 2017 and 2019, after which the terminology PINN became established in the literature.
Moving beyond black-box modeling of datasets, this class of neural networks incorporates the physical principles underpinning the data by including a loss term measuring how closely the model’s predictions satisfy the differential equations that presumably encode the physical basis of the training dataset. If the differential equations involve unknown parameters, one can also use this class of neural networks to infer the parameter values most compatible with the training data. This offers a neural network-based methodology for scientific discovery in many fields. In the form of the PINN, artificial intelligence-based models thus achieve a more interpretable synthesis with known physical laws.
As a simple example, consider the scenario where we are modeling a dataset quantifying the temperature distribution of a system over some spatial domain and a window of time. In physics, the heat equation models how heat diffuses throughout a specified region. It is a second-order partial differential equation (PDE) that reads
$$
\frac{\partial u}{\partial t} = \kappa
\nabla^2 u,
$$
where \( u (\vec{x}, t) \) is the temperature distribution and \( \kappa \) is the thermal diffusivity parameter. If we have a training dataset of observed, discrete samples of this distribution, we can model the data using a PINN by equipping some conventional model (say, a feedforward neural network with a few hidden layers) with the following loss term:
$$
\mathcal{L}_{heat} = \sum_{j=1}^{N_c} \left|
\frac{\partial u(\vec{x}_j, t_j)}{\partial t} - \kappa
\nabla^2 u(\vec{x}_j, t_j)
\right|^2,
$$
where the expression is evaluated at a set of points sampled from the spatial and temporal domains. These \( N_c \) points are known as collocation points. For example, if the context is just the temperature over a finite 1D segment of material, these would be points along its length evaluated at a set of different times within the duration of the observed training dataset. Minimizing this loss term drives the model’s predictions towards satisfying the heat equation, and thus the physical principle (i.e. the heat equation) is incorporated within the model. Such a loss term is also known as the ‘PDE residual’ since it measures the model’s discrepancy with the PDE.
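To make this concrete, here is a minimal PyTorch sketch of such a residual loss for the 1D heat equation, using automatic differentiation to obtain the derivatives at the collocation points. The network `u_net`, its layer sizes, and the random sampling of collocation points are illustrative assumptions rather than anything prescribed by the PINN framework.

```python
import torch
import torch.nn as nn

# Illustrative feedforward network mapping (x, t) to u; any architecture
# with two inputs and one output would do.
u_net = nn.Sequential(
    nn.Linear(2, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

def heat_residual_loss(u_net, x, t, kappa):
    """PDE residual loss for the 1D heat equation u_t = kappa * u_xx,
    summed over the collocation points (x, t)."""
    x = x.clone().requires_grad_(True)   # enable autograd w.r.t. the inputs
    t = t.clone().requires_grad_(True)
    u = u_net(torch.cat([x, t], dim=1))

    # First derivatives via automatic differentiation
    u_t = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
                              create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                              create_graph=True)[0]
    # Second derivative in x
    u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x),
                               create_graph=True)[0]

    residual = u_t - kappa * u_xx
    return (residual ** 2).sum()         # sum over the N_c collocation points

# Example: residual loss on 100 random collocation points in [0, 1] x [0, 1]
x_c = torch.rand(100, 1)
t_c = torch.rand(100, 1)
loss_pde = heat_residual_loss(u_net, x_c, t_c, kappa=0.1)
```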
Apart from this loss term, one can include loss terms that enforce the boundary and initial conditions of the PDE. Multiplying each loss term by a weight parameter, and including the usual mean-squared-error term for the model’s predictions against the observed data, the overall loss function takes the form
$$\begin{aligned}
\mathcal{L}_{PINN} &= \lambda_F \, \mathcal{L}_{pde} + \lambda_D \, \mathcal{L}_{data} + \lambda_B \, \mathcal{L}_{boundary} + \lambda_I \, \mathcal{L}_{initial}, \cr
\mathcal{L}_{pde} &= \sum_{j=1}^{N_c} \left| \hat{\mathcal{F}} \right|^2 \equiv \sum_{j=1}^{N_c} \left|
\frac{\partial u(\vec{x}_j, t_j)}{\partial t} - \kappa
\nabla^2 u(\vec{x}_j, t_j)
\right|^2, \cr
\mathcal{L}_{data} &= \sum_{k=1}^{N_D} \left|
u(\vec{x}_k, t_k ) - u_{obs}(\vec{x}_k, t_k )
\right|^2, \cr
\mathcal{L}_{boundary} &= \sum_{r=1}^{N_T} \sum_{l=1}^{N_B} \left|
u(\vec{x}_l, t_r ) - u_{b.c.}(\vec{x}_l, t_r )
\right|^2, \cr
\mathcal{L}_{initial} &= \sum_{m=1}^{N_I} \left|
u(\vec{x}_m, t_0 ) - u_{initial}(\vec{x}_m, t_0 )
\right|^2.
\end{aligned}$$
In the above, \( \hat{\mathcal{F}} \) denotes the partial differential equation written in residual form, with all its derivative operators and other terms collected so that the PDE reads \( \hat{\mathcal{F}} = 0 \). (The heat equation was used as the explicit example.)
The data loss term \( \mathcal{L}_{data} \) is the usual mean-squared-error term of supervised learning, driving the model to minimize the difference between the training data and its predictions. The boundary loss term \( \mathcal{L}_{boundary} \) drives the model towards obeying the \( N_B \) boundary conditions over the duration sampled by the \( N_T \) temporal points. Similarly, the initial-condition loss term \( \mathcal{L}_{initial} \) drives the model towards obeying the initial condition at time \( t_0 \). Other physically motivated constraints can be included in the loss function in the same manner.
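As a rough sketch of how these pieces can be assembled in PyTorch, assuming the `heat_residual_loss` helper from the earlier sketch and a `batch` dictionary of observed, boundary, and initial data whose keys are purely illustrative:

```python
import torch.nn as nn

# Sum of squared errors, matching the unnormalized sums in the formulas above
sse = nn.MSELoss(reduction="sum")

def pinn_loss(u_net, batch, kappa, lam_F=1.0, lam_D=1.0, lam_B=1.0, lam_I=1.0):
    """Weighted total PINN loss: PDE residual + data + boundary + initial terms."""
    loss_pde  = heat_residual_loss(u_net, batch["x_c"], batch["t_c"], kappa)
    loss_data = sse(u_net(batch["xt_data"]), batch["u_obs"])
    loss_bc   = sse(u_net(batch["xt_bc"]),   batch["u_bc"])
    loss_ic   = sse(u_net(batch["xt_ic"]),   batch["u_ic"])
    return lam_F * loss_pde + lam_D * loss_data + lam_B * loss_bc + lam_I * loss_ic
```

The weights \( \lambda \) control the relative emphasis of each constraint and typically require some tuning in practice.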
If there are undetermined parameters in the differential equations, they appear only in \( \mathcal{L}_{pde} \), but they can still be determined by model training guided by minimizing the overall loss \( \mathcal{L}_{PINN} \). For example, in a PyTorch implementation, we just have to ensure that these unknown parameters are registered as model parameters, as in the sketch below.
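Here is a minimal sketch of this idea, where the class name `HeatPINN`, the layer sizes, and the initial guess for \( \kappa \) are illustrative choices:

```python
import torch
import torch.nn as nn

class HeatPINN(nn.Module):
    """A feedforward PINN for the heat equation with an unknown diffusivity
    kappa registered as a trainable model parameter (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 32), nn.Tanh(),
            nn.Linear(32, 32), nn.Tanh(),
            nn.Linear(32, 1),
        )
        # Unknown physical parameter: registering it as nn.Parameter ensures
        # the optimizer updates it alongside the network weights.
        self.kappa = nn.Parameter(torch.tensor(1.0))

    def forward(self, xt):
        return self.net(xt)

model = HeatPINN()
# model.parameters() returns both the network weights and kappa,
# so a single optimizer step adjusts them jointly.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Since `kappa` is registered via `nn.Parameter`, it receives gradients from \( \mathcal{L}_{pde} \) during backpropagation and is updated together with the network weights.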
Thus, in this simple fashion, the PINN inherits the capability to infer unknown physical parameters by matching the observed training dataset, yielding a clear example of interpretable machine learning. In principle, one can work with a generic backbone neural network architecture that accepts spacetime coordinates as inputs and whose output neurons represent the dependent variables, although the vanilla feedforward multilayer perceptron is most commonly used.
There are quite a number of freely accessible codes that are very useful for exploring implementations of PINN in the PyTorch framework and others. Here are our favorites (hosted on GitHub):
Since its inception, the PINN has been featured in numerous formal publications. We found the following review papers very useful for gaining an overview of what has been understood so far and of future directions, as well as for their reference links to many applications of PINN across scientific and engineering domains.