In this project, we develop randomized model reduction methods for advection-diffusion problems with sharply discontinuous source terms. To study such problems, we must solve the advection-diffusion equation, a partial differential equation (PDE) used to model systems such as a liquid dye dissolving in a flowing fluid, or the combination of heat conduction and convection through a medium. This PDE arises often in the sciences, from fluid dynamics (Bird et al., 2007) to semiconductor physics (New, 2004), although in such contexts the equation is almost always unsolvable by hand, so we rely on computer algorithms to efficiently obtain accurate approximations to the solutions. Our goal is to obtain results faster than direct numerical simulations such as finite difference or finite element schemes. We employ randomized methods from data science to allow for parallel-in-time computation (Schleuß et al., 2022) and the generation of a reduced order model. Compared to direct simulation, the reduced model works in a solution space of much lower dimension, so the computational complexity is greatly reduced. Simulations with the reduced model thus allow for a further speedup in runtime while maintaining accuracy. As a novel contribution, we consider the case of sharp discontinuities in source functions, partition the time domain into overlapping subintervals with overlap around the discontinuities, and construct a reduced basis on each subinterval. This allows the reduced solution to combine information from several reduced bases, rather than relying on a single reduced basis to capture the behavior throughout the time domain. We present a test case showing that this approach can provide significant improvements in accuracy compared to constructing only one reduced basis over the whole time domain.
One-sentence summary: To solve a partial differential equation modeling, for example, the dissolution of a liquid dye in a flowing fluid or the conduction and convection of heat, we employ randomized methods from data science to facilitate parallel-in-time computation and generate a reduced order model that lowers the computational complexity.
Bird, R. B., Stewart, W. E., & Lightfoot, E. N. (2007). Transport Phenomena (Revised 2nd ed.). John Wiley & Sons.
New, O. (2004). Derivation and numerical approximation of the quantum drift diffusion model for semiconductors. Jour. Myan. Acad. Arts & Sc., Vol. II (Part Two), No. 5.
Schleuß, J., Smetana, K., & ter Maat, L. (2022). Randomized quasi-optimal local approximation spaces in time. arXiv preprint arXiv:2203.06276. To appear in SIAM J. Sci. Comput., 2023.
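The windowed reduced-basis idea above can be illustrated with a small sketch. This is not the authors' implementation: the 1D upwind finite-difference solve, the grid sizes, the rank `k`, the placement of the temporal discontinuity, and the overlap width are all illustrative choices. It collects solution snapshots of an advection-diffusion problem whose source switches off abruptly, partitions the snapshots into two time windows overlapping around the discontinuity, and builds a randomized basis on each window.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nt = 64, 400
dx, dt = 1.0 / nx, 5e-4          # dt satisfies advective and diffusive stability limits
a, nu = 1.0, 0.05                # illustrative advection speed and diffusivity

x = np.linspace(0.0, 1.0, nx)
u = np.zeros(nx)
snapshots = np.empty((nx, nt))

for n in range(nt):
    t = n * dt
    # Source term with a sharp temporal discontinuity at t = 0.1.
    f = np.sin(np.pi * x) if t < 0.1 else np.zeros(nx)
    # First-order upwind advection + centered diffusion, Dirichlet ends.
    ux = (u - np.roll(u, 1)) / dx
    uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    u = u + dt * (-a * ux + nu * uxx + f)
    u[0] = u[-1] = 0.0
    snapshots[:, n] = u

def randomized_basis(S, k):
    """Randomized range finder: orthonormal basis approximating the top-k range of S."""
    G = rng.standard_normal((S.shape[1], k))
    Q, _ = np.linalg.qr(S @ G)
    return Q

# Overlapping time windows around the discontinuity (index 200, i.e. t = 0.1),
# with a separate reduced basis constructed on each window.
split, overlap, k = 200, 40, 8
windows = [snapshots[:, : split + overlap], snapshots[:, split - overlap :]]
for S in windows:
    Q = randomized_basis(S, k)
    err = np.linalg.norm(S - Q @ (Q.T @ S)) / np.linalg.norm(S)
    print(f"window relative projection error: {err:.2e}")
```

Because each window contains dynamics that are smooth within that window, a small basis per window can represent the solution well, whereas a single basis of the same size must also span the transition across the discontinuity.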
Missing data imputation is a well-studied problem. Many of the methods for this problem were developed and first used in medical research, but they are applied in many other areas as well, for example in property insurance underwriting.
Apart from very basic approaches that impute the mean, mode, or another statistic of the non-missing observations, there are two main categories of methods. The first category is the iterative approach, based on the idea of estimating the conditional distribution of one feature given all other available features. In each iteration, a conditional distribution estimator is trained for each feature in turn and used to update the imputed values of that feature. This process is repeated over many iterations until it converges. The approach has been studied extensively, and one of the most well-known methods is Multiple Imputation by Chained Equations (MICE).
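The iterative idea can be sketched in a few lines of NumPy. This is a simplified single-imputation caricature, not full MICE: it regresses each feature linearly on the others and refreshes the missing entries with the predictions, whereas real MICE draws from estimated conditional distributions and produces multiple imputed datasets. The toy data, iteration count, and missingness rate are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def iterative_impute(X, n_iters=10):
    """MICE-style sketch: cycle over features, regressing each on the rest."""
    X = X.copy()
    mask = np.isnan(X)
    # Initialize with column means (the "basic" baseline mentioned above).
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            # Fit a linear model on rows where feature j is observed...
            coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
            # ...and refresh the missing entries of feature j with its predictions.
            X[miss, j] = A[miss] @ coef
    return X

# Correlated toy data with ~20% of entries removed at random.
Z = rng.standard_normal((200, 1))
X_true = np.hstack([Z, 2 * Z, -Z]) + 0.05 * rng.standard_normal((200, 3))
X_miss = X_true.copy()
X_miss[rng.random(X_true.shape) < 0.2] = np.nan
X_imp = iterative_impute(X_miss)
```

Because the features are strongly correlated, the regression-based updates recover the missing entries far more accurately than the mean-imputation starting point.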
More recently, a second approach based on deep generative models has been developed. Here, a generative model is trained to generate values for the missing parts conditioned on the observed values. A review of both categories of methods, with a focus on tabular data (numeric and categorical features), will be followed by examples from my industry experience with them.