Fri, 04 Aug 2023
1.A stochastic optimization approach to train non-linear neural networks with regularization of higher-order total variation
Abstract: While highly expressive parametric models including deep neural networks have an advantage to model complicated concepts, training such highly non-linear models is known to yield a high risk of notorious overfitting. To address this issue, this study considers a $k$th order total variation ($k$-TV) regularization, which is defined as the squared integral of the $k$th order derivative of the parametric models to be trained; penalizing the $k$-TV is expected to yield a smoother function, which is expected to avoid overfitting. While the $k$-TV terms applied to general parametric models are computationally intractable due to the integration, this study provides a stochastic optimization algorithm, that can efficiently train general models with the $k$-TV regularization without conducting explicit numerical integration. The proposed approach can be applied to the training of even deep neural networks whose structure is arbitrary, as it can be implemented by only a simple stochastic gradient descent algorithm and automatic differentiation. Our numerical experiments demonstrate that the neural networks trained with the $K$-TV terms are more ``resilient'' than those with the conventional parameter regularization. The proposed algorithm also can be extended to the physics-informed training of neural networks (PINNs).
2.Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models
Authors:Jungjun Choi, Ming Yuan
Abstract: This paper develops an inferential framework for matrix completion when missing is not at random and without the requirement of strong signals. Our development is based on the observation that if the number of missing entries is small enough compared to the panel size, then they can be estimated well even when missing is not at random. Taking advantage of this fact, we divide the missing entries into smaller groups and estimate each group via nuclear norm regularization. In addition, we show that with appropriate debiasing, our proposed estimate is asymptotically normal even for fairly weak signals. Our work is motivated by recent research on the Tick Size Pilot Program, an experiment conducted by the Security and Exchange Commission (SEC) to evaluate the impact of widening the tick size on the market quality of stocks from 2016 to 2018. While previous studies were based on traditional regression or difference-in-difference methods by assuming that the treatment effect is invariant with respect to time and unit, our analyses suggest significant heterogeneity across units and intriguing dynamics over time during the pilot program.