Methodology (stat.ME)
Mon, 17 Apr 2023
1. Sparse Positive-Definite Estimation for Large Covariance Matrices with Repeated Measurements
Authors: Sunpeng Duan, Guo Yu, Juntao Duan, Yuedong Wang
Abstract: In many fields of biomedical sciences, random variables are commonly measured repeatedly across different subjects. In such a repeated measurement setting, the between-subject and within-subject dependence structures among random variables may differ, and should be estimated separately. Ignoring this fact may lead to questionable or even erroneous scientific conclusions. In this paper, we study the problem of sparse and positive-definite estimation of between-subject and within-subject covariance matrices for high-dimensional repeated measurements. Our estimators are defined as solutions to convex optimization problems, which can be solved efficiently. We establish estimation error rates for our proposed estimators of the two target matrices, and demonstrate their favorable performance through theoretical analysis and comprehensive simulation studies. We further apply our methods to recover two covariance graphs of clinical variables from hemodialysis patients.
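To make the between-subject versus within-subject distinction concrete, the sketch below computes classical moment estimates of the two covariance matrices for a balanced design and then soft-thresholds the off-diagonal entries. This is not the estimator proposed in the paper, which is defined through convex optimization. The additive model (observation = subject effect + within-subject noise), the function names, and the thresholding level `lam` are all assumptions made for illustration, and plain thresholding does not by itself guarantee positive-definiteness.

```python
import numpy as np

def moment_covariances(x):
    """Moment estimates for a balanced design.

    x has shape (m subjects, n repeats, p variables), with n >= 2 and m >= 2.
    Assumes x[i, j] = mean + subject_effect[i] + noise[i, j] (an assumption).
    """
    m, n, p = x.shape
    subject_means = x.mean(axis=1)                      # shape (m, p)
    # Deviations of each measurement from its own subject's mean.
    resid = (x - subject_means[:, None, :]).reshape(m * n, p)
    within = resid.T @ resid / (m * (n - 1))
    # The covariance of subject means estimates between + within / n.
    centered = subject_means - subject_means.mean(axis=0)
    cov_means = centered.T @ centered / (m - 1)
    between = cov_means - within / n
    return between, within

def soft_threshold_offdiag(s, lam):
    """Shrink off-diagonal entries toward zero by lam; keep the diagonal."""
    out = np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)
    np.fill_diagonal(out, np.diag(s))
    return out

# Usage with simulated data (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4, 6)) + rng.normal(size=(50, 1, 6))
between, within = moment_covariances(x)
sparse_between = soft_threshold_offdiag(between, lam=0.1)
```

Estimating a single covariance matrix from the pooled measurements would mix the two sources of dependence, which is the pitfall the abstract warns about.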
2. Visualizing hypothesis tests in survival analysis under anticipated delayed effects
Authors: José L. Jiménez, Isobel Barrott, Francesca Gasperoni, Dominic Magirr
Abstract: What can be considered an appropriate statistical method for the primary analysis of a randomized clinical trial (RCT) with a time-to-event endpoint when we anticipate non-proportional hazards owing to a delayed effect? This question has been the subject of much recent debate. The standard approach is a log-rank test and/or a Cox proportional hazards model. Alternative methods have been explored in the statistical literature, such as weighted log-rank tests and tests based on the Restricted Mean Survival Time (RMST). While weighted log-rank tests can achieve high power compared to the standard log-rank test, some choices of weights may lead to type I error inflation under particular conditions. In addition, they are not linked to an unambiguous estimand. Arguably, therefore, they are difficult to interpret. Test statistics based on the RMST, on the other hand, allow one to investigate the average difference between two survival curves up to a pre-specified time point $\tau$ -- an unambiguous estimand. However, by emphasizing differences prior to $\tau$, such test statistics may not fully capture the benefit of a new treatment in terms of long-term survival. In this article, we introduce a graphical approach for direct comparison of weighted log-rank tests and tests based on the RMST. This new perspective allows a more informed choice of the analysis method, going beyond power and type I error comparison.
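As an illustration of the RMST estimand mentioned above, the following sketch computes the area under a Kaplan-Meier curve up to $\tau$ for a single arm; the difference between two arms is then a difference of two such areas. It is a minimal NumPy implementation written for this note, not code from the paper, and the function names and toy data are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Distinct event times and the Kaplan-Meier estimate just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)           # subjects still under observation
        d = np.sum((times == t) & events)      # events occurring exactly at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step function on [0, tau]."""
    event_times, surv = kaplan_meier(times, events)
    keep = event_times < tau
    grid = np.concatenate(([0.0], event_times[keep], [tau]))
    heights = np.concatenate(([1.0], surv[keep]))
    return float(np.sum(np.diff(grid) * heights))

# Toy two-arm comparison (made-up data; events=0 marks a censored time).
control = rmst([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], tau=4.0)
treated = rmst([1, 2, 4, 6, 7], [1, 1, 0, 1, 1], tau=4.0)
rmst_difference = treated - control
```

Because the integral stops at $\tau$, any separation of the curves after $\tau$ contributes nothing to the difference, which is the limitation the abstract notes for delayed effects.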
3. Predicting Malaria Incidence Using Artificial Neural Networks and Disaggregation Regression
Authors: Jack A. Hall, Tim C. D. Lucas
Abstract: Disaggregation modelling is a method of predicting disease risk at high resolution using aggregated response data. High resolution disease mapping is an important public health tool to aid the optimisation of resources, and is commonly used in assisting responses to diseases such as malaria. Current disaggregation regression methods are slow, inflexible, and do not easily allow non-linear terms. Neural networks may offer a solution to the limitations of current disaggregation methods. This project aimed to design a neural network which mimics the behaviour of disaggregation, then benchmark it against current methods for accuracy, flexibility and speed. Cross-validation and nested cross-validation were used to compare neural networks with traditional disaggregation on accuracy, and execution speed was also measured. Neural networks did not improve on the accuracy of current disaggregation methods, although they did improve execution time. The neural network models are more flexible and offer potential for further improvements on all metrics. The R package 'Kedis' (Keras-Disaggregation) is introduced as a user-friendly method of implementing neural network disaggregation models.
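The core idea of disaggregation, fitting a pixel-level model against responses that are only observed per polygon, can be sketched as follows. This assumes one common formulation (a log-linear pixel rate, population-weighted summation within polygons, and a Poisson likelihood on polygon counts); the abstract does not specify the model, and the code is not the 'Kedis' API. All names and numbers are hypothetical.

```python
import numpy as np

def aggregate_cases(covariates, population, polygon_id, beta, intercept, n_polygons):
    """Predict polygon-level case counts from a pixel-level log-linear rate."""
    log_rate = intercept + covariates @ beta      # one linear predictor per pixel
    pixel_cases = np.exp(log_rate) * population   # expected cases in each pixel
    # Sum pixel-level expectations within each polygon.
    return np.bincount(polygon_id, weights=pixel_cases, minlength=n_polygons)

def poisson_nll(observed, expected):
    """Poisson negative log-likelihood, dropping the constant log(observed!) term."""
    return float(np.sum(expected - observed * np.log(expected)))

# Four pixels in two polygons (made-up values).
covariates = np.array([[0.2], [0.1], [-0.3], [0.4]])
population = np.array([10.0, 20.0, 30.0, 40.0])
polygon_id = np.array([0, 0, 1, 1])
expected = aggregate_cases(covariates, population, polygon_id,
                           beta=np.array([0.5]), intercept=np.log(0.1),
                           n_polygons=2)
loss = poisson_nll(np.array([3.0, 8.0]), expected)
```

A neural network variant would replace the linear predictor `intercept + covariates @ beta` with a network output while keeping the same aggregation step, so the loss is still evaluated at the polygon level.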