Methodology (stat.ME)
Thu, 31 Aug 2023
1. Haplotype frequency inference from pooled genetic data with a latent multinomial model
Authors: Yong See Foo, Jennifer A. Flegg
Abstract: In genetic studies, haplotype data provide more refined information than data about separate genetic markers. However, large-scale studies that genotype hundreds to thousands of individuals may only provide results of pooled data, where only the total allele counts of each marker in each pool are reported. Methods for inferring haplotype frequencies from pooled genetic data that scale well with pool size rely on a normal approximation, which we observe to produce unreliable inference when applied to real data. We illustrate cases where the approximation breaks down because the normal covariance matrix is near-singular. As an alternative to approximate methods, in this paper we propose exact methods to infer haplotype frequencies from pooled genetic data based on a latent multinomial model, where the observed allele counts are considered integer combinations of latent, unobserved haplotype counts. One of our methods, latent count sampling via Markov bases, achieves approximately linear runtime with respect to pool size. Our exact methods produce more accurate inference than existing approximate methods for synthetic data and for data based on haplotype information from the 1000 Genomes Project. We also demonstrate how our methods can be applied to time series of pooled genetic data, as a proof of concept of how our methods are relevant to more complex hierarchical settings, such as spatiotemporal models.
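The idea that observed allele counts are integer combinations of latent haplotype counts can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical setup with two biallelic markers (four possible haplotypes), and it computes the exact likelihood by brute-force enumeration, which is feasible only for tiny pools. The function names are illustrative.

```python
from itertools import product
from math import factorial, prod

# Hypothetical setup: 2 biallelic markers, so 4 possible haplotypes.
# Each tuple gives the allele (0 or 1) carried at each marker.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def allele_counts(z):
    """Map latent haplotype counts z to observed per-marker counts of allele 1."""
    n_markers = len(HAPLOTYPES[0])
    return tuple(
        sum(z_h * hap[m] for z_h, hap in zip(z, HAPLOTYPES))
        for m in range(n_markers)
    )


def multinomial_pmf(z, p):
    """Probability of haplotype counts z given haplotype frequencies p."""
    coef = factorial(sum(z))
    for z_h in z:
        coef //= factorial(z_h)
    return coef * prod(p_h ** z_h for p_h, z_h in zip(p, z))


def compatible_latent_counts(y, pool_size):
    """Enumerate all nonnegative integer z that sum to pool_size and map to y."""
    return [
        z
        for z in product(range(pool_size + 1), repeat=len(HAPLOTYPES))
        if sum(z) == pool_size and allele_counts(z) == tuple(y)
    ]


def exact_likelihood(y, pool_size, p):
    """Sum the multinomial probability over every latent z compatible with y."""
    return sum(multinomial_pmf(z, p) for z in compatible_latent_counts(y, pool_size))


if __name__ == "__main__":
    # A pool of 2 haplotypes with one copy of allele 1 at each marker.
    print(compatible_latent_counts((1, 1), 2))
    print(exact_likelihood((1, 1), 2, [0.25, 0.25, 0.25, 0.25]))
```

In this toy case the two compatible configurations differ by the integer move (1, -1, -1, 1), which changes neither the pool size nor the allele counts. Moves with this property are presumably what a Markov-basis sampler uses to explore the latent counts without enumerating them.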
2. Income, education, and other poverty-related variables: a journey through Bayesian hierarchical models
Authors: Irving Gómez-Méndez, Chainarong Amornbunchornvej
Abstract: A one-size-fits-all policy cannot handle poverty well, since each area faces its own challenges, while a custom-made policy for each area is unrealistic because resources are limited and because it ignores dependencies between the characteristics of different areas. In this work, we propose using Bayesian hierarchical models, which can potentially explain data on income and other poverty-related variables in the multi-resolution data of Thailand's governing structure. We describe how we designed each model, from simple to more complex, evaluate their performance in terms of variable explanation and complexity, discuss the models' drawbacks, and propose solutions to these issues through the lens of Bayesian hierarchical models in order to gain insight from the data. We found that Bayesian hierarchical models performed better than both complete-pooling (single policy) and no-pooling (custom-made policy) models. Additionally, adding the years-of-education variable improves the hierarchical model's variable explanation. We found that a higher education level significantly increases household income in all regions of Thailand. The effect of region on household income almost vanishes when education level or years of education is considered. Therefore, education might play a mediating role between region and income. Our work can serve as a guideline for other countries that need a Bayesian hierarchical approach to model their variables and gain insight from data.
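The contrast between complete pooling, no pooling, and a hierarchical (partial-pooling) estimate can be sketched with a simple normal-normal model. This is a minimal illustration, not the models from the paper: it assumes the within-group standard deviation sigma and between-group standard deviation tau are known, plugs in the grand mean for the population mean, and uses made-up numbers rather than the Thai data. All names are hypothetical.

```python
from statistics import mean


def pooled_estimates(groups, sigma=1.0, tau=1.0):
    """Return complete-, no-, and partial-pooling estimates of each group mean.

    groups: dict mapping a group name to a list of observations.
    sigma:  assumed known within-group standard deviation.
    tau:    assumed known between-group standard deviation.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)  # plug-in for the population-level mean
    result = {}
    for name, values in groups.items():
        n = len(values)
        group_mean = mean(values)
        # Weight on the group's own data; grows toward 1 as n increases.
        w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
        result[name] = {
            "complete": grand_mean,
            "none": group_mean,
            "partial": w * group_mean + (1 - w) * grand_mean,
        }
    return result


if __name__ == "__main__":
    # Made-up data: one well-sampled group and one with a single observation.
    toy = {"north": [10.0, 12.0, 11.0, 13.0], "south": [20.0]}
    for name, est in pooled_estimates(toy).items():
        print(name, est)
```

The partial-pooling estimate always lies between the group's own mean and the grand mean, and groups with few observations are pulled more strongly toward the grand mean. A full Bayesian hierarchical model would also estimate sigma, tau, and the population mean rather than fixing them.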
3. A General Equivalence Theorem for Crossover Designs under Generalized Linear Models
Authors: Jeevan Jankar (Department of Statistics, University of Georgia, Athens, GA 30602), Jie Yang (Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607), Abhyuday Mandal (Department of Statistics, University of Georgia, Athens, GA 30602)
Abstract: With the help of Generalized Estimating Equations, we identify locally D-optimal crossover designs for generalized linear models. We adopt the variance of parameters of interest as the objective function, which is minimized using constrained optimization to obtain optimal crossover designs. In this case, the traditional general equivalence theorem could not be used directly to check the optimality of obtained designs. In this manuscript, we derive a corresponding general equivalence theorem for crossover designs under generalized linear models.