Exploring the latent clustering structure of objects is a fundamental task in many fields that work with massive data. For example, precision medicine aims to transform traditional population-average treatment effects into individualized treatment effects, which usually requires first identifying subgroups within a heterogeneous population. Massive data are frequently dispersed across multiple sites because they are generated, collected, and stored locally, which brings new challenges to subgroup analysis in terms of computational burden and communication cost. In this article, we study subgroup analysis in distributed environments, with the goal of identifying subgroups of individuals across multiple sites. To achieve communication-efficient and privacy-preserving grouping and estimation, we develop a distributed surrogate fusion penalized regression (DSFPR) approach, which consists of two stages. In the first stage, we construct a preliminary grouping structure through local subgroup analysis at each site. In the second stage, we propose a surrogate objective function based on the grouping structure obtained in the first stage and perform global subgroup analysis. To solve the resulting problem in parallel, we design a distributed alternating direction method of multipliers (ADMM) algorithm that does not involve the transmission of personal information. We introduce the sub-oracle property for estimation in local subgroup analysis and establish theoretical properties for the final estimates under both correct and incorrect preliminary grouping structures. Finally, simulations and a real data analysis validate the effectiveness of our approach.
Speaker Biography: Zhu Wensheng is a Professor and Doctoral Supervisor at the School of Mathematics and Statistics, Yunnan University. He received his PhD from Northeast Normal University in 2006, conducted postdoctoral research at Yale University from 2008 to 2010, and visited the University of North Carolina at Chapel Hill from 2015 to 2017. His research focuses on biostatistics and statistical machine learning. He has published papers in journals such as JASA, Biometrika, Statistica Sinica, Science China Themes, and The Lancet, and has led multiple national-level research projects. He serves as Chairman of the Bayesian Statistics Branch of the China On-site Statistical Research Association, Vice President of the China Statistical Education Society, and Vice President of the National Industrial Statistics Teaching and Research Association. He also leads a national first-class undergraduate course.