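This post presents a selection of papers recently published in the Annals of Statistics, a leading international statistics journal, to keep readers up to date with the latest research in the field.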
Rank tests for PCA under weak identifiability
Journal and authors:
Annals of Statistics, Volume 54, Issue 2
Davy Paindaveine (Université libre de Bruxelles)
Laura Peralvo Maroto (Université libre de Bruxelles)
Thomas Verdebout (Université libre de Bruxelles)
Abstract
In a triangular array framework where n observations are randomly sampled from a p-dimensional elliptical distribution with shape matrix Vn, we consider the problem of testing the null hypothesis H0: θ = θ0 against the alternative hypothesis H1: θ ≠ θ0, where θ is the (fixed) leading unit eigenvector of Vn and θ0 is a given unit p-vector. The dependence of the shape matrix on the sample size allows us to consider challenging asymptotic scenarios in which the parameter of interest θ is unidentified in the limit, because the ratio between the two leading eigenvalues of Vn converges to one. We carefully study the corresponding limiting experiments under such weak identifiability, and we show that these may be LAN or non-LAN. While earlier work in the framework was strictly limited to Gaussian distributions, where the study of local log-likelihood ratios could simply rely on explicit expressions, our asymptotic investigation allows for essentially arbitrary elliptical distributions. This requires original results on quadratic mean differentiable families for triangular arrays of observations, which are likely to be of interest in other models, too. Even in non-LAN experiments, our results enable us to investigate, through Le Cam's first and third lemmas, the asymptotic null and nonnull properties of multivariate rank tests. These nonparametric tests are shown to exhibit an excellent behavior under weak identifiability: not only do they maintain the target nominal size irrespective of the amount of weak identifiability, but they also keep their outstanding uniform efficiency properties under such nonstandard scenarios. In particular, Gaussian-score rank tests, under arbitrarily weak identifiability, still uniformly dominate their parametric pseudo-Gaussian competitor in terms of asymptotic relative efficiencies.
Our theoretical results, which are the first ones to study rank tests in the triangular array framework allowing for weak identifiability, are supported by several Monte Carlo exercises.
Link: https://doi.org/10.1214/25-AOS2552
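The abstract does not spell out the test statistic, so the following is only a hypothetical illustration of the quantities involved, not the authors' Gaussian-score rank test. It estimates the leading eigenvector with the spatial sign covariance matrix (for elliptical distributions it shares its eigenvectors, in the same order, with the shape matrix), measures its alignment with θ0, and reports the ratio of the two leading eigenvalues; a ratio close to one is the weak-identifiability regime the paper studies. It assumes the data are already centred at a known location and uses plain power iteration; all function names are made up for this sketch.

```python
import math

def spatial_signs(X):
    """Map each (already centred) observation x to its spatial sign x / ||x||."""
    signs = []
    for x in X:
        r = math.sqrt(sum(c * c for c in x))
        signs.append([c / r for c in x] if r > 0 else [0.0] * len(x))
    return signs

def sign_covariance(X):
    """Spatial sign covariance matrix: the average of u u^T over the spatial signs u."""
    U = spatial_signs(X)
    n, p = len(U), len(U[0])
    return [[sum(u[i] * u[j] for u in U) / n for j in range(p)] for i in range(p)]

def leading_eigen(S, iters=500):
    """Power iteration: (top eigenvalue, unit eigenvector) of a PSD matrix S."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p  # assumes this start is not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0, v
        v = [c / norm for c in w]
    lam = sum(v[i] * sum(S[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def eigen_gap_summary(X, theta0):
    """Return |<theta_hat, theta0>| and lambda2 / lambda1 for the sign covariance matrix."""
    S = sign_covariance(X)
    p = len(S)
    lam1, v1 = leading_eigen(S)
    # Deflate the top eigenpair so that power iteration finds the second eigenvalue.
    S2 = [[S[i][j] - lam1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    lam2, _ = leading_eigen(S2)
    alignment = abs(sum(a * b for a, b in zip(v1, theta0)))
    return alignment, lam2 / lam1
```

For the toy sample X = [[2, 1], [-2, -1], [2, -1], [-2, 1]] with θ0 = [1, 0], the sign covariance matrix is diag(0.8, 0.2), so the alignment is 1 and the eigenvalue ratio is 0.25. Note that power iteration itself slows down as the ratio approaches one, a numerical echo of the identifiability problem.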
Distributionally robust learning for multisource unsupervised domain adaptation
Journal and authors:
Annals of Statistics, Volume 54, Issue 2
Zhenyu Wang (Rutgers University)
Peter Bühlmann (ETH Zürich)
Zijian Guo (Zhejiang University)
Abstract
Empirical risk minimization often performs poorly when the distribution of the target domain differs from those of the source domains. To address such potential distributional shifts, we develop an unsupervised domain adaptation approach that leverages labeled data from multiple source domains and unlabeled data from the target domain. We introduce a distributionally robust model that optimizes an adversarial reward based on explained variance across a class of target distributions, ensuring generalization to the target domain. We show that the proposed robust model is a weighted average of conditional outcome models from the source domains. This formulation allows us to compute the robust model through the aggregation of source models, which can be estimated using various machine learning algorithms of the user’s choice such as random forests, boosting and neural networks. Additionally, we introduce a bias-correction step to obtain a more accurate aggregation weight, which is effective for various machine learning algorithms. Our framework can be interpreted as a distributionally robust federated learning approach that satisfies privacy constraints while providing insights into the importance of each source for prediction on the target domain. The performance of our method is evaluated on both simulated and real data.
Link: https://doi.org/10.1214/25-AOS2578
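The abstract states that the robust model is a weighted average of the source-domain outcome models. A minimal sketch of such an aggregation, assuming (in the spirit of maximin-type aggregation) that the weights q minimize q^T Γ q over the probability simplex, where Γ[k][l] is the average of f_k(x) f_l(x) over the unlabeled target covariates. Treat this objective as an assumption to be checked against the paper; the bias-correction step is omitted, and the function names are hypothetical.

```python
def gram_on_target(preds):
    """Gamma[k][l] = mean over target points of f_k(x) * f_l(x); preds[k] holds f_k on the target sample."""
    K, n = len(preds), len(preds[0])
    return [[sum(preds[k][i] * preds[l][i] for i in range(n)) / n for l in range(K)] for k in range(K)]

def project_to_simplex(v):
    """Euclidean projection onto {q : q_k >= 0, sum_k q_k = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u):
        cumsum += ui
        t = (cumsum - 1.0) / (i + 1)
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def aggregation_weights(gamma, iters=2000):
    """Projected gradient descent for min_q q^T Gamma q over the simplex."""
    K = len(gamma)
    q = [1.0 / K] * K
    trace = sum(gamma[k][k] for k in range(K))
    if trace == 0.0:
        return q  # all source predictions vanish on the target sample
    step = 1.0 / (2.0 * trace)  # trace >= largest eigenvalue of a PSD matrix, so the step is safe
    for _ in range(iters):
        grad = [2.0 * sum(gamma[k][l] * q[l] for l in range(K)) for k in range(K)]
        q = project_to_simplex([q[k] - step * grad[k] for k in range(K)])
    return q

def aggregate(preds, q):
    """Robust prediction at each target point: sum_k q_k * f_k(x)."""
    return [sum(q[k] * preds[k][i] for k in range(len(q))) for i in range(len(preds[0]))]
```

Here preds[k] would hold the target-sample predictions of whatever learner was fitted on source k (random forests, boosting or neural networks, as the abstract mentions). For preds = [[1.0, 0.0], [0.0, 2.0]], Γ = diag(0.5, 2), and the weights converge to (0.8, 0.2).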
Trace test for high-dimensional cointegration
Journal and authors:
Annals of Statistics, Volume 54, Issue 2
Alexei Onatski (University of Cambridge)
Chen Wang (University of Hong Kong)
Abstract
This paper studies Johansen's (J. Econom. Dynam. Control 12 (1988) 231–254) trace test for cointegration in high-dimensional data. We show that when both the cross-sectional and temporal dimensions of the data go to infinity proportionally, the shifted and scaled modified trace statistic converges to a Gaussian random variable. We give explicit formulae for the shift and scale parameters as well as for the mean and variance of the Gaussian limit. Monte Carlo analysis shows excellent size properties of the asymptotic test, which is an improvement over the Bartlett-corrected versions of the original trace test, especially for relatively large ratios of the dimensionality to the sample size. The Monte Carlo analysis also reveals a nonmonotonicity of the power of the test. We comment on the source of such a nonmonotonicity.
Link: https://doi.org/10.1214/25-AOS2579
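For reference, the classical Johansen trace statistic for H0: cointegration rank <= r is -T * sum_{i=r+1}^{p} log(1 - lambda_i), where lambda_1 >= ... >= lambda_p are the squared sample canonical correlations from the reduced-rank regression and T is the effective sample size. The sketch below only evaluates this formula from given eigenvalues; it does not compute the eigenvalues from data, and it does not implement the paper's modified statistic or its shift and scale, whose explicit formulae are given in the paper.

```python
import math

def johansen_trace(eigenvalues, T, r):
    """Classical trace statistic -T * sum_{i>r} log(1 - lambda_i) for H0: rank <= r."""
    lam = sorted(eigenvalues, reverse=True)
    if not 0 <= r < len(lam):
        raise ValueError("r must satisfy 0 <= r < p")
    if any(not 0.0 <= x < 1.0 for x in lam):
        raise ValueError("eigenvalues must lie in [0, 1)")
    # log1p(-x) computes log(1 - x) accurately for small x.
    return -T * sum(math.log1p(-x) for x in lam[r:])
```

For example, with eigenvalues [0.2, 0.1] and T = 100, the statistic is about 32.85 for r = 0 and about 10.54 for r = 1. The abstract's point is that, when p grows proportionally with T, a statistic of this type must be shifted and scaled before a Gaussian approximation applies.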
Estimation of grouped time-varying network vector autoregressive models
Journal and authors:
Annals of Statistics, Volume 54, Issue 2
Degui Li (University of Macau)
Bin Peng (Monash University)
Songqiao Tang (Zhejiang University)
Weibiao Wu (University of Chicago)
Abstract
This paper introduces a flexible time-varying network vector autoregressive model framework for large-scale time series. A latent group structure is imposed on the heterogeneous and node-specific time-varying momentum and network spillover effects so that the number of unknown time-varying coefficients to be estimated can be reduced considerably. A classic agglomerative clustering algorithm with a nonparametrically estimated distance matrix is combined with a ratio criterion to consistently estimate the latent group number and membership. A postgrouping local linear smoothing method is proposed to estimate the group-specific time-varying momentum and network effects, substantially improving the convergence rates of the preliminary estimates which ignore the latent structure. We further modify the methodology and theory to allow for structural breaks in either the group membership, group number or group-specific coefficient functions. Numerical studies including Monte Carlo simulation and an empirical application are presented to examine the finite-sample performance of the developed model and methodology.
Link: https://doi.org/10.1214/25-AOS2580
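The abstract pairs agglomerative clustering of nodes with a ratio criterion for the number of groups. A minimal sketch of that general idea, assuming complete linkage on a user-supplied distance matrix and taking as the "ratio criterion" the largest jump between consecutive merge heights. Both choices are illustrative assumptions: the paper's nonparametric distance estimate and its exact criterion may differ, and the function names are hypothetical.

```python
def agglomerate(D):
    """Complete-linkage agglomerative clustering: merge heights and the partition after each merge."""
    clusters = [[i] for i in range(len(D))]
    heights, partitions = [], []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between clusters is the largest pairwise distance.
                d = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        heights.append(d)
        partitions.append(sorted(clusters))
    return heights, partitions

def estimate_groups(D, eps=1e-12):
    """Stop just before the merge whose height jumps most relative to the previous merge."""
    n = len(D)
    if n < 3:
        raise ValueError("need at least three nodes")
    heights, partitions = agglomerate(D)
    best_m, best_ratio = 1, -1.0
    for m in range(1, n - 1):  # after m merges, n - m groups remain
        ratio = heights[m] / max(heights[m - 1], eps)
        if ratio > best_ratio:
            best_m, best_ratio = m, ratio
    return n - best_m, partitions[best_m - 1]
```

For four nodes at positions 0.0, 0.1, 5.0 and 5.1 with D[i][j] = |x_i - x_j|, the merge heights are roughly 0.1, 0.1 and 5.1, so the largest ratio occurs at the last merge and the sketch returns two groups, [[0, 1], [2, 3]].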
Large-scale multiple testing: Fundamental limits of false discovery rate control and compound oracle
Journal and authors:
Annals of Statistics, Volume 54, Issue 2
Yutong Nie (Yale University)
Yihong Wu (Yale University)
Abstract
The false discovery rate (FDR) and the false nondiscovery rate (FNR), defined as the expectations of the false discovery proportion (FDP) and the false nondiscovery proportion (FNP), are the most popular benchmarks for multiple testing. Despite the theoretical and algorithmic advances in recent years, the optimal trade-off between the FDR and the FNR has been largely unknown, except for certain restricted classes of decision rules, for example, separable rules, or for other performance metrics, for example, the marginal FDR and the marginal FNR (mFDR and mFNR). In this paper we determine the asymptotically optimal FDR-FNR trade-off under the two-group random mixture model when the number of hypotheses tends to infinity. Distinct from the optimal mFDR-mFNR trade-off, which is achieved by separable decision rules, the optimal FDR-FNR trade-off requires compound rules, even in the large-sample limit and for models as simple as the Gaussian location model. This suboptimality of separable rules also holds for other objectives, such as maximizing the expected number of true discoveries. Finally, to address the limitation of the FDR, which only controls the expectation but not the fluctuation of the FDP, we also determine the optimal trade-off when the FDP is controlled with high probability and show it coincides with that of the mFDR and the mFNR. Extensions to models with a fixed nonnull proportion are also obtained.
Link: https://doi.org/10.1214/25-AOS2581
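The FDR and FNR in the abstract are expectations of the per-realization proportions FDP and FNP. The sketch below computes those two proportions for one realization of a separable rule, which rejects H_i whenever |x_i| exceeds a common threshold t, so each decision depends on x_i alone. It uses the convention FNP = (false nondiscoveries) / max(number of nondiscoveries, 1); this normalization is an assumption, as the paper may define FNP differently, and the function names are hypothetical.

```python
def separable_rule(x, t):
    """Reject H_i iff |x_i| > t; each decision depends only on its own statistic."""
    return [abs(xi) > t for xi in x]

def fdp_fnp(reject, is_null):
    """False discovery and false nondiscovery proportions for one realization."""
    m = len(reject)
    R = sum(reject)                                                 # number of discoveries
    V = sum(1 for r, h in zip(reject, is_null) if r and h)          # false discoveries
    T = sum(1 for r, h in zip(reject, is_null) if not r and not h)  # false nondiscoveries
    fdp = V / R if R > 0 else 0.0
    fnp = T / (m - R) if m > R else 0.0
    return fdp, fnp
```

With x = [0.1, -0.5, 2.5, 3.0, 1.0, -2.2], nulls at positions 0, 1 and 5, and t = 2.0, the rule rejects positions 2, 3 and 5, giving FDP = 1/3 (position 5 is a null) and FNP = 1/3 (position 4 is a missed nonnull). Averaging these proportions over repeated draws would estimate the FDR and FNR; a compound rule would instead let the decision for H_i depend on the whole vector x.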