Academic Frontier Digest | Selected Papers from the Annals of Statistics

 

This digest presents a selection of recent papers from the Annals of Statistics, a leading international journal in statistics, offering a view of the latest developments in statistical research.

 

Conditional calibration for false discovery rate control under dependence

Journal and authors:

Annals of Statistics Volume 50 Issue 6

William Fithian (University of California, Berkeley)

Lihua Lei (Stanford University)

Abstract

We introduce a new class of methods for finite-sample false discovery rate (FDR) control in multiple testing problems with dependent test statistics where the dependence is known. Our approach separately calibrates a data-dependent p-value rejection threshold for each hypothesis, relaxing or tightening the threshold as appropriate to target exact FDR control. In addition to our general framework, we propose a concrete algorithm, the dependence-adjusted Benjamini–Hochberg (dBH) procedure, which thresholds the BH-adjusted p-value for each hypothesis. Under positive regression dependence, the dBH procedure uniformly dominates the standard BH procedure, and in general it uniformly dominates the Benjamini–Yekutieli (BY) procedure (also known as BH with log correction), which makes a conservative adjustment for worst-case dependence. Simulations and real data examples show substantial power gains over the BY procedure, and competitive performance with knockoffs in settings where both methods are applicable. When the BH procedure empirically controls FDR (as it typically does in practice), the dBH procedure performs comparably.

Link: https://doi.org/10.1214/21-AOS2137
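For orientation, here is a minimal sketch of the two baselines the dBH procedure is compared against: the standard BH step-up procedure and its BY variant with the conservative log-factor correction. The dBH calibration itself is not reproduced here, and the function name is ours.

```python
import numpy as np

def bh_rejections(pvals, q=0.1, log_correction=False):
    """Benjamini-Hochberg step-up procedure; setting log_correction=True
    gives the Benjamini-Yekutieli (BY) variant valid under arbitrary
    dependence."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    c_m = np.sum(1.0 / np.arange(1, m + 1)) if log_correction else 1.0
    order = np.argsort(pvals)
    # largest k with p_(k) <= k * q / (m * c_m); reject the k smallest p-values
    below = pvals[order] <= np.arange(1, m + 1) * q / (m * c_m)
    if not below.any():
        return np.array([], dtype=int)
    k = np.nonzero(below)[0].max() + 1
    return np.sort(order[:k])

# Example: 10 null p-values (uniform) plus 5 strong signals
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=10), rng.uniform(0, 1e-4, size=5)])
print(bh_rejections(p, q=0.1))                       # BH
print(bh_rejections(p, q=0.1, log_correction=True))  # BY, more conservative
```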

 

 

A no-free-lunch theorem for multitask learning

Journal and authors:

Annals of Statistics Volume 50 Issue 6

Steve Hanneke (Purdue University)

Samory Kpotufe (Columbia University)

Abstract

Multitask learning and related areas such as multisource domain adaptation address modern settings where data sets from N related distributions {Pt} are to be combined toward improving performance on any single such distribution D. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance bounds that account for the contribution from multiple tasks, the vast majority of analyses result in bounds that improve at best in the number n of samples per task, but most often do not improve in N. As such, it might seem at first that the distributional settings or aggregation procedures considered in such analyses might be somehow unfavorable; however, as we show, the picture happens to be more nuanced, with interestingly hard regimes that might appear otherwise favorable.

In particular, we consider a seemingly favorable classification scenario in which all tasks Pt share a common optimal classifier h∗ and which can be shown to admit a broad range of regimes with improved oracle rates in terms of N and n. Some of our main results are:

We show that, even though such regimes admit minimax rates accounting for both n and N, no adaptive algorithm exists, that is, without access to distributional information, no algorithm can guarantee rates that improve with large N for n fixed.

With a bit of additional information, namely, a ranking of tasks {Pt} according to their distance to a target D, a simple rank-based procedure can achieve near optimal aggregations of tasks’ data sets, despite a search space exponential in N. Interestingly, the optimal aggregation might exclude certain tasks, even though they all share the same h∗.

Link: https://doi.org/10.1214/22-AOS2189
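The rank-based procedure in the abstract's second result can be illustrated with a hypothetical sketch: given tasks already ranked by (estimated) distance to the target, fit a pooled classifier on each prefix of the ranking and keep the prefix that performs best on a small target validation set, so distant tasks can be excluded. This is an illustration of the idea, not the paper's algorithm; all names and the least-squares plug-in classifier are our own choices.

```python
import numpy as np

def lstsq_classifier(X, y):
    # plug-in linear classifier: predict sign(X @ w), w fit by least squares
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def pool_by_rank(tasks_ranked, X_val, y_val):
    """tasks_ranked: list of (X_t, y_t) pairs with labels in {-1, +1},
    sorted so the task believed closest to the target comes first.
    Returns (validation error, weights, number of tasks pooled) for the
    best prefix of the ranking, judged on target validation data."""
    best = (np.inf, None, 0)
    for k in range(1, len(tasks_ranked) + 1):
        Xp = np.vstack([Xt for Xt, _ in tasks_ranked[:k]])
        yp = np.concatenate([yt for _, yt in tasks_ranked[:k]])
        w = lstsq_classifier(Xp, yp)
        err = np.mean(np.sign(X_val @ w) != y_val)  # 0-1 validation error
        if err < best[0]:
            best = (err, w, k)
    return best
```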

 

 

Optimization hierarchy for fair statistical decision problems

Journal and authors:

Annals of Statistics Volume 50 Issue 6

Anil Aswani (University of California, Berkeley)

Matt Olfat (University of California, Berkeley)

Abstract

Data-driven decision making has drawn scrutiny from policy makers due to fears of potential discrimination, and a growing literature has begun to develop fair statistical techniques. However, these techniques are often specialized to one model context and based on ad hoc arguments, which makes it difficult to perform theoretical analysis. This paper develops an optimization hierarchy, which is a sequence of optimization problems with an increasing number of constraints, for fair statistical decision problems. Because our hierarchy is based on the framework of statistical decision problems, it provides a systematic approach for developing and studying fair versions of hypothesis testing, decision making, estimation, regression, and classification. We use the insight that qualitative definitions of fairness are equivalent to statistical independence between the output of a statistical technique and a random variable that measures attributes for which fairness is desired. We use this insight to construct an optimization hierarchy that lends itself to numerical computation, and we use tools from variational analysis and random set theory to prove that higher levels of this hierarchy lead to consistency in the sense that the hierarchy asymptotically imposes this independence as a constraint in corresponding statistical decision problems. We demonstrate the numerical effectiveness of our hierarchy using several data sets, and we use our hierarchy to fairly perform automated dosing of morphine.

Link: https://doi.org/10.1214/22-AOS2217
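The central insight, fairness as statistical independence between a method's output and the protected attribute, can be made concrete with a simple empirical proxy: a grid of cross-moment gaps, all of which vanish in the large-sample limit (under suitable moment conditions) exactly when the two variables are independent. This toy check only mirrors the spirit of the hierarchy's growing constraint sets; the paper's actual formulation rests on variational analysis and random set theory, and the function name is ours.

```python
import numpy as np

def cross_moment_gaps(output, attr, level):
    """|E[f^j z^k] - E[f^j] E[z^k]| for 1 <= j, k <= level, computed on
    standardized data. Under suitable moment conditions, all gaps vanishing
    at every level is equivalent to independence of output and attribute."""
    f = (output - output.mean()) / output.std()
    z = (attr - attr.mean()) / attr.std()
    gaps = np.empty((level, level))
    for j in range(1, level + 1):
        for k in range(1, level + 1):
            gaps[j - 1, k - 1] = abs((f**j * z**k).mean()
                                     - (f**j).mean() * (z**k).mean())
    return gaps

# Example: a score that leaks the attribute shows a large (1, 1) gap
rng = np.random.default_rng(2)
z = rng.integers(0, 2, size=5000).astype(float)
score = z + rng.normal(size=5000)
print(cross_moment_gaps(score, z, level=3).round(2))
```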

 

 

Half-trek criterion for identifiability of latent variable models

Journal and authors:

Annals of Statistics Volume 50 Issue 6

Rina Foygel Barber (University of Chicago)

Mathias Drton (Technical University of Munich)

Nils Sturma (Technical University of Munich)

Luca Weihs (Allen Institute for AI)

Abstract

We consider linear structural equation models with latent variables and develop a criterion to certify whether the direct causal effects between the observable variables are identifiable based on the observed covariance matrix. Linear structural equation models assume that both observed and latent variables solve a linear equation system featuring stochastic noise terms. Each model corresponds to a directed graph whose edges represent the direct effects that appear as coefficients in the equation system. Prior research has developed a variety of methods to decide identifiability of direct effects in a latent projection framework, in which the confounding effects of the latent variables are represented by correlation among noise terms. This approach is effective when the confounding is sparse and affects only small subsets of the observed variables. In contrast, the new latent-factor half-trek criterion (LF-HTC) we develop in this paper operates on the original unprojected latent variable model and is able to certify identifiability in settings where some latent variables may also have dense effects on many or even all of the observables. Our LF-HTC is an effective sufficient criterion for rational identifiability, under which the direct effects can be uniquely recovered as rational functions of the joint covariance matrix of the observed random variables. When restricting the search steps in LF-HTC to consider subsets of latent variables of bounded size, the criterion can be verified in time that is polynomial in the size of the graph.

Link: https://doi.org/10.1214/22-AOS2221
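To fix ideas, here is a toy simulation of the model class (the graph and coefficients are our own, purely illustrative): three observed variables driven by a single latent factor with dense effects on all of them. The question LF-HTC answers is whether direct effects such as the ones below can be recovered as rational functions of the observed covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# One latent factor with dense effects on every observable (toy example)
lat = rng.normal(size=n)
e = rng.normal(size=(3, n))

x1 = 1.0 * lat + e[0]
x2 = 0.8 * x1 + 0.5 * lat + e[1]   # direct effect x1 -> x2 is 0.8
x3 = 0.6 * x2 + 0.7 * lat + e[2]   # direct effect x2 -> x3 is 0.6

# LF-HTC asks: are 0.8 and 0.6 rational functions of this matrix?
print(np.cov(np.vstack([x1, x2, x3])).round(2))
```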

 

 

On resampling schemes for particle filters with weakly informative observations

Journal and authors:

Annals of Statistics Volume 50 Issue 6

Nicolas Chopin (Institut Polytechnique de Paris)

Sumeetpal S. Singh (University of Cambridge)

Tomás Soto (LUT University)

Matti Vihola (University of Jyväskylä)

Abstract

We consider particle filters with weakly informative observations (or ‘potentials’) relative to the latent state dynamics. The particular focus of this work is on particle filters to approximate time-discretisations of continuous-time Feynman–Kac path integral models—a scenario that naturally arises when addressing filtering and smoothing problems in continuous time—but our findings are indicative of weakly informative settings beyond this context too. We study the performance of different resampling schemes, such as systematic resampling, SSP (Srinivasan sampling process) and stratified resampling, as the time-discretisation becomes finer and also identify their continuous-time limit, which is expressed as a suitably defined ‘infinitesimal generator.’ By contrasting these generators, we find that (certain modifications of) systematic and SSP resampling ‘dominate’ stratified and independent ‘killing’ resampling in terms of their limiting overall resampling rate. The reduced intensity of resampling manifests itself in lower variance in our numerical experiment. This efficiency result, through an ordering of the resampling rate, is new to the literature. The second major contribution of this work concerns the analysis of the limiting behaviour of the entire population of particles of the particle filter as the time discretisation becomes finer. We provide the first proof, under general conditions, that the particle approximation of the discretised continuous-time Feynman–Kac path integral models converges to a (uniformly weighted) continuous-time particle system.

Link: https://doi.org/10.1214/22-AOS2222
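Two of the compared schemes are easy to state in their standard forms (the paper studies modified variants not shown here): systematic resampling draws one shared uniform offset across all n strata, while stratified resampling draws an independent offset within each stratum.

```python
import numpy as np

def systematic_resample(weights, rng):
    # one shared uniform shift across all n strata
    n = len(weights)
    u = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), u)

def stratified_resample(weights, rng):
    # an independent uniform shift within each stratum
    n = len(weights)
    u = (rng.uniform(size=n) + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), u)

rng = np.random.default_rng(3)
w = rng.dirichlet(np.ones(10))       # normalized particle weights
print(systematic_resample(w, rng))   # indices of resampled ancestors
print(stratified_resample(w, rng))
```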

Published: 2023-01-03