
Timetable (STSW04)

Future challenges in statistical scalability

Monday 25th June 2018 to Friday 29th June 2018

Monday 25th June 2018
09:45 to 10:20 Registration
10:20 to 10:50 Morning Coffee
10:50 to 11:00 Welcome from Christie Marr (INI Deputy Director)
11:00 to 11:45 Iain Johnstone (Stanford University)
tba
INI 1
11:45 to 12:30 Philippe Rigollet (Massachusetts Institute of Technology)
tba
INI 1
12:30 to 14:00 Buffet Lunch at INI
14:00 to 14:45 Matthew Stephens (University of Chicago)
On applications of Empirical Bayes approaches to the Normal Means problem
The normal means problem is very simple: given normally-distributed observations with known variances and unknown means, estimate the means. That is, given X_j \sim N(\theta_j, \sigma_j^2), estimate \theta_j. A key idea is that one can do better than the maximum likelihood estimates, \hat{\theta}_j = X_j, in particular by use of appropriate "shrinkage" estimators. One attractive way to perform shrinkage estimation in practice is to use Empirical Bayes methods. That is, to assume that \theta_j are independent and identically distributed from some distribution g that is to be estimated from the data. Then, given such an estimate \hat{g}, the posterior distributions of \theta_j can be computed to perform inference. We call this the "Empirical Bayes Normal Means" (EBNM) problem.

Despite its simplicity, solving the EBNM problem has a wide range of practical applications. Here we present some flexible non-parametric approaches we have recently developed for solving the EBNM problem, and describe their application to several different settings: false discovery rate (FDR) estimation, non-parametric smoothing, and sparse matrix factorization problems (i.e. sparse factor analysis and sparse principal components analysis).
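As a toy illustration of the EBNM setup above (not the flexible non-parametric methods of the talk), here is a minimal sketch assuming the simplest parametric choice g = N(0, tau^2), with tau^2 estimated by marginal maximum likelihood over a grid; all names and constants below are ours:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    theta = rng.normal(0.0, 1.0, n)              # true means, so the true g is N(0, 1)
    sigma = np.full(n, 2.0)                      # known standard deviations sigma_j
    x = theta + sigma * rng.normal(size=n)       # observations X_j ~ N(theta_j, sigma_j^2)

    # Marginally X_j ~ N(0, tau^2 + sigma_j^2): estimate tau^2 by grid maximum likelihood.
    grid = np.linspace(1e-3, 10.0, 2000)
    loglik = [-0.5 * np.sum(np.log(t + sigma**2) + x**2 / (t + sigma**2)) for t in grid]
    tau2_hat = grid[int(np.argmax(loglik))]

    # Posterior means E[theta_j | X_j] shrink each X_j towards the prior mean 0.
    shrink = tau2_hat / (tau2_hat + sigma**2)
    theta_hat = shrink * x

    print("MSE of MLE (X_j itself): ", np.mean((x - theta)**2))
    print("MSE of EB posterior mean:", np.mean((theta_hat - theta)**2))

Even this crude parametric g noticeably reduces the mean squared error relative to the unshrunk estimates, which is the basic point exploited by the non-parametric versions.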

INI 1
14:45 to 15:30 Jana Jankova (University of Cambridge)
tba
INI 1
15:30 to 16:00 Afternoon Tea
16:00 to 16:45 Flori Bunea (Cornell University)
A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics
Topic models have become popular for the analysis of data consisting of a collection of n independent multinomial observations. The model links all cell probabilities, collected in a p x n matrix, via the assumption that this matrix can be factorized as the product of two non-negative matrices A and W. Topic models were originally developed in text mining, where one browses through n documents, based on a dictionary of p words, and covering K topics. In this terminology, the matrix A is called the word-topic matrix, and is the main target of estimation. It can be viewed as a matrix of conditional probabilities, and it is uniquely defined under appropriate separability assumptions, discussed in this talk. Notably, the unique A is required to satisfy what is commonly known as the anchor word assumption, under which A has an unknown number of rows respectively proportional to K-dimensional canonical basis vectors. The indices of such rows are referred to as anchor words. Recent computationally feasible algorithms, with theoretical guarantees, use this assumption constructively by linking the estimation of the set of anchor words with that of estimating the K vertices of a simplex. This crucial step in the estimation of A requires K to be known, and cannot be easily extended to the more realistic set-up in which K is unknown. In this talk, we take a different view on anchor word estimation, and on the estimation of the word-topic matrix A. We propose a new method of estimation in topic models that is not a variation on the existing simplex finding algorithms, and that estimates K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite sample analysis is valid for any n, p, K and document length N, and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We illustrate, via simulations, that the new algorithm is faster and more accurate than the current ones, even though we start out with the computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.
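For concreteness, the factorization and anchor-word structure described above can be written as follows (the notation here is ours, chosen for illustration rather than taken from the talk):

    \Pi = A W, \qquad \Pi \in [0,1]^{p \times n}, \quad A \in [0,1]^{p \times K}, \quad W \in [0,1]^{K \times n},

where column i of \Pi collects the cell probabilities of document i, A_{jk} is the conditional probability of word j given topic k, and row j of A is proportional to a canonical basis vector e_k \in \{0,1\}^K exactly when word j is an anchor word for topic k.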
INI 1
17:00 to 18:00 Welcome Wine Reception & Posters at INI
Tuesday 26th June 2018
09:00 to 09:45 Guy Bresler (Massachusetts Institute of Technology)
Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure
The prototypical high-dimensional statistics problem entails finding a structured signal in noise. Many of these problems exhibit a curious phenomenon: the amount of data needed by all known computationally efficient algorithms far exceeds what is needed for inefficient algorithms that search over all possible structures. A line of work initiated by Berthet and Rigollet in 2013 has aimed to explain these gaps by reducing from conjecturally hard problems in computer science. However, the delicate nature of average-case reductions has limited the applicability of this approach. In this work we introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. These include tight lower bounds for Planted Independent Set, Planted Dense Subgraph, Sparse Spiked Wigner, Sparse PCA, as well as for new models that we introduce. Joint work with Matthew Brennan and Wasim Huleihel.
INI 1
09:45 to 10:30 Chao Gao (University of Chicago)
tba
INI 1
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Ryan Tibshirani (Carnegie Mellon University)
tba
INI 1
11:45 to 12:30 Peter Bickel (University of California, Berkeley)
Two network scale challenges: constructing and fitting hierarchical block models, and fitting large block models using the mean field method
Joint work with S. Bhattacharyya, T. Li, E. Levina, S. Mukherjee and P. Sarkar. Networks are a complex type of structure arising in many applications. They are usually represented by a graph, with possibly weighted edges, plus additional covariates (such as directions). Block models have been studied for some time as basic approximations to ergodic stationary probability models for single graphs. A huge number of fitting methods have been developed for these models, some of which we will touch on. The mean field method, in which an increasing number of parameters must be fitted, is used not only for mixed membership block models but also in applications such as LDA. If the graph is too large, poor behaviour of the method can be seen. We have developed what we call "patch methods" for fitting, which help both computationally and inferentially in such situations, but much further analysis is needed. It is intuitively clear but mathematically unclear how knowledge of the model having nested scales helps in fitting large-scale as opposed to small-scale parameters. We will discuss this issue through an example.
INI 1
12:30 to 14:00 Buffet Lunch at INI
14:00 to 14:45 Marten Herman Wegkamp (Cornell University)
Adaptive estimation of the rank of the regression coefficient matrix
We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion does not require estimation of the unknown variance of the noise, nor does it depend on a delicate choice of a tuning parameter. We develop an iterative, fully data-driven procedure that adapts to the optimal signal-to-noise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it never exceeds the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. An extensive simulation study confirms our theoretical findings in both low- and high-dimensional settings. This is joint work with Xin Bing.
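For readers less familiar with reduced-rank regression, the sketch below shows the rank-constrained estimator itself (truncating the SVD of the least-squares fitted values). The data-driven, variance-free rank selection criterion from the talk is not reproduced; the rank below is fixed by hand and all constants are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, q, r = 200, 10, 8, 3
    B = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))        # true rank-r coefficient matrix
    X = rng.normal(size=(n, p))
    Y = X @ B + 0.5 * rng.normal(size=(n, q))

    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]                  # unrestricted least-squares estimate
    U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)      # SVD of the fitted values

    k = 3                                                         # rank chosen here by hand (equal to the true rank)
    P_k = Vt[:k].T @ Vt[:k]                                       # projection onto top-k right singular directions
    B_rrr = B_ols @ P_k                                           # reduced-rank estimator of rank k

    print("rank of reduced-rank estimate:", np.linalg.matrix_rank(B_rrr))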
INI 1
14:45 to 15:30 Maryam Fazel (University of Washington)
tba
INI 1
15:30 to 16:00 Afternoon Tea
16:00 to 16:45 Alex d'Aspremont (CNRS - Ecole Normale Superieure Paris)
An Approximate Shapley-Folkman Theorem.
With Thomas Kerdreux and Igor Colin. The Shapley-Folkman theorem shows that Minkowski averages of uniformly bounded sets tend to be convex when the number of terms in the sum becomes much larger than the ambient dimension. In optimization, Aubin and Ekeland (1976) show that this produces an a priori bound on the duality gap of separable nonconvex optimization problems involving finite sums. This bound is highly conservative and depends on unstable quantities; we relax it in several directions to show that nonconvexity can have a much milder impact on finite sum minimization problems such as empirical risk minimization and multi-task classification. As a byproduct, we show a new version of Maurey's classical approximate Carathéodory lemma where we sample a significant fraction of the coefficients, without replacement.
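As a reminder of the underlying statement (a standard formulation, not necessarily the exact version used in the talk), the Shapley-Folkman theorem says the following for sets V_1, \dots, V_n \subset \mathbb{R}^d:

    x \in \mathrm{conv}\Big(\sum_{i=1}^n V_i\Big) = \sum_{i=1}^n \mathrm{conv}(V_i)
    \;\Longrightarrow\; x = \sum_{i=1}^n x_i, \quad x_i \in \mathrm{conv}(V_i) \text{ for all } i,
    \quad x_i \in V_i \text{ for all but at most } d \text{ indices.}

Averaging n such sets therefore looks increasingly convex once n greatly exceeds the dimension d, which is the regime exploited by the duality gap bounds mentioned above.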
INI 1
Wednesday 27th June 2018
09:00 to 09:45 Urvashi Oswal (University of Wisconsin-Madison)
tba
INI 1
09:45 to 10:30 Elizaveta Levina (University of Michigan)
Matrix completion in network analysis
Matrix completion is an active area of research in itself, and a natural tool to apply to network data, since many real networks are observed incompletely and/or with noise. However, developing matrix completion algorithms for networks requires taking into account the network structure. This talk will discuss three examples of matrix completion used for network tasks. First, we discuss the use of matrix completion for cross-validation on network data, a long-standing problem in network analysis. Two other examples focus on reconstructing incompletely observed networks, with structured missingness resulting from network sampling mechanisms. One scenario we consider is egocentric sampling, where a set of nodes is selected first and then their connections to the entire network are observed. Another scenario focuses on data from surveys, where people are asked to name a given number of friends. We show that matrix completion, when used appropriately, can generally be very helpful in solving network problems.
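As background, one generic low-rank completion routine (SoftImpute-style iterative singular value soft-thresholding) is sketched below. It is a standard building block rather than the specific network-aware procedures of the talk, and the penalty lam, iteration count, and toy data are arbitrary choices of ours:

    import numpy as np

    def soft_impute(A_obs, mask, lam=1.0, n_iter=100):
        """Low-rank completion: A_obs holds observed entries (zeros elsewhere),
        mask is 1 where an entry was observed and 0 where it is missing."""
        Z = np.zeros_like(A_obs, dtype=float)
        for _ in range(n_iter):
            filled = mask * A_obs + (1 - mask) * Z               # impute missing entries with current fit
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            Z = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt       # soft-threshold the singular values
        return Z

    # Toy use: a rank-2 "expected adjacency" matrix observed on 60% of its entries.
    rng = np.random.default_rng(4)
    n = 100
    U0 = rng.uniform(size=(n, 2))
    P = U0 @ U0.T / 2.0                                          # low-rank probability matrix
    mask = (rng.uniform(size=(n, n)) < 0.6).astype(float)
    A_hat = soft_impute(P * mask, mask)
    print("relative error:", np.linalg.norm(A_hat - P) / np.linalg.norm(P))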
INI 1
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Garvesh Raskutti (University of Wisconsin-Madison)
tba
INI 1
11:45 to 12:30 Peter Bartlett (University of California, Berkeley)
Representation, optimization and generalization properties of deep neural networks
Deep neural networks have improved the state-of-the-art performance for prediction problems across an impressive range of application areas. This talk describes some recent results in three directions. First, we investigate the impact of depth on representational properties of deep residual networks, which compute near-identity maps at each layer, showing how their representational power improves with depth and that the functional optimization landscape has the desirable property that stationary points are optimal. Second, we study the implications for optimization in deep linear networks, showing how the success of a family of gradient descent algorithms that regularize towards the identity function depends on a positivity condition of the regression function. Third, we consider how the performance of deep networks on training data compares to their predictive accuracy; we demonstrate deviation bounds that scale with a certain "spectral complexity," and we compare the behavior of these bounds with the observed performance of these networks in practical problems.

Joint work with Steve Evans, Dylan Foster, Dave Helmbold, Phil Long, and Matus Telgarsky.

INI 1
12:30 to 14:00 Buffet Lunch at INI
14:00 to 17:00 Free Afternoon
Thursday 28th June 2018
09:00 to 09:45 Tong Zhang (Rutgers, The State University of New Jersey)
tba
INI 1
09:45 to 10:30 Aurore Delaigle (University of Melbourne)
tba
INI 1
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Francis Bach (INRIA Paris - Rocquencourt; ENS - Paris)
Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform practically better in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. While this is true for low-dimensional easy problems, we show that for hard problems, multiple passes lead to statistically optimal predictions while a single pass does not; we also show that in these hard models, the optimal number of passes over the data increases with sample size. In order to define the notion of hardness and show that our predictive performances are optimal, we consider potentially infinite-dimensional models and notions typically associated with kernel methods, namely, the decay of eigenvalues of the covariance matrix of the features and the complexity of the optimal predictor as measured through the covariance matrix. We illustrate our results on synthetic experiments with non-linear kernel methods and on a classical benchmark with a linear model. (Joint work with Loucas Pillaud-Vivien and Alessandro Rudi)
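A minimal sketch of the object under study, assuming plain (non-averaged) SGD on a synthetic least-squares problem; the problem sizes, step size, and numbers of passes below are ad hoc choices of ours and not the regimes analysed in the talk:

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 500, 20
    w_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)

    def sgd_passes(X, y, n_passes, step=0.01):
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_passes):
            for i in rng.permutation(n):                 # one pass = one shuffled epoch over the data
                grad = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of 0.5 * (x_i'w - y_i)^2
                w -= step * grad
        return w

    for passes in (1, 5, 20):
        w = sgd_passes(X, y, passes)
        print(passes, "passes, excess prediction error:", np.mean((X @ (w - w_star))**2))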
INI 1
11:45 to 12:30 Edward Ionides (University of Michigan)
Monte Carlo adjusted profile likelihood, with applications to spatiotemporal and phylodynamic inference.
Partially observed nonlinear stochastic dynamic systems raise inference challenges. Sequential Monte Carlo (SMC) methods provide a route to accessing the likelihood function. However, despite the advantage of applicability to a wide class of nonlinear models, standard SMC methods have the limitation that they scale poorly to large systems. We present a profile likelihood approach, properly adjusted for Monte Carlo uncertainty, that enables likelihood-based inference in systems for which Monte Carlo error remains large despite stretching the limits of available computational resources. Together with state-of-the-art SMC algorithms, this technique permits effective inference on some scientific problems in panel time series analysis, spatiotemporal modeling, and inferring population dynamic models from genetic sequence data. The results presented are joint work with Carles Breto, Joonha Park, Alex Smith and Aaron King.
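Schematically, and leaving aside both the SMC machinery and the talk's specific Monte Carlo adjustment, the starting point is a grid of noisy Monte Carlo evaluations of a profile log-likelihood that are smoothed before a confidence interval is read off; the quadratic fit, noise level, and cutoff below are simplistic placeholders of ours:

    import numpy as np

    rng = np.random.default_rng(5)
    phi_grid = np.linspace(-2.0, 2.0, 41)                        # values of the focal parameter
    true_profile = -0.5 * (phi_grid - 0.3)**2                    # stand-in for the exact profile log-likelihood
    mc_profile = true_profile + 0.2 * rng.normal(size=phi_grid.size)  # noisy Monte Carlo evaluations

    # Smooth the Monte Carlo points with a quadratic fit before constructing an interval.
    coef = np.polyfit(phi_grid, mc_profile, deg=2)
    smoothed = np.polyval(coef, phi_grid)

    cutoff = 1.92                                                # chi-squared(1)/2 cutoff, before any MC adjustment
    keep = smoothed >= smoothed.max() - cutoff
    print("point estimate:", phi_grid[np.argmax(smoothed)])
    print("approx 95% CI :", phi_grid[keep].min(), "to", phi_grid[keep].max())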
INI 1
12:30 to 14:00 Buffet Lunch at INI
14:00 to 14:45 Jinchi Lv (University of Southern California)
Asymptotics of Eigenvectors and Eigenvalues for Large Structured Random Matrices
Characterizing the exact asymptotic distributions of high-dimensional eigenvectors for large structured random matrices poses important challenges yet can provide useful insights into a range of applications. This paper establishes the asymptotic properties of the spiked eigenvectors and eigenvalues for the generalized Wigner random matrix, where the mean matrix is assumed to have a low-rank structure. Under some mild regularity conditions, we provide the asymptotic expansions for the spiked eigenvalues and show that they are asymptotically normal after some normalization. For the spiked eigenvectors, we provide novel asymptotic expansions for the general linear combination and further show that the linear combination is asymptotically normal after some normalization, where the weight vector can be an arbitrary unit vector. Simulation studies verify the validity of our new theoretical results. Our family of models encompasses many popularly used ones such as the stochastic block models with or without overlapping communities for network analysis and the topic models for text analysis, and our general theory can be exploited for statistical inference in these large-scale applications. This is a joint work with Jianqing Fan, Yingying Fan and Xiao Han.
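A quick simulation in the spirit of the setting above (a rank-one mean plus Wigner noise); the scaling and constants are ours and purely illustrative, not the expansions derived in the talk:

    import numpy as np

    rng = np.random.default_rng(3)
    n, signal = 2000, 30.0
    v = rng.normal(size=n); v /= np.linalg.norm(v)                   # spike direction
    H = rng.normal(size=(n, n)); W = (H + H.T) / np.sqrt(2 * n)      # Wigner noise with O(1) spectrum
    X = signal * np.outer(v, v) + W                                  # low-rank mean plus noise

    vals, vecs = np.linalg.eigh(X)
    print("top eigenvalue:", vals[-1], "(spike strength:", signal, ")")
    print("|<top eigenvector, v>|:", abs(vecs[:, -1] @ v))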
INI 1
14:45 to 15:30 Rui Castro (Technische Universiteit Eindhoven)
Are there needles in a (moving) haystack? Adaptive sensing for detection and estimation of static and dynamically evolving signals
In this work we investigate the problem of testing for the presence of dynamically evolving sparse signals. This is motivated by settings where the signal of interest changes rapidly while measurements are being collected (e.g., monitoring of the electromagnetic spectrum, or detection of hot spots of a rapidly spreading disease). We model the signal as an n-dimensional vector that has s non-zero components, and at each time a fraction of the non-zero components may change their location. The statistical problem is to decide whether the signal is a zero vector or in fact has non-zero components. This decision is based on m noisy observations of individual signal components. We consider two different sensing paradigms, namely adaptive and non-adaptive sensing. For non-adaptive sensing the choice of components to measure has to be decided before the data collection process starts, while for adaptive sensing one can adjust the sensing process based on observations collected earlier. We characterize the difficulty of this detection problem in both sensing paradigms, with special interest in the speed of change of the active components. In addition we provide an adaptive sensing algorithm for this problem and contrast its performance to that of the best non-adaptive detection algorithms. Interestingly, when the number of measurements is small (on the order of n/s), there is a significant difference in performance between the two sensing paradigms, as non-adaptive sensing is unable to exploit the dynamic nature of the signal (based on joint works with Ervin Tánczos).
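For intuition, the sketch below implements a simple two-stage adaptive sensing scheme for a static sparse signal (measure everything coarsely, then spend the remaining budget on the most promising coordinates). It only illustrates the adaptivity gain in the static case and is not the dynamic-signal algorithm from the talk; all thresholds and sizes are ours:

    import numpy as np

    rng = np.random.default_rng(6)
    n, s, mu = 10_000, 20, 4.0
    theta = np.zeros(n)
    theta[rng.choice(n, s, replace=False)] = mu                  # s active components of amplitude mu

    # Stage 1: one cheap noisy look at every coordinate.
    y1 = theta + rng.normal(size=n)
    survivors = np.flatnonzero(y1 > 0)                           # keep coordinates that look promising

    # Stage 2: reuse the saved budget to measure the survivors more precisely.
    reps = max(1, n // max(1, survivors.size))
    y2 = theta[survivors] + rng.normal(size=survivors.size) / np.sqrt(reps)

    detected = y2.max() > np.sqrt(2 * np.log(survivors.size))    # crude thresholding decision
    print("signal declared present:", bool(detected))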
INI 1
15:30 to 16:00 Afternoon Tea
16:00 to 16:45 Pradeep Ravikumar (Carnegie Mellon University)
Robust Estimation via Robust Gradient Estimation
A common assumption in the training of machine learning systems is that the data is sufficiently clean and well-behaved: there are very few or no outliers, or the distribution of the data does not have very long tails. As machine learning finds wider usage, these assumptions are increasingly indefensible. The key question then is how to perform estimation that is robust to departures from these assumptions; this question has been of classical interest, with seminal contributions due to Box, Tukey, Huber, Hampel, and several others. Loosely, there seemed to be a computation-robustness tradeoff: practical estimators did not have strong robustness guarantees, while estimators with strong robustness guarantees were computationally impractical. In our work, we provide a new class of computationally efficient estimators for risk minimization that are provably robust in a variety of settings, such as arbitrary oblivious contamination and heavy-tailed data, among others. Our workhorse is a novel robust variant of gradient descent, and we provide conditions under which our gradient descent variant provides accurate and robust estimators in any general convex risk minimization problem. These results provide some of the first computationally tractable and provably robust estimators for general statistical models. Joint work with Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishnan.
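A minimal sketch of the general idea: gradient descent in which each gradient is estimated robustly from the sample. The coordinate-wise median-of-blocks estimator used below is one simple choice of ours, not necessarily the estimator analysed in the talk, and the problem sizes and step size are arbitrary:

    import numpy as np

    def robust_gradient(X, y, w, n_blocks=10):
        """Coordinate-wise median of per-block mean gradients for squared loss."""
        grads = (X @ w - y)[:, None] * X                          # per-sample gradients, shape (n, d)
        blocks = np.array_split(grads, n_blocks)
        block_means = np.stack([b.mean(axis=0) for b in blocks])
        return np.median(block_means, axis=0)

    rng = np.random.default_rng(7)
    n, d = 2000, 5
    w_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_star + rng.standard_t(df=2.1, size=n)               # heavy-tailed noise

    w = np.zeros(d)
    for _ in range(200):                                          # plain gradient descent with robust gradients
        w -= 0.1 * robust_gradient(X, y, w)
    print("parameter error:", np.linalg.norm(w - w_star))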
INI 1
19:30 to 22:00 Formal Dinner at St John's College
Friday 29th June 2018
09:00 to 09:45 Alexandre Tsybakov (CREST: Centre de Recherche en Économie et Statistique)
tba
INI 1
09:45 to 10:30 Vincent Vu (Ohio State University)
Group invariance and computational sufficiency
INI 1
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Richard Samworth (University of Cambridge)
Data perturbation for data science
When faced with a dataset and a problem of interest, should we propose a statistical model and use that to inform an appropriate algorithm, or dream up a potential algorithm and then seek to justify it? The former is the more traditional statistical approach, but the latter appears to be becoming more popular. I will discuss a class of algorithms that belong in the second category, namely those that involve data perturbation (e.g. subsampling, random projections, artificial noise, knockoffs,...). As examples, I will consider Complementary Pairs Stability Selection for variable selection and sparse PCA via random projections. This will involve joint work with Rajen Shah, Milana Gataric and Tengyao Wang.
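As one concrete example of a perturbation-based algorithm of this kind, here is a rough sketch of complementary-pairs subsampling around a lasso selector. The number of splits, the lasso penalty, and the 0.9 selection threshold are arbitrary choices of ours rather than the tuned quantities from the stability selection literature:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(8)
    n, p, s = 200, 50, 5
    beta = np.zeros(p); beta[:s] = 2.0
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)

    counts = np.zeros(p)
    B = 50
    for _ in range(B):
        perm = rng.permutation(n)
        for half in (perm[: n // 2], perm[n // 2 :]):             # complementary halves of the data
            sel = Lasso(alpha=0.1).fit(X[half], y[half]).coef_ != 0
            counts += sel

    freq = counts / (2 * B)                                        # empirical selection frequency per variable
    print("stably selected variables:", np.flatnonzero(freq >= 0.9))

Variables that survive selection on both halves across many random splits are kept, which is the basic data-perturbation mechanism behind the stability selection methods mentioned above.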
INI 1
11:45 to 12:30 Peter Bühlmann (ETH Zürich); Domagoj Ćevid (ETH Zürich)
Deconfounding using Spectral Transformations
High-dimensional regression methods which rely on the sparsity of the ground truth, such as the Lasso, might break down in the presence of confounding variables. If a latent variable affects both the response and the predictors, the correlation between them changes. This phenomenon can be represented as a linear model where the sparse coefficient vector has been perturbed. We will present our work on this problem. We investigate and propose some spectral transformations for the data which serve as input for the Lasso. We discuss assumptions for achieving the optimal error rate and illustrate the performance on a genomic dataset. The approach is easy to use and leads to convincing results. The talk is based on joint work with Nicolai Meinshausen.
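A rough sketch of the overall pipeline, assuming one simple spectral transformation (capping the singular values of the design at their median) followed by the lasso; the specific transformations, assumptions, and tuning discussed in the talk are not reproduced here, and all constants are ours:

    import numpy as np
    from sklearn.linear_model import Lasso

    def cap_singular_values(X, y):
        """Shrink large singular values of X (and transform y consistently)."""
        U, d, Vt = np.linalg.svd(X, full_matrices=False)
        tau = np.median(d)
        scale = np.minimum(1.0, tau / d)                           # cap singular values at their median
        F = U @ np.diag(scale) @ U.T                               # spectral transformation F
        return F @ X, F @ y

    rng = np.random.default_rng(9)
    n, p = 300, 500
    H = rng.normal(size=(n, 1))                                    # latent confounder
    X = rng.normal(size=(n, p)) + H @ rng.normal(size=(1, p))      # confounder loads on many predictors
    beta = np.zeros(p); beta[:3] = 1.0
    y = X @ beta + 2.0 * H[:, 0] + 0.5 * rng.normal(size=n)        # confounder also affects the response

    X_t, y_t = cap_singular_values(X, y)
    fit = Lasso(alpha=0.05).fit(X_t, y_t)
    print("largest estimated coefficients:", np.argsort(-np.abs(fit.coef_))[:5])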
INI 1
12:30 to 14:00 Buffet Lunch at INI