Future challenges in statistical scalability
Monday 25th June 2018 to Friday 29th June 2018
09:45 to 10:20  Registration  
10:20 to 10:50  Morning Coffee  
10:50 to 11:00  Welcome from Christie Marr (INI Deputy Director)  
11:00 to 11:45 
Iain Johnstone (Stanford University) Eigenstructure in high dimensional random effects models
The eigenstructure of i.i.d. samples from high dimensional data is
known to show phenomena unexpected from experience with low dimensions
 eigenvalue spreading, bias and eigenvector inconsistency.
Motivated by some problems in quantitative genetics, we now consider
high dimensional data with more than one level of variation, and more
specifically the spectral properties of estimators of the various
components of variance matrices. We briefly describe bulk and edge
eigenvalue distributions in this setting, and then focus more
attention on properties and estimation of 'spiked' models. This is
joint work with Zhou Fan and Yi Sun.

INI 1  
11:45 to 12:30 
Philippe Rigollet (Massachusetts Institute of Technology) Uncoupled isotonic regression via minimum Wasserstein deconvolution
Isotonic regression is a standard problem in shape constrained estimation where the goal is to estimate an unknown nondecreasing regression function $f$ from independent pairs $(x_i,y_i)$ where $\E[y_i]=f(x_i), i=1, \ldots n$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart where one is given uncoupled $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moments conditions on $y_i$ together with an efficient algorithm. Both upper and lower bounds are articulated around momentmatching arguments that are also pertinent to learning mixtures of distributions and deconvolution. [Joint work with Jonathan Weed (MIT)]

INI 1  
12:30 to 14:00  Buffet Lunch at INI  
14:00 to 14:45 
Matthew Stephens (University of Chicago) On applications of Empirical Bayes approaches to the Normal Means problem
The normal means problem is very simple: given normallydistributed observations with known variances and unknown means, estimate the means. That is, given X_j \sim N(\theta_j, \sigma_j^2, estimate \theta_j. A key idea is that one can do better than the maximum likelihood estimates, \hat{\theta}_j= \X_j, in particular by use of appropriate "shrinkage" estimators. One attractive way to perform shrinkage estimation in practice is to use Empirical Bayes methods. That is, to assume that \theta_j are independent and identically distributed from some distribution g that is to be estimated from the data. Then, given such an estimate \hat{g}, the posterior distributions of \theta_j can be computed to perform inference. We call this the "Empirical Bayes Normal Means" (EBNM) problem. Despite its simplicity, solving the EBNM problem has a wide range of practical applications. Here we present some flexible nonparametric approaches we have recently developed for solving the EBNM problem, and describe their application to several different settings: false discovery rate (FDR) estimation, nonparametric smoothing, and sparse matrix factorization problems (ie sparse factor analysis and sparse principal components analysis). 
INI 1  
14:45 to 15:30 
Jana Jankova (University of Cambridge) Asymptotic Inference for Eigenstructure of Large Covariance Matrices
A vast number of methods have been proposed in literature for point estimation of eigenstructure of covariance matrices in highdimensional settings. In this work, we study uncertainty quantification and propose methodology for inference and hypothesis testing for individual loadings of the covariance matrix. We base our methodology on a Lassopenalized Mestimator which, despite nonconvexity, may be solved by a polynomialtime algorithm such as coordinate or gradient descent. Our results provide theoretical guarantees on asymptotic normality of the new estimators and may be used for valid hypothesis testing and variable selection. These results are achieved under a sparsity condition relating the number of nonzero loadings, sample size, dimensionality of the covariance matrix and spectrum of the covariance matrix. This talk is based on joint work with Sara van de Geer.

INI 1  
15:30 to 16:00  Afternoon Tea  
16:00 to 16:45 
Flori Bunea (Cornell University) A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics
Topic models have become popular for the analysis of data that consists in a collection of n independent multinomial observations. The model links all cell probabilities, collected in a p x n matrix, via the assumption that this matrix can be factorized as the product of two nonnegative matrices A and W. Topic models have been originally developed in text mining, when one browses through n documents, based on a dictionary of p words, and covering K topics. In this terminology, the matrix A is called the wordtopic matrix, and is the main target of estimation. It can be viewed as a matrix of conditional probabilities, and it is uniquely defined, under appropriate separability assumptions, discussed in this talk. Notably, the unique A is required to satisfy what is commonly known as the anchor word assumption, under which A has an unknown number of rows respectively proportional to Kdimensional canonical basis vectors. The indices of such rows are referred to as anchor words. Recent computationally feasible algorithms, with theoretical guarantees, utilize constructively this assumption by linking the estimation of the set of anchor words with that of estimating the K vertices of a simplex. This crucial step in the estimation of A requires K to be known, and cannot be easily extended to the more realistic setup when K is unknown.
In this talk, we take a different view on anchor word estimation, and on the estimation of the wordtopic matrix A. We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any n, p, K and document length N, and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We illustrate, via simulations, that the new algorithm is faster and more accurate than the current ones, although we start out with a computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.

INI 1  
17:00 to 18:00  Welcome Wine Reception & Posters at INI 
09:00 to 09:45 
Guy Bresler (Massachusetts Institute of Technology) Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure
The prototypical highdimensional statistics problem entails finding a structured signal in noise. Many of these problems exhibit a curious phenomenon: the amount of data needed by all known computationally efficient algorithms far exceeds what is needed for inefficient algorithms that search over all possible structures. A line of work initiated by Berthet and Rigollet in 2013 has aimed to explain these gaps by reducing from conjecturally hard problems in computer science. However, the delicate nature of averagecase reductions has limited the applicability of this approach. In this work we introduce several new techniques to give a web of averagecase reductions showing strong computational lower bounds based on the planted clique conjecture. These include tight lower bounds for Planted Independent Set, Planted Dense Subgraph, Sparse Spiked Wigner, Sparse PCA, as well as for new models that we introduce. Joint work with Matthew Brennan and Wasim Huleihel.

INI 1  
09:45 to 10:30 
Chao Gao (University of Chicago) Reduced Isotonic Regression
Consider an $n$dimensional vector $X$ with mean $\theta^*$. In this talk, we consider $\theta^*$ that is both nondecreasing and has a piecewise constant structure. We establish the exact minimax rate of estimating such monotone functions, and thus give a nontrivial answer to an open problem in the shapeconstrained analysis literature. The minimax rate involves an interesting iterated logarithmic dependence on the dimension. We then develop a penalized leastsquares procedure for estimating $\theta^*$ adaptively. This estimator is shown to achieve the derived minimax rate without the knowledge of the number of pieces in linear time. We further allow the model to be misspecified and derive oracle inequalities with the optimal rates for the proposed estimator. This is a joint work with Fang Han and Cunhui Zhang.

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Ryan Tibshirani (Carnegie Mellon University) Dykstra’s Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions
We study connections between Dykstra’s algorithm for projecting onto an intersection of convex sets, the augmented Lagrangian method of multipliers or ADMM, and block coordinate descent. We prove that coordinate descent for a regularized regression problem, in which the penalty is a separable sum of support functions, is exactly equivalent to Dykstra’s algorithm applied to the dual problem. ADMM on the dual problem is also seen to be equivalent, in the special case of two sets, with one being a linear subspace. These connections, aside from being interesting in their own right, suggest new ways of analyzing and extending coordinate descent. For example, from existing convergence theory on Dykstra’s algorithm over polyhedra, we discern that coordinate descent for the lasso problem converges at an (asymptotically) linear rate. We also develop two parallel versions of coordinate descent, based on the Dykstra and ADMM connections. Finally, we discuss the implications of this work for backfitting in additive models.

INI 1  
11:45 to 12:30 
Peter Bickel (University of California, Berkeley) Two network scale challenges:Constructing and fitting hierarchical block models and fitting large block models using the mean field method
Work with S.Bhattacharyya,T.Li,E.Levina,S.Mukherjee,P.Sarkar
Networks are a complex type of structure presenting itself in many applications . They are usually represented by a graph ,with possibly weighted edges plus additional covariates (such as directions).Block models have been studied for some time as basic approximations to ergodic stationary probability models for single graphs.A huge number of fitting methods have been developed for these models some of which we will touch on.
The mean field method in which an increasing number of parameters must be fitted is used not only for multiple membership block models but also in applications such as LDA.if the graph is too large poor behaviour of the method can be seen.. We have developed what we call "patch methods " for fitting which help both computationally and inferentially in such situations bur much further analysis is needed.
It is intuitively clear but mathematically unclear how knowledge of the model having nested scales helps in fitting large scale as opposed to small scale parameters.We will discuss this issue through an example,

INI 1  
12:30 to 14:00  Buffet Lunch at INI  
14:00 to 14:45 
Marten Herman Wegkamp (Cornell University) Adaptive estimation of the rank of the regression coefficient matrix
We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion does not require estimation of the unknown variance of the noise, nor depends on a delicate choice of a tuning parameter. We develop an iterative, fully datadriven procedure, that adapts to the optimal signaltonoise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it does not exceed the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. An extensive simulation study that confirms our theoretical findings in both low and high dimensional settings.
This is joint work with Xin Bing.

INI 1  
14:45 to 15:30 
Maryam Fazel (University of Washington) Competitive Online Algorithms for Budgeted Allocation with Application to Online Experiment Design
We consider an online resource allocation problem, where the goal is to maximize a function of a positive semidefinite (PSD) matrix with a scalar budget constraint. The problem data arrives online, and the algorithm needs to make an irrevocable decision at each step. Of particular interest are classic experiment design problems in the online setting, with the algorithm deciding whether to allocate budget to each experiment as new experiments become available sequentially.
We analyze two classes of primaldual algorithms and provide bounds on their competitive ratios. Our analysis relies on a smooth surrogate of the objective function that needs to satisfy a new diminishingreturns property. We then formulate a convex optimization problem to directly optimize this competitive ratio bound; this design problem can be solved offline before the data start arriving. We give examples of computing the smooth surrogate for Doptimal and Aoptimal experiment design.
This is joint work with Reza Eghbali and James Saunderson.

INI 1  
15:30 to 16:00  Afternoon Tea  
16:00 to 16:45 
Alex d'Aspremont (CNRS  Ecole Normale Superieure Paris) An Approximate ShapleyFolkman Theorem.
with Thomas Kerdreux and Igor Colin.
The ShapleyFolkman theorem shows that Minkowski averages of uniformly bounded sets tend to be convex when the number of terms in the sum becomes much larger than the ambient dimension. In optimization, \citet{Aubi76} show that this produces an a priori bound on the duality gap of separable nonconvex optimization problems involving finite sums. This bound is highly conservative and depends on unstable quantities, and we relax it in several directions to show that non convexity can have a much milder impact on finite sum minimization problems such as empirical risk minimization and multitask classification. As a byproduct, we show a new version of Maurey's classical approximate Carath\'eodory lemma where we sample a significant fraction of the coefficients, without replacement.

INI 1 
09:00 to 09:45 
Urvashi Oswal (University of WisconsinMadison) Selection and Clustering of Correlated variables using OWL/GrOWL regularizers
In highdimensional linear regression problems, it is likely that several covariates are highly correlated e.g. in fMRI data, certain voxels or brain regions may have very correlated activation patterns. Using standard sparsityinducing regularization (such as Lasso) in such scenarios is known to be unsatisfactory, as it leads to the selection of a subset of the highly correlated covariates. For engineering purposes and/or scientific interpretability, it is often desirable to explicitly identify all of the covariates that are relevant for modeling the data. In this talk, I will present clustering properties and error bounds of Ordered Weighted $\ell_1$ (OWL) regularization for linear regression, and a group variant for multitask regression called Group OWL (GrOWL). I will present the application of OWL/GrOWL in three settings: (1) Brain networks (2) Subspace clustering and (3) Deep Learning. I will demonstrate, in theory and experiments, how OWL/GrOWL deal with strongly correlated covariates by automatically clustering and averaging regression coefficients associated with those covariates.
This is joint work with Robert Nowak, Mário Figueiredo, Tim Rogers and Chris Cox.

INI 1  
09:45 to 10:30 
Elizaveta Levina (University of Michigan) Matrix completion in network analysis
Matrix completion is an active area of research in itself, and a natural tool to apply to network data, since many real networks are observed incompletely and/or with noise. However, developing matrix completion algorithms for networks requires taking into account the network structure. This talk will discuss three examples of matrix completion used for network tasks. First, we discuss the use of matrix completion for crossvalidation on network data, a longstanding problem in network analysis. Two other examples focus on reconstructing incompletely observed networks, with structured missingness resulting from network sampling mechanisms. One scenario we consider is egocentric sampling, where a set of nodes is selected first and then their connections to the entire network are observed. Another scenario focuses on data from surveys, where people are asked to name a given number of friends. We show that matrix completion, when used appropriately, can generally be very helpful in solving network problems.

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Garvesh Raskutti (University of WisconsinMadison) Estimating sparse additive autoregressive network models
Consider a multivariate time series, which may correspond to spike train responses for multiple neurons in a brain, crime event data across multiple regions, and many others. An important challenge associated with these time series models is to estimate an influence network between the d variables, especially when the number of variables d is large meaning we are in the highdimensional setting. Prior work has focused on parametric vector autoregressive models. However, parametric approaches are somewhat restrictive in practice. In this paper, we use the nonparametric sparse additive model (SpAM) framework to address this challenge. Using a combination of beta and phimixing properties of Markov chains and empirical process techniques for reproducing kernel Hilbert spaces (RKHSs), we provide upper bounds on meansquared error in terms of the sparsity s, logarithm of the dimension logd, number of time points T, and the smoothness of the RKHSs. Our rates are sharp up to logarithm factors in many cases. We also provide numerical experiments that support our theoretical results and display potential advantages of using our nonparametric SpAM framework for a Chicago crime dataset.

INI 1  
11:45 to 12:30 
Peter Bartlett (University of California, Berkeley) Representation, optimization and generalization properties of deep neural networks
Deep neural networks have improved the stateoftheart performance for prediction problems across an impressive range of application areas. This talk describes some recent results in three directions. First, we investigate the impact of depth on representational properties of deep residual networks, which compute nearidentity maps at each layer, showing how their representational power improves with depth and that the functional optimization landscape has the desirable property that stationary points are optimal. Second, we study the implications for optimization in deep linear networks, showing how the success of a family of gradient descent algorithms that regularize towards the identity function depends on a positivity condition of the regression function. Third, we consider how the performance of deep networks on training data compares to their predictive accuracy, we demonstrate deviation bounds that scale with a certain "spectral complexity," and we compare the behavior of these bounds with the observed performance of these networks in practical problems. Joint work with Steve Evans, Dylan Foster, Dave Helmbold, Phil Long, and Matus Telgarsky. 
INI 1  
12:30 to 14:00  Buffet Lunch at INI  
14:00 to 17:00  Free Afternoon 
09:00 to 09:45 
Tong Zhang (Rutgers, The State University of New Jersey) Candidates vs. Noises Estimation for Large MultiClass Classification Problem
In practice, there has been sigificant interest in multiclass classification problems where the number of classes is large. Computationally such applications require statistical methods with run time sublinear in the number of classes. A number of methods such as NoiseContrastive Estimation (NCE) and variations have been proposed in recent years to address this problem. However, the existing methods are not statistically efficient compared to multiclass logistic regression, which is the maximum likelihood estimate. In this talk, I will describe a new method called Candidate v.s. Noises Estimation (CANE) that selects a small subset of candidate classes and samples the remaining classes. We show that CANE is always consistent and computationally efficient. Moreover, the resulting estimator has low statistical variance approaching that of the maximum likelihood estimator, when the observed label belongs to the selected candidates with high probability. Extensive experimental results show that CANE achieves better prediction accuracy over a number of the stateoftheart tree classifiers, while it gains significant speedup compared to standard multiclass logistic regression.

INI 1  
09:45 to 10:30 
Aurore Delaigle (University of Melbourne) Estimating a covariance function from fragments of functional data
Functional data are often observed only partially, in the form of
fragments. In that case, the standard approaches for estimating
the covariance function do not work because entire parts of the
domain are completely unobserved. In previous work, Delaigle and
Hall (2013, 2016) have suggested ways of estimating the covariance
function, based for example on Markov assumptions. In this work
we take a completely different approach which does not rely on such
assumptions. We show that, using a tensor product approach, it is
possible to reconstruct the covariance function using observations
located only on the diagonal of its domain.

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Francis Bach (INRIA Paris  Rocquencourt); (ENS  Paris) Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
We consider stochastic gradient descent (SGD) for
leastsquares regression with potentially several passes over the data. While
several passes have been widely reported to perform practically better in terms
of predictive performance on unseen data, the existing theoretical analysis of
SGD suggests that a single pass is statistically optimal. While this is true
for lowdimensional easy problems, we show that for hard problems, multiple
passes lead to statistically optimal predictions while single pass does not; we
also show that in these hard models, the optimal number of passes over the data
increases with sample size. In order to define the notion of hardness and show
that our predictive performances are optimal, we consider potentially
infinitedimensional models and notions typically associated to kernel methods,
namely, the decay of eigenvalues of the covariance matrix of the features and
the complexity of the optimal predictor as measured through the covariance
matrix. We illustrate our results on synthetic experiments with nonlinear
kernel methods and on a classical benchmark with a linear model. (Joint work
with Loucas PillaudVivien and Alessandro Rudi)

INI 1  
11:45 to 12:30 
Edward Ionides (University of Michigan) Monte Carlo adjusted profile likelihood, with applications to spatiotemporal and phylodynamic inference.
Partially observed nonlinear stochastic dynamic systems raise inference challenges. Sequential Monte Carlo (SMC) methods provide a route to accessing the likelihood function. However, despite the advantage of applicability to a wide class of nonlinear models, standard SMC methods have a limitation that they scale poorly to large systems. We present a profile likelihood approach, properly adjusted for Monte Carlo uncertainty, that enables likelihoodbased inference in systems for which Monte Carlo error remains large despite stretching the limits of available computational resources. Together with stateoftheart SMC algorithms, this technique permits effective inference on some scientific problems in panel time series analysis, spatiotemporal modeling, and inferring population dynamic models from genetic sequence data.
The results presented are joint work with Carles Breto, Joonha Park, Alex Smith and Aaron King.

INI 1  
12:30 to 14:00  Buffet Lunch at INI  
14:00 to 14:45 
Jinchi Lv (University of Southern California) Asymptotics of Eigenvectors and Eigenvalues for Large Structured Random Matrices
Characterizing the exact asymptotic distributions of highdimensional eigenvectors for large structured random matrices poses important challenges yet can provide useful insights into a range of applications. This paper establishes the asymptotic properties of the spiked eigenvectors and eigenvalues for the generalized Wigner random matrix, where the mean matrix is assumed to have a lowrank structure. Under some mild regularity conditions, we provide the asymptotic expansions for the spiked eigenvalues and show that they are asymptotically normal after some normalization. For the spiked eigenvectors, we provide novel asymptotic expansions for the general linear combination and further show that the linear combination is asymptotically normal after some normalization, where the weight vector can be an arbitrary unit vector. Simulation studies verify the validity of our new theoretical results. Our family of models encompasses many popularly used ones such as the stochastic block models with or without overlapping communities for network analysis and the topic models for text analysis, and our general theory can be exploited for statistical inference in these largescale applications. This is a joint work with Jianqing Fan, Yingying Fan and Xiao Han.

INI 1  
14:45 to 15:30 
Rui Castro (Technische Universiteit Eindhoven) Are there needles in a (moving) haystack? Adaptive sensing for detection and estimation of static and dynamically evolving signals
In this work we investigate the problem of testing for the presence of dynamically evolving sparse signals. This is motivated by settings were the signal of interest changes rapidly while measurements are being collected (e.g., monitoring of the electromagnetic spectrum, or detection of hot spots of a rapidly spreading disease). We model the signal as an n dimensional vector that has s nonzero components, and at each time a fraction of the nonzero components may change their location. The statistical problem is to decide whether the signal is a zero vector or in fact it has nonzero components. This decision is based on m noisy observations of individual signal components. We consider two different sensing paradigms, namely adaptive and nonadaptive sensing. For nonadaptive sensing the choice of components to measure has to be decided before the data collection process started, while for adaptive sensing one can adjust the sensing process based on observations collected earlier. We characterize the difficulty of this detection problem in both sensing paradigms, with special interest to the speed of change of the active components. In addition we provide an adaptive sensing algorithm for this problem and contrast its performance to that of best nonadaptive detection algorithms. Interestingly, when the number of measurements is small (on the order n/s), there is a significant difference in performance between the two sensing paradigms, as nonadaptive sensing is unable to exploit the dynamic nature of the signal (based on joint works with Ervin Tánczos).

INI 1  
15:30 to 16:00  Afternoon Tea  
16:00 to 16:45 
Pradeep Ravikumar (Carnegie Mellon University) Robust Estimation via Robust Gradient Estimation
A common assumption in the training of machine learning systems is that the data is sufficiently clean and wellbehaved: there are very few or no outliers, or that the distribution of the data does not have very long tails. As machine learning finds wider usage, these assumptions are increasingly indefensible. The key question then is how to perform estimation that is robust to departure from these assumptions; and which has been of classical interest, with seminal contributions due to Box, Tukey, Huber, Hampel, and several others. Loosely, there seemed to be a computationrobustness tradeoff, practical estimators did have strong robustness guarantees, while estimators with strong robustness guarantees were computationally impractical.
In our work, we provide a new class of computationallyefficient class of estimators for risk minimization that are provably robust to a variety of robustness settings, such as arbitrary oblivious contamination, and heavytailed data, among others. Our workhorse is a novel robust variant of gradient descent, and we provide conditions under which our gradient descent variant provides accurate and robust estimators in any general convex risk minimization problem. These results provide some of the first computationally tractable and provably robust estimators for general statistical models.
Joint work with Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishnan.

INI 1  
19:30 to 22:00  Formal Dinner at St John's College 
09:00 to 09:45 
Alexandre Tsybakov (CREST: Centre de Recherche en Économie et Statistique) Does data interpolation contradict statistical optimality?
We exhibit estimators that interpolate the data, and yet achieve optimal rates of convergence for the problems of nonparametric regression and prediction with square loss. This curious observation goes against the usual intuition that a good statistical procedure should forego the exact fit to data in favor of a more smooth representation.
A motivation for this work is the recent focus within the machine learning community on the outofsample performance of neural networks. These flexible models are typically trained to fit the data exactly (either in their sign or in the actual value), and yet they have good prediction behaviour. This talk is based on joint work with Misha Belkin and Sasha Rakhlin.

INI 1  
09:45 to 10:30 
Vincent Vu (Ohio State University) Group invariance and computational sufficiency
Statistical sufficiency formalizes the notion of data reduction. In the decision theoretic interpretation, once a model is chosen all inferences should be based on a sufficient statistic. However, suppose we start with a set of methods that share a sufficient statistic rather than a specific model. Is it possible to reduce the data beyond the statistic and yet still be able to compute all of the methods? In this talk, I'll present some progress towards a theory of "computational sufficiency" and show that strong reductions _can_ be made for large classes of penalized Mestimators by exploiting hidden symmetries in the underlying optimization problems. These reductions can (1) enable efficient computation and (2) reveal hidden connections between seemingly disparate methods. As a main example, I'll show how the theory provides a surprising answer to the following question: "What do the Graphical Lasso, sparse PCA, singlelinkage clustering, and L1 penalized Ising model selection all have in common?"

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Richard Samworth (University of Cambridge) Data perturbation for data science
When faced with a dataset and a problem of interest, should we propose a statistical model and use that to inform an appropriate algorithm, or dream up a potential algorithm and then seek to justify it? The former is the more traditional statistical approach, but the latter appears to be becoming more popular. I will discuss a class of algorithms that belong in the second category, namely those that involve data perturbation (e.g. subsampling, random projections, artificial noise, knockoffs,...). As examples, I will consider Complementary Pairs Stability Selection for variable selection and sparse PCA via random projections.
This will involve joint work with Rajen Shah, Milana Gataric and Tengyao Wang.

INI 1  
11:45 to 12:30 
Peter Bühlmann (ETH Zürich); Domagoj Ćevid (ETH Zürich) Deconfounding using Spectral Transformations
Highdimensional regression methods which rely on the sparsity of the ground truth, such as the Lasso, might break down in the presence of confounding variables. If a latent variable affects both the response and the predictors, the correlation between them changes. This phenomenon can be represented as a linear model where the sparse coefficient vector has been perturbed. We will present our work on this problem. We investigate and propose some spectral transformations for the data which serve as input for the Lasso. We discuss assumptions for achieving the optimal error rate and illustrate the performance on a genomic dataset. The approach is easy to use and leads to convincing results. The talk is based on joint work with Nicolai Meinshausen.

INI 1  
12:30 to 14:00  Buffet Lunch at INI 