skip to content

The xyz algorithm for fast interaction search in high-dimensional data

Presented by: 
Rajen Shah University of Cambridge
Thursday 18th January 2018 - 16:00 to 16:45
INI Seminar Room 1
When performing regression on a dataset with $p$ variables, it is often of interest to go beyond using main effects and include interactions as products between individual variables. However, if the number of variables $p$ is large, as is common in genomic datasets, the computational cost of searching through $O(p^2)$ interactions can be prohibitive. In this talk I will introduce a new randomised algorithm called xyz that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in $p$. The underlying idea is to transform interaction search into a much simpler close pairs of points problem. We will see how strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires $O(p^u)$ operations for $1<u<2$ depending on their strength. An application of xyz to a genome-wide association study shows how more than $10^{11}$ interactions can be screened in minutes using a standard laptop. This is joint work with Gian Thanei and Nicolai Meinshausen (ETH Zurich).
The video for this talk should appear here if JavaScript is enabled.
If it doesn't, something may have gone wrong with our embedded player.
We'll get it fixed as soon as possible.
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons