Biostatistics & Computational Biology Fall Colloquium: Xing Qiu

November 30, 2017
03:30 PM - 05:30 PM
Helen Wood Hall – School of Nursing Auditorium 1W-304

Xing Qiu, Ph.D.
Associate Professor, Department of Biostatistics & Computational Biology, University of Rochester

Department of Biostatistics and Computational Biology
University of Rochester School of Medicine and Dentistry
2017 Fall Colloquium

“Toward the Era of ‘Large p², Medium n’”

Abstract:
In the past two decades or so, the emergence of many different types of high-throughput data, such as whole-transcriptome gene expression, genotyping, and microbiota abundance data, has revolutionized medical research. One common property shared by these “Omics” data is that they typically have many more features than independent measurements (the sample size). This property is known in the research community as the “large p, small n” property, and it has motivated many instrumental statistical innovations. Examples include the Benjamini-Hochberg FDR-controlling multiple testing procedure; Fan and Lv’s sure independence screening; a host of advanced penalized regression methods; and sparse matrix and tensor decomposition techniques. Owing to rapid advances in biotechnology, the unit cost of generating high-throughput data has decreased significantly in recent years. Consequently, the sample size of such data in a respectable study is now about n = 100 to 500, which I consider “medium n”, and which is certainly a huge improvement over the old “small n” studies in which n < 10 was the norm. With the increased sample size, medical investigators are starting to ask more sophisticated questions: feature selection based on hypothesis testing and regression analysis is no longer the end, but the new starting point for secondary analyses such as network analysis, multi-modal data association, and gene set analyses. The overarching theme of these advanced analyses is that they all require statistical inference for models that involve p² parameters. In my opinion, it takes a combination of proper data preprocessing, feature selection, dimension reduction, model building and selection, as well as domain knowledge and computational skills to do it right.
Despite the technical difficulties of designing and performing these avant-garde analyses, I believe that they will soon become mainstream and inspire a generation of young statisticians and data scientists to invent the next big breakthroughs in statistical science. In this talk, I will share some of my recent methodological and collaborative research that involves “large p²” models, and list a few potential extensions of these methods that may be used in other areas of statistics.

Links:
Fall 2017 Colloquia

Category: Talks