QLS Seminar Series - Alex Diaz-Papkovich
Topological analysis of high-dimensional human genetic data in biobanks
Alex Diaz-Papkovich, Brown University
Tuesday February 6, 12-1pm
Zoom Link:
In Person: 550 Sherbrooke, Room 189
Abstract: Now storing the genetic data of millions of individuals, biobanks have become rich repositories regularly used for scientific study and discovery. With the human genome spanning some three billion base pairs, any statistical analysis of a biobank is inherently a high-dimensional problem. To say nothing of the complexity of human genetics, we encounter challenges both in the scale of the data and in their composition.
We develop a tractable approach to study biobanks using uniform manifold approximation and projection (UMAP), a form of non-linear dimensionality reduction based in topological data analysis, and HDBSCAN, a density-based clustering algorithm. Using these tools, we visualize the data contained in biobanks and illustrate the relationships between population structure (the phenomenon of non-random genetic variation) and variables like geography, demographic history, migration, social structure, and environmental measures. We identify population structure at a variety of scales, ranging from a handful to hundreds of thousands of individuals, uncover subtle relationships between our data, and discuss applications to exploratory data analysis, data QC, and polygenic scoring.