UCSC's new AnVIL Data Explorer for health research
The AnVIL Data Explorer, developed by the UC Santa Cruz Genomics Institute’s Computational Genomics Lab, is a web-based tool that streamlines how researchers discover and reuse high-value genomic datasets. Built on the NIH’s AnVIL platform, it currently catalogs over 280 datasets from major consortia including the Human Pangenome Reference Consortium, Telomere-to-Telomere, the 1000 Genomes Project, and the Center for Alzheimer’s and Related Dementias. The catalog is updated quarterly, so researchers have access to the latest resources.
By organizing data in a secure, cloud-based environment, the Explorer eliminates the inefficiencies of downloading massive files to local servers. Scientists can search across studies, build tailored cohorts, and immediately begin analysis using Terra, AnVIL’s Google Cloud-based platform. Five customizable views (dataset, donor, biosample, activity, filename) make it easier to filter and focus on specific research needs. The tool also supports interoperability with resources such as the National Heart, Lung, and Blood Institute’s BioData Catalyst.
For example, an Alzheimer’s researcher could identify cohorts of patients with specific genetic variants linked to early-onset disease, combine that data with aging or neurogenetic studies, and quickly test new hypotheses. By accelerating access and integration, the Explorer maximizes the impact of existing data, helping to unlock discoveries that could lead to earlier diagnoses, targeted treatments, and better outcomes for patients.