Research

Public Data Reuse

Database Search and Transfer Learning

I developed a method for interpreting new transcriptomic datasets through instant comparison to public datasets, without requiring high-performance computing. The method consists of a pre-computed model and an out-of-the-box Bioconductor package, GenomicSuperSignature, that applies the model to new data. Using human datasets, we demonstrated the method's efficient and coherent database search, its robustness to batch effects and heterogeneous training data, and its transfer learning capacity. We now plan to extend the method to other species (e.g., mouse, yeast) and other omics data types (e.g., scRNA-seq, metagenomics). (Quick Demo)

Constructing an Omics Database for Machine Learning

We are developing a FAIR and well-annotated omics data repository optimized for the application of Artificial Intelligence and Machine Learning (AI/ML) tools. This project aims to facilitate the development of robust and inclusive models that account for diverse population subgroups and other clinical information. (Quick Demo)

Comprehensive Cancer Data Analysis

Histopathology Image Data Analysis

Microbiome

Antibiotics

Antibiotics are among the most broadly prescribed drugs in the United States, and adverse reactions to them cause significant morbidity. Although the mechanisms of action of the different antibiotic classes are known at the molecular level, these reactions occur within the complex and individualized ecosystem of the human microbiome, where our understanding of antibiotic action is much more limited. Our goal is to construct a curated and harmonized database of antibiotic-exposed microbiome data and to use this resource to understand the personalized effects of antibiotic treatment on the gut microbiome, so that the benefits of treatment can be maximized while adverse effects are minimized.

Supporting Interdisciplinary Research

Cloud Computing

Advances in sequencing technologies and the development of new data collection methods produce large volumes of biological data. However, the computational infrastructure and skills currently required to analyze data at this scale put such analyses out of reach for many basic, translational, and clinical researchers. I’m interested in developing a working environment, accessible to non-technical users, for running public and Cloud-implemented workflows.

Software

GenomicSuperSignature

Bioconductor package for database search and transfer learning

HGNChelper

CRAN package for the identification and correction of invalid gene symbols

AnVILWorkflow

Bioconductor package for running AnVIL-implemented workflows using Google Cloud resources.

lefser

Bioconductor package for microbiome biomarker discovery. An R implementation of LEfSe with an improved algorithm.