Sehyun Oh, PhD

I am an Assistant Professor at CUNY SPH, with expertise in both experimental biology and bioinformatics. A molecular biologist by training, I studied DNA repair and telomere maintenance mechanisms during my doctoral and postdoctoral research. As a bench scientist, I came to see how hard it was to argue that my findings in cell lines actually occurred in living organisms and were relevant to public health, and this made me interested in the potential of large public datasets. I made a career transition from bench scientist to bioinformatics scientist and joined Dr. Waldron’s lab at CUNY SPH as a postdoctoral researcher in 2017. Since then, I have worked on many research projects, published papers, and developed a wide collaborative network along with deep experience in large public omics data analysis, statistical method development for high-dimensional data, cloud-based computing, AnVIL workspace and workflow development, and user-friendly software development. Currently, I am working on an NIH-funded project to construct an omics data repository designed for the easy application of Artificial Intelligence and Machine Learning tools. My overarching career goal is to facilitate interdisciplinary research through the development of intuitive bioinformatics infrastructure and user-friendly tools that lower barriers across different disciplines and resources. In my free time, I enjoy ballroom dancing and exploring different neighborhoods in New York.

Positions

2023 - Present Assistant Professor CUNY School of Public Health New York, NY, USA
2022 - 2023 Research Assistant Professor CUNY School of Public Health New York, NY, USA

Education and Training

Postdoc Bioinformatics City University of New York New York, NY, USA
Postdoc Microbiology Columbia University New York, NY, USA
Ph.D. Molecular Biology University of Minnesota - Twin Cities Minneapolis, MN, USA
B.S. Biological Sciences Seoul National University Seoul, Korea

Current Funding

2024 - 2029 NIH 1U01 CA230551 (Co-I), Exploiting public metagenomic data to uncover cancer-microbiome relationships.
The human microbiome is implicated in the development and response to treatment of some cancers, including infectious agents estimated to be responsible for ~20% of the global cancer burden. However, previously unrecognized bacterial and viral strains, as well as loss of normal structure and function of human-associated microbiomes, likely play additional roles in disease etiology and treatment. This project investigates the role of the human microbiome in cancer by applying novel and state-of-the-art methods to published metagenomic data, and provides enhanced, expanded, and more efficiently usable microbiome data resources back to the cancer research community for a broad range of investigations.
2024 - 2029 NIH 2U24 CA180996 (Co-I), Cancer genomics: integrative and scalable solutions in R/Bioconductor.
Researchers gather a wide range of complex genetic information in order to comprehend the intricate factors involved in the development and treatment of cancer. This project develops, expands, and sustains essential software and data resources that aid cancer researchers in effectively managing and analyzing this information using advanced computational and statistical approaches.
2024 - 2025 PSC-CUNY Research Award (PI), Construct informatics infrastructure for transfer learning in biomedical research
This proposal reinforces the pre-trained model for biological signatures (RAVmodel from GenomicSuperSignature) with manual curation, improving its usability and interpretability and demonstrating the feasibility of expanding our method to different model systems and biological modalities.

Completed Funding

2022 - 2023 NIH 2U24 CA180996 AI/ML Supplement (Co-I), Cancer genomics: integrative and scalable solutions in R/Bioconductor.
A wealth of genomic datasets have been made publicly available and reusable to research communities; however, many are not readily usable by Machine Learning (ML) algorithms. This project targets hundreds of primarily cancer-focused, multi-modal genomic datasets that have previously been harmonized for analysis in the Bioconductor Project for open-source bioinformatics, and translates them to formats broadly suited for large-scale ML.