MS in Data Science—Genomics Track
In 2023, a new track in Genomics was approved by the New York State Department of Education for the Master of Science (MS) in data science program.
The curriculum will be available to any student in the data science MS program and will be required for Genomic Intensive Data Science Research, Education and Mentorship (GIDS-REM) fellows.
The genomics data science MS curriculum follows a specific timeline designed to help GIDS-REM fellows meet the program's goals. Students will take core data science (DSCC) MS courses and genomics courses in fall and spring during a 19-month period. Supplemental workshops and seminars complement the program. Internships are guaranteed for GIDS-REM fellows and optional for other students.
The curriculum, which totals 33 credits, covers theoretical and applied aspects of data science and genomics. It includes core components that develop competencies in fundamental data science concepts, computational biology, and genomics, as well as a sequence of workshops and seminars that teach the most important applied bioinformatics workflows used by faculty at our institution. The combination of fundamental and applied training that fellows receive in their first two semesters aims to ensure they are ready to be productive in an internship/research assistantship in the summer of year one.
The core year one curriculum in Data Science includes:
- DSCC 462: Computational Introduction to Statistics teaches combinatorics, descriptive statistics, elementary probability and distribution theory, statistical inference, and statistical modeling, with R-based computational exercises to illustrate these concepts.
- DSCC 461: Introduction to Databases covers database design, e.g. data models, description languages, and query facilities such as SQL.
- DSCC 440: Data Mining covers applied uses of databases, including attributes, visualization, pre-processing, mining frequent patterns, association, correlation, classification methods, and cluster analysis.
- DSCC 465: Introduction to Statistical Machine Learning provides an introduction to modern machine learning concepts, techniques, and algorithms. Topics include regression, clustering and classification, kernels, support vector machines, feature selection, goodness of fit, and neural networks. Programming assignments emphasize putting theory into practice in a Python programming environment.
In year two, students will take DSCC 483: Data Science Practicum, a research-based project course related to bioinformatics/genomics.
The genomics track coursework includes:
- BST 434: Genomic Data Analysis introduces modern genomic techniques and the corresponding statistical methods and software available to visualize, analyze, and interpret these data. Specific topics include mRNA/microRNA expression, copy number variants, single nucleotide variants, DNA methylation, and microbial abundance.
- BIOL 453: Computational Biology provides a practical introduction to algorithms commonly used in computational biology, including alignment, motif finding, maximum likelihood, Markov models (HMM and MCMC), expectation maximization, and machine learning. This is a lecture course with an accompanying lab in which students use and write scripts to solve problems and complete a research project that develops or applies algorithms to answer a research question.
- BIOL 457: Applied Genomics is a research-based course that covers genome sequencing, assembly and analysis, functional genomics, population genomics and genome evolution. Students read primary literature, build core computer programming skills (Python, R and Bash) and then design and complete a group research project using publicly available genomic data.
- IND 501: SMD Research Ethics provides instruction in the responsible conduct of research within a medical center.
During the winter break, a sequence of three extracurricular, hands-on workshops aimed at building core competencies in genomic data analysis will be offered. These workshops, which cover nine core competencies, will be required for GIDS-REM fellows and open to any researcher in the University of Rochester and Medical Center communities.
The workshops will be organized by the Genomics Research Center in the University of Rochester Medical Center with GIDS-REM faculty assisting in preparing curriculum and lectures. The three modular workshops will be arranged around specific genomic data types: RNA-seq, DNA-seq, and functional genomic assays, each addressing several core competencies.
Workshop topics complement the genomics data science curriculum and will focus on best practices in genomic data analysis. Topics may include data cleaning and quality control, basic sequence data generation, bulk and single-cell RNA-seq, functional genomics, variant calling, gene enrichment and pathway analysis methods, experimental design and power calculations, and accessing and using public data.
GIDS-REM fellows will receive internships as bioinformaticians embedded in labs. These summer research assistantships are expected to be full time and funded by the sponsoring program faculty's lab or industry partner. The research internships may lead directly to employment or recruitment into PhD programs here at the University of Rochester. Students who are not accepted to the fellowship do not receive guaranteed internships.
A weekly Applied Genomics Research Seminar, which is required for GIDS-REM fellows and open to all at the University, will focus on short presentations on applied problems in genomics, emphasizing pragmatic solutions. It will culminate in a formal presentation or poster on each fellow's research topic, presented in the early fall so that the entering cohort of GIDS-REM fellows can see what the graduating cohort has accomplished.
Students also become eligible to serve in our Genomics Office Hours Consulting service. Fellows will hold an office hour to provide assistance with genomic data analysis and will be paired with staff from the University of Rochester Genomics Research Center (UR GRC).
Mentors will facilitate matching students with research advisors in labs at the University of Rochester or through industry internships. Fellows may select one of their mentors as their research advisor; however, this is not a requirement of the program. See the GIDS-REM program faculty directory page for a list of current mentors.
A bridging course (DSCC 162: Data Structures in Python) is offered for students who are otherwise well qualified for the fellowship but lack experience in computer science algorithms and data structures. It is taught in the summer before students matriculate into the program. The course does not count towards the master’s program of study but will be covered by scholarship discounts. Students who can demonstrate sufficient experience or coursework can waive this requirement.
To prepare students to enter the workforce in genomics data science, we encourage innovative professional development opportunities focused on networking, public speaking, and technical skills. These may include traveling to relevant conferences, attending colloquia and speaker series events, participating in student round table discussions and hackathons, visiting biotechnology firms, and/or participating in professional and personal development groups across the University and Medical Center. Students will also have access to the services of the Greene Center for Career Education and Connections.
The Genomic Intensive Data Science Research, Education and Mentorship (GIDS-REM) program will award fellowships to applicants interested in the genomics track. For fellows, the training program encompasses three goals:
- Providing solid training in the fundamentals of data science, including data mining and statistics.
- Providing practical and curricular training in genomics, including algorithms commonly used in computational biology and coursework in statistical genomics, to prepare students for genomics research experiences.
- Making fellows competitive for top genomics research positions in industry or PhD programs.
Fellows receive a full tuition scholarship, funded 25% by NIH grant R25HG012324 and 75% by a scholarship awarded by the University, as well as a 7-month stipend, health insurance and fees, and a summer stipend for their internship/research assistantship. The funded fellowship is open to US citizens and permanent residents.
International students may participate in genomics track programming but are not fully funded and do not receive guaranteed internships.
To apply for the fellowship, follow all instructions for the regular application to data science. In the ASE Application Information section of the application, answer the questions as follows:
- Select “Data Science” for the Program of Study
- Select the “Genomics” Sub-Category
- Select “Master’s” status
- In the Field of Interest section, provide a short response explaining your interest in the Genomics track
Per University of Rochester policy, fellows will be required to maintain a B- or better in all required classes in the genomics track curriculum and satisfactorily complete a summer research experience to remain in good standing as GIDS-REM fellows. Fellows who receive a grade lower than a B- in a required class will be placed on probation and receive additional counseling from their mentors and academic advisors. They will be removed from the program if they receive another grade lower than B- in a required class.