Fall Term Schedule
Only courses with a DSC course number are listed on this page. See MS program for all of the required and elective courses for the degree.
Fall 2023
Number | Title | Instructor | Time |
---|---|---|---|
DSCC 401-1
Brendan Mort
MW 9:00AM - 10:15AM
|
This course provides a hands-on introduction to widely used tools for data science. Topics include Linux; languages and packages for statistical analysis and visualization; cluster and parallel computing including GPUs; Hadoop and Spark; libraries for machine learning; NoSQL databases; and cloud services. PREREQUISITES: CSC 161, CSC 171, or equivalent programming experience strongly recommended.
|
DSCC 420-1
Gonzalo Mateos Buckstein
MW 4:50PM - 6:05PM
|
The goal of this course is to learn how to model, analyze and simulate stochastic systems, found at the core of a number of disciplines in engineering, for example communication systems, stock options pricing and machine learning. This course is divided into five thematic blocks: Introduction, Probability review, Markov chains, Continuous-time Markov chains, and Gaussian, Markov and stationary random processes. Prerequisites: ECE 242 or equivalent
|
DSCC 435-1
Jiaming Liang
TR 9:40AM - 10:55AM
|
This course primarily focuses on algorithms for large-scale optimization problems arising in machine learning and data science applications. The first part will cover first-order methods including gradient and subgradient methods, mirror descent, proximal gradient method, accelerated gradient method, Frank-Wolfe method, and inexact proximal point methods. The second part will introduce algorithms for nonconvex optimization, stochastic optimization, distributed optimization, manifold optimization, reinforcement learning, and those beyond first-order.
|
DSCC 440-2
Jiebo Luo
TR 3:25PM - 4:40PM
|
Fundamental concepts and techniques of data mining, including data attributes, data visualization, data pre-processing, mining frequent patterns, association and correlation, classification methods, and cluster analysis. Advanced topics include outlier detection, stream mining, and social media data mining. CSC 440, a graduate-level course, requires additional readings and a course project.
|
DSCC 449-1
Robert Jacobs
TR 11:05AM - 12:20PM
|
How can computer models help us understand how people perceive and reason about their environments? This course addresses this question, with emphasis placed on how people use probabilistic reasoning in order to represent and manage ambiguity and uncertainty for the purpose of making intelligent decisions. The course is relevant to students with interests in computational studies of human perception and cognition, and to students with interests in artificial intelligence. Homework assignments will require students to write computer programs using the Python programming language. Prerequisites: MATH 161, MATH 162, and CSC 161 (or equivalent proficiency in Python programming) required. MATH 164, MATH 165, and/or STAT 213 are helpful but not required.
|
DSCC 461-1
Eustrat Zhupa
MW 12:30PM - 1:45PM
|
This course presents the fundamental concepts of database design and use. It provides a study of data models, data description languages, and query facilities including relational algebra and SQL, data normalization, transactions and their properties, physical data organization and indexing, security issues and object databases. It also looks at the new trends in databases. The knowledge of the above topics will be applied in the design and implementation of a database application using a target database management system as part of a semester-long group project.
|
DSCC 462-2
Anson Kahng
TR 4:50PM - 6:05PM
|
This course will cover foundational concepts in descriptive analyses, probability, and statistical inference. Topics to be covered include data exploration through descriptive statistics (with a heavy emphasis on using R for such analyses), elementary probability, diagnostic testing, combinatorics, random variables, elementary distribution theory, statistical inference, and statistical modeling. The inference portion of the course will focus on building and applying hypothesis tests and confidence intervals for population means, proportions, variances, and correlations. Non-parametric alternatives will also be introduced. The modeling portion of the course will include ANOVA, and simple and multiple regression and their respective computational methods. Students will be introduced to the R statistical computing environment. PREREQUISITES: MTH 150 or MTH 150A; AND MTH 142 or MTH 161 or MTH 171 (or equivalent discrete math and calculus coursework)
|
DSCC 475-1
Ajay Anand
TR 11:05AM - 12:20PM
|
Time series analysis is a valuable data analysis technique in a variety of industrial (e.g., prognostics and health management), business (e.g., financial data analysis), and healthcare (e.g., disease progression modeling) applications. Moreover, forecasting in time series is an essential component of predictive analytics. The course will begin with an introduction to practical aspects relevant to time series data analysis such as data collection, characterization, and preprocessing. Topics covered will include smoothing methods (moving average, exponential smoothing), trend and seasonality in regression models, autocorrelation, and AR and ARIMA models applied to time series data. Deep learning models including feedforward, recurrent, gated, and convolutional architectures will also be studied. Students will work on projects with time series data sets using modeling tools in Python. PREREQUISITES: Introductory Statistics (DSC 262/STT 212/STT 213 or equivalent), Linear Algebra and Differential Equations (MTH 165 or equivalent), and applied Python programming (CSC 161 or equivalent).
|
DSCC 483-1
Ajay Anand; Cantay Caliskan
MW 10:25AM - 11:40AM
|
The capstone/practicum provides an experience for data science majors/MS candidates to apply the core knowledge and skills attained during their program to a tangible data science focused project. Students will work in small teams on a project that applies data science methods to the analysis of a real-world problem. The instructor will guide each team in developing a topic that makes use of the knowledge the team members gained through their application area courses. The identified projects or problems and data sets will cover a range of application areas and reflect real-world needs from industry, medicine and government. Each student will be required to write a paper about their project, which satisfies one upper-level writing requirement for majors and Plan B for master's. PREREQUISITES: DSC 240/440 (Data Mining) AND an introductory statistics course such as DSCC 262/462, STT212 or STT213 or equivalent. DSC 261/461 (Database Systems) strongly recommended prior but may be taken concurrently. FOR DSC GRADUATING SENIORS and MS CANDIDATES. GRADUATING STUDENTS this semester have priority for eligibility/instructor permission. PERMISSION REQUEST: To seek instructor permission/eligibility, follow directions on UR Student.
|
DSCC 491-2
Joseph Ciminelli
|
Bibliometric Analysis: The project entails an exploration of the Web of Science (WOS) data from Clarivate Analytics. It is a bibliometric study with several potential directions that will be determined after an initial analysis of the change in metadata coverage over time. Possible topics include studying trends in author collaborations by affiliation and discipline, semantic analysis of keywords, and/or looking at factors that contribute to a journal’s inclusion in WOS’s Core Collection from the Emerging Sources Citation Index. Course Evaluations: Weekly meetings with progress reports, culminating in a project output that could be built upon in the future. |
DSCC 494-01
Cantay Caliskan
|
Internship must have approval of ASE graduate practical research internship. See data science graduate coordinator to initiate approval process. |
DSCC 494-02
Ajay Anand
|
|
DSCC 495-10
Tolulope Olugboji
|
Hunting for Earth Echoes by Sequencing Noisy Seismograms - The largest layer in the solid Earth is the mantle. It is made up of rocks with a composition that varies in a manner that is not well understood. This structure is the marble cake. We will use the EarthScope USArray (> 3,000 sensors) to scan for wave echoes underneath the US and Alaska and use them to test theories of mantle composition. The student will use unsupervised machine learning techniques (e.g., sequencer) to scan for hidden patterns in seismograms (recordings of earthquake waves). Successful detection of body wave echoes (reflections in the mantle), especially when scrambled by noise, holds clues for resolving mantle composition. Meetings every other week; report writing and presentation at the end of the semester. |
DSCC 495-11
Pablo Postigo Resa
|
The project involves the design of optical multilayer filters using machine learning and image processing techniques, followed by symbiotic organism search optimization. Evaluation based on meetings, reports, and presentations. |
DSCC 495-12
Joseph McFall
|
The overarching goal of the present research is to understand predictors of Rochester school district attrition, including whether CI's T-CRS measure has predictive validity for attrition. Evaluation by attendance at weekly meetings; punctuality and participation; meeting analysis and write-up deadlines. |
DSCC 495-13
Hangfeng He
|
This project aims to provide a general framework for the categorization of temporal question answering (QA) based on the properties of natural language. With this new categorization framework, we can then conduct a systematic evaluation of the behaviors of large language models (LLMs) on temporal QA. Different model sizes will be evaluated on various temporal QA datasets. Evaluation: weekly meetings, final project presentation, and final project report. |
DSCC 495-14
Elaine Hill
|
Process Mining to Uncover Racial Disparities in Healthcare Paths - This research project focuses on illuminating racial disparities in healthcare among older adults with chronic heart and lung conditions. Drawing from a comprehensive dataset of Electronic Health Records, the study will use process mining and network analysis techniques to quantify and visualize divergent health care paths. Specifically targeting Black and White, non-Hispanic populations, the research aims to elucidate important health inequities in care paths, amplifying the urgency of tackling systematic factors that may contribute to these disparities. Course evaluation: Weekly meetings, detailed writeup, and a summarizing presentation. |
DSCC 495-15
Hangfeng He
|
This project aims to evaluate the gender bias of Large Language Models (LLMs). LLMs of different sizes and from different series will be considered. In particular, information retrieval and data provenance will be used to track how gender bias arises in LLMs. At the same time, we can detect popular web sources that are significantly biased. A global analysis of web sources in terms of gender bias will be conducted. Evaluation by: 1) weekly meetings (0.5 hour each); 2) a final project presentation; 3) a final project report. |
DSCC 495-2
Jiebo Luo
|
This research studies sexist hate speech targeting women players in three English-speaking countries (the USA, Australia, and England) during the FIFA Women’s World Cup (WWC). The project will try to answer the following questions: Does hate speech differ in quantity and in frequently used words across the three countries? What kinds of events (goal, missed penalty, etc.) or trends provoke more hate speech toward players? Who are the most targeted players, and why are they targeted? We plan to collect comment data from Reddit by keyword, such as athletes’ names. This research will contribute to current work on understanding sexist hate speech toward female players in the world’s most popular sport, as well as hate speech toward other minority players. The main challenge could be data quality and quantity: because Twitter has suspended all academic API access, we can only collect data from Reddit, which may not have enough high-quality data. Grading will be based on submitting related papers. |
DSCC 495-3
Jean-Philippe Couderc
|
The objective of the project is to evaluate existing deep learning (DL) models for the classification of ECGs recorded in patients with long QT syndrome (LQTS). The first aim is to check their performance and to understand their strengths and weaknesses. The secondary objective is to develop a model on the database from the Telemetric and Holter ECG Warehouse. Ultimately, the technology should be used by physicians to detect the presence of LQTS and the type of mutations associated with this disease. Evaluation: weekly meeting and written paper. |
DSCC 495-4
Ram Haddas
|
The final evaluation for this course will consist of the following:
|
DSCC 495-9
Cantay Caliskan
|
Generative AI Research: In Fall 2023, we will be working on an application that aims to develop a large language model (LLM) that will improve the living standards of communities affected by catastrophes. The developed tool will be tested with victims of the recent massive earthquakes in Turkey by collecting data from Hatay, a major city hit by the earthquakes. The project aims to (i) expand the limited literature at the intersection of LLMs and the humanitarian sector, (ii) collect data from catastrophe-hit regions that can be used to better understand the impact of new catastrophes and improve the lives of victims elsewhere in the world, and (iii) test the helpfulness of virtual agents in organizing information and combating misinformation. The project will lead to the publication of at least two separate articles. In Fall 2023, we will work on laying the groundwork for the (larger) research agenda. Main project (85%): • Project charter/plan (5%) • Weekly Presentations / Demonstration of the Results (20%) • Midterm Presentation (10%) • Final Present |
DSCC 895-1
|
|
DSCC 897-1
Ajay Anand
|
Please see advisor before enrolling. |
DSCC 899-1
Ajay Anand
|
Please see advisor before enrolling. |