National Science Foundation Research Experience for Undergraduates (NSF REU)
Computational Methods for Understanding Music, Media, and Minds
How can a computer learn to read an ancient musical score? What can methods from signal processing and natural language analysis tell us about the history of popular music? Can a computer system teach a person to better use prosody (the musical pattern of speech) in order to become a more effective public speaker?
These are some of the questions that students will investigate in our REU: Computational Methods for Understanding Music, Media, and Minds. They will explore an exciting, interdisciplinary research area that combines machine learning, audio engineering, music theory, and cognitive science. Each student will work in a team with another student and will be mentored by two or more faculty members drawn from Computer Science, Electrical and Computer Engineering, Brain and Cognitive Science, the program in Digital Media Studies, and the Eastman School of Music.
Ajay Anand, PhD
Goergen Institute for Data Science
Department of Electrical and Computer Engineering, Department of Computer Science, and the Goergen Institute for Data Science
We are now accepting applications for the Summer 2020 session.
You are eligible to apply if:
- You are a 1st, 2nd, or 3rd year full-time student at a college or university.
- You are a U.S. citizen or hold a green card as a permanent resident.
- You will have completed two computer science courses or have equivalent programming experience by the start of the summer program.
You do not need to be a computer science major or have prior research experience. We wish to recruit a diverse set of students, with different backgrounds and levels of experience. We encourage applications from students attending colleges that lack opportunities for research, and from students from communities underrepresented in computer science.
Before starting the application, you should prepare:
- An unofficial college transcript, that is, a list of your college courses and grades, as a PDF, Word, or text file. Include the courses you are currently taking.
- Your CV or resume, as a PDF, Word, or text file.
- A 300-word essay, as a PDF, Word, or text file, explaining why you wish to participate in this REU, including how it engages your interests, how the experience would support or help you define your career goals, and what special skills and interests you would bring to the program.
- The name and email address of a teacher or supervisor who can recommend you for the REU.
The application website does not allow you to save and resume your application before submitting, so start the application when you have time to enter all the information.
STEP 1: Apply online no later than February 9, 2020. (application portal to open on December 18, 2019)
STEP 2: After submitting the online application, please have the person recommending you for the REU upload a letter of recommendation (PDF or DOC) at the following link.
STEP 3: Notification of acceptance will be communicated between March 9 and April 10, 2020.
The 2020 REU dates are: Wednesday, May 27 to Saturday, August 1, 2020.
Students accepted into the REU will receive:
- On-campus housing
- Meal stipend
- A stipend of $6000 for other expenses and to take back home
- Up to $600 to help pay for travel to and from Rochester
Your experience will include:
- A programming bootcamp to help you learn or improve your Python programming skills.
- Working with a team of students and faculty on one of the REU projects.
- Workshops on topics such as career planning and preparing for graduate school.
- Social events, including a trip to the Rochester International Jazz Festival.
If you have any questions about the REU or the application process that are not answered here, please send an email to email@example.com.
On the 2020 application form, you can specify your top project preferences from the list below. We will do our best to assign you to a project that fits your preferences and interests, taking into account your background and skills.
2020 Projects (Planned)
Title: Creating Human-Computer Interactive Music Systems
In a world where the interaction between humans and machines is becoming deeper and broader, developing systems that allow us to collaborate with machines is one of the main missions in the research of cyber-human systems, robotics, and artificial intelligence. Creating human-computer interactive music systems is an excellent research direction as music performance is often highly collaborative. REU students in this project will participate in the design of several interactive music systems. In particular, they will explore novel deep neural network structures and ideas of reinforcement learning for training interactive music generation models. They will also integrate these models with other modules including music transcription and music structure analysis in system development. This project continues and expands the existing collaboration between Duan and Temperley on music transcription and music generation. It also continues and expands the collaboration between Duan and Brown on “automatic rendering of augmented events in immersive concerts” on the platform of TableTopOpera, a multimedia augmented music concert project at the Eastman School of Music.
Title: Concert Hall Acoustic Measurement and Simulation with Multiple Sound Sources on the Stage
Mentor: Ming-Lun Lee (Audio and Music Engineering, ECE)
Our 3D Audio Research Lab has recorded over 50 concerts in the past two and a half years. We have found that the sound fields captured with two ‘identical’ Neumann KU100 Binaural Microphones positioned only a few seats apart are significantly different. The comparisons of the binaural recordings also agree with what we heard during the rehearsal sound checks. The goal of this research project is to measure impulse responses with binaural dummy head microphones and Ambisonic microphones, such as the 32-channel Eigenmike Microphone Array, in the concert halls at the Eastman School of Music. Instead of using one fixed loudspeaker at the center of the stage as a sound source to generate sine sweeps, we plan to move a speaker or speakers to multiple source positions on the stage. In this way, we may reproduce, hear, and analyze the spatial immersive sound of an orchestral performance by convolving impulse responses with anechoic instrument recordings. We may also use the CATT-Acoustic software for concert hall acoustic modeling.
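The convolution step described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the lab's actual workflow: the function name `convolve` and the toy sample values are hypothetical, and real hall measurements would involve far longer signals and an FFT-based routine rather than this direct-form loop.

```python
def convolve(dry, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            # Each dry sample triggers a scaled, delayed copy of the response.
            out[i + j] += x * h
    return out


# Toy data (hypothetical): a short dry signal and a hall response made of
# a direct sound followed by two weaker reflections.
dry_signal = [1.0, 0.5, 0.0, -0.5]
hall_ir = [1.0, 0.0, 0.6, 0.3]

wet_signal = convolve(dry_signal, hall_ir)
print(wet_signal)
```

Convolving a single unit impulse with the response returns the response itself, which is why a measured impulse response is enough to characterize how the hall would color any dry recording played from that source position.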
Title: Exploring Mental Responses to Listening and Imagination of Music
A variety of studies over the last two decades have shown that electrical signals obtained non-invasively through scalp-recorded electroencephalography (EEG) can be used as the basis for Brain-Computer Interfaces (BCIs). Major advances have been achieved in the analysis of brain responses to visual stimuli as well as of imagined motor movements, but little attention has been paid to human mental responses to or imagination of audio and music. In this project, REU students will examine connections between music and the mind directly by measuring brain signals through BCIs during music listening and imagination, as well as by developing computational methods for automatic analysis of such brain signals. One component of the project will involve the exploration of neural correlates of music listening, by analyzing the responses of the human brain to various aspects of the played music including melody, rhythm and timbre. Another component will involve exploring brain signal patterns generated in the process of music imagination or mental playing. The project brings together the three mentors’ complementary expertise: brain-computer interfaces (Cetin), music informatics (Duan), and biomedical instrumentation and time-series analysis (Anand), with the overarching connection being novel computational methods including machine learning.
Title: Audio-Visual Scene Understanding
Understanding scenes around us, i.e., recognizing objects, human actions and events, and inferring their relations, is a fundamental capability in human intelligence, and it requires a concert of multiple senses. Similarly, designing machine scene understanding algorithms is a fundamental problem in AI; it also requires analysis across multiple modalities (e.g., vision and audition). Existing machine scene understanding algorithms, however, are designed to rely on just a single modality. In this project, REU students will design algorithms that can jointly model audio and visual modalities toward audio-visual scene understanding. Specifically, they will contribute to three research thrusts: 1) audio-visual scene component recognition (e.g., object recognition and source separation), 2) audio-visual scene component relation analysis (e.g., spatial, temporal, correlative, compositional, and causal relations), and 3) audio-visual cross-modal generation (e.g., audio-driven talking face generation, visual-driven music generation). This project aligns with the existing collaboration between Xu (computer vision) and Duan (computer audition), with the goal of training and guiding next-generation computer scientists in this emerging field.
Title: Education Technologies for Artificial Intelligence
Mentor: Zhen Bai (Computer Science)
There is an emerging presence of AI technologies in our everyday life, from voice assistants such as Echo and Google Home to smart life systems such as Fitbit and Spotify music suggestions. It is increasingly important for K-12 students with little CS and math background to understand the fundamentals of how a machine thinks and behaves, in order to better interact and collaborate with our increasingly intelligent work and life environment. This project aims to design and develop playful and exploratory learning environments that support accessible AI literacy for PreK-12 learners. We are looking for students with a background or interest in technology-enhanced learning, data science, full-stack web development, and learning analytics. The students will take part in the research ideation, interface prototyping and evaluation, and learner behavior analysis of this iterative design research project.
Title: Human-AI Collaboration in Story Understanding
Mentor: Zhen Bai (Computer Science)
Intelligent agents are playing an increasingly indispensable role in our daily lives. For example, intelligent assistants like Alexa and Siri help us arrange schedules and book tickets conveniently. Although human-human collaboration has been well studied, human-AI collaboration still deserves much more attention. In this human-AI collaboration project, we aim to investigate an effective framework in which a human can collaborate with an intelligent agent to carry out causal reasoning with social stories. We are looking for students with an interest in conversational agents, human-in-the-loop learning, applied machine learning, and web development to join our project. Students will participate in research ideation, UI design, agent development, experimental design, and evaluation of human behavior and machine performance during interaction.
Title: Extracting Acoustic Correlates of Perceived Social Characteristics of Speech
This project continues Hoque, Marvin and Kurumada’s collaborative project creating a system that automatically generates feedback to human users on their language use. It advances an existing dataset and crowd-enabled platform named ROC Speak for self-guided training in public speaking, where users video- and audio-record themselves and upload the recording to a crowd-based evaluation forum to receive feedback on the quality and effectiveness of the speech. In the current project, REU students will begin with surveying relevant literature with Marvin and Kurumada to identify target features, and then work with Hoque’s group in Computer Science to construct an algorithm that extracts information from data to predict human ratings. A goal is to create an automatic feedback system deployed in the context of audio- or video-based job interviews, in which perceived social traits are of importance.
Title: Style Transfer in Music Generation
The project will be to develop a computational music generation system that merges features from two musical styles. We will use a dataset of classical melodies and another dataset of rock melodies. The computational system will learn pitch patterns from one dataset and rhythmic patterns from another dataset, and will merge them to create melodies that combine the two styles. An additional project might be to incorporate harmonic information from the rock dataset, adding chord symbols to the generated melodies.
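One simple way to picture the pitch-from-one-style, rhythm-from-another idea is with two first-order Markov chains. The sketch below is only an illustration under stated assumptions, not the project's method: the helper names `learn_transitions` and `sample_chain`, the toy MIDI pitch list, and the toy duration list are all hypothetical, and a real system would learn from full melody datasets with richer models.

```python
import random
from collections import defaultdict


def learn_transitions(sequences):
    """Collect first-order transitions: item -> list of observed next items."""
    table = defaultdict(list)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            table[current].append(following)
    return table


def sample_chain(table, start, length, rng):
    """Random walk over the transition table, restarting at dead ends."""
    item, out = start, [start]
    while len(out) < length:
        choices = table.get(item) or [start]
        item = rng.choice(choices)
        out.append(item)
    return out


# Toy data (hypothetical): MIDI pitches standing in for a classical melody,
# note durations in beats standing in for a rock melody.
classical_pitches = [[60, 62, 64, 65, 67, 65, 64, 62, 60]]
rock_durations = [[0.5, 0.5, 1.0, 0.5, 0.5, 1.0, 2.0]]

rng = random.Random(0)  # fixed seed so repeated runs give the same melody
pitches = sample_chain(learn_transitions(classical_pitches), 60, 8, rng)
durations = sample_chain(learn_transitions(rock_durations), 0.5, 8, rng)

# Pair each generated pitch with a generated duration.
melody = list(zip(pitches, durations))
print(melody)
```

Because the two chains are trained and sampled independently, every pitch step comes from the first dataset and every duration from the second, which is the merging behavior the description calls for in its simplest form.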
Title: Automatic Rendering of Augmented Events in Immersive Concerts
In immersive concerts, the audience’s music listening experience is often augmented with texts, images, lighting and sound effects, and other materials. Manual synchronization of these materials with the music performance in real time becomes more and more challenging as their number increases. In this project, we will design an automatic system that is able to follow the performance and control pre-coded augmented events in real time. This allows immersive concert experiences to scale with the complexity of the texts, images, lighting and sound effects. We will work with TableTopOpera at the Eastman School of Music on implementing and refining this system.
Title: 3D Audio Recording and Concert Hall Acoustic Measurement with Binaural and Ambisonic Microphones
Mentor: Ming-Lun Lee (Electrical and Computer Engineering)
In the past year, our 3D audio recording team has recorded over 35 concerts at the Eastman School of Music with several binaural dummy head microphones, binaural in-ear microphones, and Ambisonic soundfield microphones, including a 32-capsule Eigenmike and a Sennheiser Ambeo VR Mic. We have built a large database of 3D audio concert recordings for spatial audio research. This project plans to not only record summer concerts but also measure impulse responses in a concert hall with a variety of binaural and Ambisonic microphones. Our goal is to compare the results made with different microphones and explore the best method to measure and understand complex hall acoustics.
Title: Assessing the Effectiveness of a Speaker by Analyzing Prosody, Facial Expressions, and Gestures
How we say things conveys a lot more information than what we say. Imagine the possibility of measuring the effectiveness of a speaker or of an oncologist delivering critical information to a patient, or even measuring the severity of a patient’s Parkinson’s disease, by analyzing their prosody. This project will involve using knowledge from music to inform feature extraction, using machine learning to model the extracted features, and then using cognitive models to explain the outcome.
Title: Augmenting Social-Communicative Behavior
Mentor: Zhen Bai (Computer Science)
Face-to-face interaction is a central part of human nature. Unfortunately, there are immense barriers for people with social-communicative difficulties, for example people with autism and people with hearing loss, to engage in social activities. In this project, we seek design and technology innovations to create Augmented Reality (AR) technologies that facilitate social-communicative behaviors without interrupting the social norms of face-to-face interaction. We are looking for students with an interest in assistive technology, Augmented Reality, natural language processing and machine vision to take part in the design, interface prototyping, and evaluation of socially-aware AR environments that help people with special needs to navigate their everyday social life.
Title: Reading Ancient Manuscripts
This REU will develop a combined approach to reading damaged ancient manuscripts. Beginning with multispectral images, we will employ a combination of computer vision and natural language processing to fill in the holes in ancient texts written in Zapotec and Mixtec. The end goal will be to visualize the results with an AR/VR application.
Title: Education Technologies for Artificial Intelligence
Mentor: Zhen Bai (Computer Science)
There is an emerging presence of AI technologies in our everyday life, from voice assistants such as Echo and Google Home to smart life systems such as Fitbit and Spotify music suggestions. It is becoming increasingly important for people without an AI background to understand the fundamentals of how a machine thinks and behaves, in order to better interact and collaborate with our increasingly intelligent work and life environment. We are looking for students with an interest in education technology, tangible user interfaces, and intelligent social agents to join our project. The students will take part in the design, interface prototyping and evaluation of physically and socially embodied education technologies that support K-12 AI education in formal and informal learning environments.
Audio-Visual Scene Understanding
Mentor: Chenliang Xu (Computer Science)
Evaluating the role of audio towards comprehensive video understanding - We are interested in measuring the role audio plays in high-level video understanding tasks such as video captioning and spatiotemporal event localization. In this project, students will design novel Amazon Mechanical Turk interfaces to collect audio-oriented annotations for tens of thousands of YouTube videos. They will get hands-on experience training deep learning algorithms on large-scale data, with a focus on joint audio-visual modeling.
Assessing the Effectiveness of a Speaker by Analyzing Prosody, Facial Expressions, and Gestures
Mentor: Ehsan Hoque (Computer Science)
Assessing the severity of Parkinson's disease through the analysis of a voice test - This project involves the analysis of two vocal tasks from the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) performed by people both with and without Parkinson's disease (PD). The tests include uttering a sentence and saying ‘uhh’ in front of the computer’s microphone. Our analysis will include extracting and identifying useful features from the audio recordings and developing a novel machine learning technique to assess the severity level of Parkinson’s disease.
Computational Methods for Social Networks and Human Mobility
Mentor: Gourab Ghoshal (Physics and Astronomy)
Investigating Human Mobility in Virtual and Physical Space - The student will develop the data analysis skills required to investigate complex system data, including Python coding and statistics. They will then apply these skills to study the unexpected similarities between human mobility in physical and virtual space.
Computational Methods for Audio-based Noninvasive Blood Pressure Estimation
Mentor: Zeljko Ignjatovic (Electrical and Computer Engineering)
Audio-Based Non-invasive Blood Pressure Estimation - With cardiovascular disease as the leading cause of death in America, constant blood pressure measurement is imperative to detect early onset symptoms. Piezoelectric sensors can be used in conjunction with a recurrent neural network in a wearable device (such as a smartwatch) to extract pulse wave velocity data and heart rate data to estimate blood pressure. The concept further expands the use of machine learning techniques and applies them to activity trackers. Although related technologies exist in the field, none of them uses a recurrent neural network with a piezoelectric sensor, nor has any of them achieved the status of an industry standard, as the field is still in its infancy. Continued research is required to develop a smartwatch that can accurately detect blood pressure; however, collecting enough pulse wave velocity, heart rate, and blood pressure data to train the recurrent neural network and developing a working prototype should be sufficient goals for the end of the summer.
Music and the Processing Programming Language
Mentor: Sreepathi Pai (Computer Science)
A Framework for Developing Music-Generated Games (Erik Azzarano, Rochester Institute of Technology) - Erik is investigating a framework for developing music-generated games based on live or external audio input. He aims to create an intuitive mapping between a game’s mechanics and features of the audio input. For example, features of the audio such as frequency, amplitude, and beats or onsets are extracted and mapped to different game parameters to drive the experience, such as when enemies spawn, their location, and how fast they move. The goal of this project is to have a finished framework with all of the appropriate mappings between game mechanics and audio features. The framework should allow the game to suitably portray any type of music or sound input.
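To make the feature-to-mechanic mapping concrete, here is a minimal sketch written in Python for illustration rather than in Processing. Everything in it is an assumption and not part of Erik's framework: the function names `frame_features` and `to_game_params`, the use of zero crossings as a rough frequency estimate, and the specific mapping constants are all hypothetical.

```python
import math


def frame_features(samples, sample_rate):
    """Return (RMS amplitude, rough frequency in Hz) for one non-empty frame."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Count sign changes between neighboring samples.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    freq = crossings / (2 * duration)  # a pure tone crosses zero twice per cycle
    return rms, freq


def to_game_params(rms, freq):
    """Hypothetical mapping: louder -> faster spawns, higher pitch -> faster enemies."""
    spawn_interval = max(0.2, 2.0 - 1.8 * min(rms, 1.0))  # seconds between spawns
    enemy_speed = 1.0 + min(freq, 2000.0) / 500.0  # arbitrary speed units
    return {"spawn_interval": spawn_interval, "enemy_speed": enemy_speed}


# Toy input (hypothetical): one second of a 440 Hz sine tone at 8 kHz.
sample_rate = 8000
frame = [
    0.8 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]
rms, freq = frame_features(frame, sample_rate)
print(to_game_params(rms, freq))
```

Clamping both features before mapping keeps the game parameters within a playable range regardless of how loud or high-pitched the input is, which matters if the framework is meant to handle any type of music or sound.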
Applying Recurrent Variational Autoencoders to Musical Style Transfer (Adriena Cribb, University of Pittsburgh) - Artistic style transfer refers to taking the style of one piece of art and applying it to another. While this problem has seen great progress in the image domain, it has been largely unexplored in the context of music. Adriena is building a single recurrent variational autoencoder that allows harmonic style to be transferred to any degree directly between two musical pieces. The ultimate aims are deep learning methods for compositional style transfer and tools that allow musicians to explore novel modes of composition through the recombination of stylistic elements from different pieces of music.
Student presentations from summer 2017 can be viewed on YouTube.
Deep Learning of Musical Forms
Reverse-Engineering Recorded Music
Mentors: Professors Mark Bocko and Stephen Roessner (Electrical and Computer Engineering) and Darren Mueller (Eastman School of Music). Use signal processing algorithms to discover how the same recordings were remastered over time.
Web-based Interactive Music Transcription
Mentors: Professors Zhiyao Duan (Electrical and Computer Engineering) and David Temperley (Eastman School of Music). Building an interactive music transcription system that allows a user and the machine to collectively transcribe a piano performance.
The Prosody and Body Language of Effective Public Speaking
Mentors: Professors Ehsan Hoque (Computer Science), Chigusa Kurumada (Brain and Cognitive Science), and Betsy Marvin (Eastman School of Music). Measuring the visual (e.g. smiling) and auditory features (e.g. speaking rate) that cause a speaker to be highly rated by listeners.
Synthesizing Musical Performances
Mentors: Professors Chenliang Xu (Computer Science), Jiebo Luo (Computer Science) and Zhiyao Duan (Electrical and Computer Engineering). Using deep generative learning to synthesize video of a musical performer from audio input.
Reading Ancient Manuscripts
|Nick Creel||Marlboro College|
|Matthew DeAngelo||Wheaton College|
|Daniel Dopp||University of Kentucky|
|Alexander Giacobbi||Gonzaga University|
|Allison Lam||Tufts University|
|Chase Mortensen||Utah State University|
|Jung Yun Oh||Rice University|
|Eric Segerstrom||Hudson Valley Community College|
|Spencer Thomas||Brandeis University|
|Katherine Weinschenk||University of Virginia|
|Erik Azzarano||Rochester Institute of Technology (RIT)|
|Alexander Berry||Middlebury College|
|Adriena Cribb||University of Pittsburgh|
|Nicole Gates||Wellesley College|
|Justin Goodman||University of Maryland - College Park|
|Kowe Kadoma||Florida Agricultural and Mechanical University|
|Shiva Lakshmanan||Cornell University|
|Connor Luckett||Austin College|
|Marc Moore||Mississippi State University|
|Michael Peyman||Mesa Community College|
|Jake Altabef||Rensselaer Polytechnic Institute (RPI)|
|Harleigh Awner||Carnegie Mellon University|
|Moses Bug||Brandeis University|
|Ethan Cole||University of Michigan|
|Adrian Eldridge||University of Rochester|
|Arlen Fan||University of Rochester|
|Sarah Field||University of Rochester|
|Lauren Fowler||Mercer University|
|Graham Palmer||University of Michigan|
|Astha Singhal||University of Maryland|
|Wesley Smith||University of Edinburgh (UK)|
|Andrew Smith||University of Central Florida|