Past REU Sessions
With the Computational Methods for Understanding Music, Media, and Minds NSF REU at the University of Rochester, students explore exciting interdisciplinary research combining machine learning, audio engineering, music theory, and cognitive science. Previous research projects have investigated everything from computational methods for social networks and reading ancient manuscripts, to music encoding in the human brain. Students travel from institutions across the country to immerse themselves in summer research on the University of Rochester campus, and present their work to the public at the end-of-summer REU research symposium.
Title: Exploring Mental Responses to Listening and Imagination of Music
A variety of studies over the last two decades have shown that non-invasively obtained electrical signals through the scalp-recorded electroencephalogram (EEG) can be used as the basis for Brain-Computer Interfaces (BCIs). Major advances have been achieved in the analysis of brain responses to visual stimuli as well as of imagined motor movements, but little attention has been paid to human mental responses to or imagination of audio and music. In this project, REU students will examine connections between music and the mind directly by measuring brain signals through BCIs during music listening and imagination, as well as by developing computational methods for automatic analysis of such brain signals. One component of the project will involve the exploration of neural correlates of music listening, by analyzing the responses of the human brain to various aspects of the played music including melody, rhythm and timbre. Another component will involve exploring brain signal patterns generated in the process of music imagination or mental playing. The project brings together the three mentors’ complementary expertise: brain-computer interfaces (Cetin), music informatics (Duan), and biomedical instrumentation and time-series analysis (Anand), with the overarching connection being novel computational methods including machine learning.
Title: Music encoding in the human brain
Mentor: Ed Lalor (Biomedical Engineering, Neuroscience)
Music is one of the most emotive, universal, social and powerful things in human culture. But, despite being part of human life for hundreds of thousands of years, even defining what constitutes music is an issue that continues to be debated. One way to move forward on this issue would be to examine what constitutes music from the perspective of the human brain. However, how the brain creates coherent perceptions of music from complex combinations of sounds is itself poorly understood. One thing is clear: this process involves recognizing structure, and detecting meaning and associations from sounds that impinge upon our ears. In many ways, this is a similar challenge to processing speech. In this project, we aim to adapt recent progress on speech neuroscience to obtain a better understanding of how musical “meaning” and structure are computed by the human brain. In particular, we will use a combination of machine learning and the analysis of brainwave signals recorded from human subjects to identify neural correlates of musical structure and predictability. The student will analyze the structure and content of musical pieces and will analyze EEG data recorded from human subjects. They will also have the opportunity to learn how to collect this type of neural data.
Title: Sentiment Transfer in Music, Images, and Videos
Mentor: Prof. Jiebo Luo (Computer Science; ECE)
The project will aim to develop ground-breaking computational generation systems that can modify the perceived sentiment in multimedia content including images, videos, and music. Compared with other related tasks that have been well-studied, such as image-to-image translation and image style transfer, transferring the sentiment of an image is more challenging. In our preliminary work, we propose an effective and flexible framework that performs image sentiment transfer at the object level. It first detects the objects and extracts their pixel-level masks, and then performs object-level sentiment transfer guided by multiple reference images for the corresponding objects. More importantly, an effective content disentanglement loss cooperating with a content alignment step is applied to better disentangle the residual sentiment-related information of the input image. We intend to extend this approach to music and videos, including videos with music.
Title: Concert Hall Acoustic Measurement and Simulation with Multiple Sound Sources on the Stage
Mentor: Ming-Lun Lee (Audio and Music Engineering, ECE)
Our 3D Audio Research Lab has recorded over 50 concerts in the past two and a half years. We have found that the sound fields captured with two ‘identical’ Neumann KU100 Binaural Microphones positioned only a few seats apart are significantly different. The comparisons with binaural recordings are also consistent with what we actually heard during the rehearsal sound checks. The goal of this research project is to measure impulse responses with binaural dummy head microphones and Ambisonic microphones, such as the 32-channel Eigenmike Microphone Array, in the concert halls at the Eastman School of Music. Instead of using one fixed loudspeaker at the center of the stage as a sound source to generate sine sweeps, we plan to move a speaker or speakers to multiple source positions on the stage. In this way, we may reproduce, hear, and analyze the spatial immersive sound of an orchestral performance by convolving impulse responses with anechoic instrument recordings. We may also use the CATT-Acoustic software for concert hall acoustic modeling.
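As a rough illustration of the convolution step described above, the following minimal sketch convolves each anechoic ("dry") instrument track with the impulse response measured for its stage position and sums the results for one microphone channel. The function names, the toy sample values, and the two-source layout are assumptions made for this example; they are not taken from the lab's actual workflow, which would more likely use an FFT-based convolution on real recordings.

```python
def convolve(dry, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def mix_sources(dry_tracks, impulse_responses):
    """Convolve each dry track with the IR of its stage position, then sum."""
    wet = [convolve(d, h) for d, h in zip(dry_tracks, impulse_responses)]
    mix = [0.0] * max(len(w) for w in wet)
    for w in wet:
        for n, sample in enumerate(w):
            mix[n] += sample
    return mix


# Toy data (hypothetical): two short dry tracks and one IR per stage position.
violin = [1.0, 0.0, 0.0]
cello = [0.0, 1.0]
ir_stage_left = [1.0, 0.5]        # direct sound plus one reflection
ir_stage_right = [0.8, 0.0, 0.2]  # weaker direct sound, later reflection

hall_mix = mix_sources([violin, cello], [ir_stage_left, ir_stage_right])
print(hall_mix)
```

For a binaural or Ambisonic microphone, the same step would be repeated once per output channel, using that channel's impulse response for each source position.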
Title: Modeling the semantic and acoustic trajectories of communicative interactions online
Mentor: Elise Piazza (Brain and Cognitive Science)
We are collecting data from people interacting online and are interested in characterizing the structure of their speech over the course of the interaction. REU students will survey the relevant literature regarding acoustic and semantic features of language, and then apply various natural language processing approaches (especially semantic models, such as word2vec) and auditory signal processing analyses (to extract vocal features, such as pitch, rhythm, and timbre). We are looking for students with a background in natural language processing and/or audio signal processing to tackle these analyses, and we will provide a theoretical background and methodological training in research on human communication in naturalistic settings.
Title: Style Transfer in Music Generation
The project will be to develop a computational music generation system that merges features from two musical styles. We will use a dataset of classical melodies and another dataset of rock melodies. The computational system will learn pitch patterns from one dataset and rhythmic patterns from another dataset, and will merge them to create melodies that combine the two styles. An additional project might be to incorporate harmonic information from the rock dataset, adding chord symbols to the generated melodies.
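One simple way to picture the idea of merging pitch patterns from one dataset with rhythmic patterns from another is a pair of first-order Markov chains. The sketch below is only an assumption about how such a system could be structured, not the project's actual model: the toy sequences, the choice of taking pitches from the classical data and rhythms from the rock data, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def train_markov(sequences):
    """Collect first-order transitions: state -> list of observed next states."""
    table = defaultdict(list)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            table[current].append(following)
    return table


def sample_chain(table, start, length, rng):
    """Walk the transition table from `start` for `length` steps."""
    out = [start]
    while len(out) < length:
        options = table.get(out[-1])
        if not options:  # dead end: fall back to any state with transitions
            options = list(table.keys())
        out.append(rng.choice(options))
    return out


# Toy data (hypothetical): MIDI pitches and note durations in beats.
classical_pitches = [[60, 62, 64, 65, 67, 65, 64, 62, 60], [67, 65, 64, 62, 60]]
rock_rhythms = [[0.5, 0.5, 1.0, 0.5, 0.5, 1.0], [1.0, 0.5, 0.5, 1.0]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pitch_model = train_markov(classical_pitches)
rhythm_model = train_markov(rock_rhythms)

pitches = sample_chain(pitch_model, 60, 8, rng)
durations = sample_chain(rhythm_model, 0.5, 8, rng)
melody = list(zip(pitches, durations))  # (pitch, duration) pairs
print(melody)
```

Keeping the two models independent makes it easy to swap which style supplies pitch and which supplies rhythm, at the cost of ignoring any interaction between the two.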
Title: Automatic Rendering of Augmented Events in Immersive Concerts
In immersive concerts, the audience’s music listening experience is often augmented with texts, images, lighting and sound effects, and other materials. Manual synchronization of these materials with the music performance in real time becomes more and more challenging as their number increases. In this project, we will design an automatic system that is able to follow the performance and control pre-coded augmented events in real time. This allows immersive concert experiences to scale with the complexity of the texts, images, lighting and sound effects. We will work with TableTopOpera at the Eastman School of Music on implementing and refining this system.
Title: 3D Audio Recording and Concert Hall Acoustic Measurement with Binaural and Ambisonic Microphones
Mentor: Ming-Lun Lee (Electrical and Computer Engineering)
In the past year, our 3D audio recording team has recorded over 35 concerts at the Eastman School of Music with several binaural dummy head microphones, binaural in-ear microphones, and Ambisonic soundfield microphones, including a 32-capsule Eigenmike and a Sennheiser Ambeo VR Mic. We have built a large database of 3D audio concert recordings for spatial audio research. This project plans to not only record summer concerts but also measure impulse responses in a concert hall with a variety of binaural and Ambisonic microphones. Our goal is to compare the results made with different microphones and explore the best method to measure and understand complex hall acoustics.
Title: Assessing the Effectiveness of a Speaker by Analyzing Prosody, Facial Expressions, and Gestures
How we say things conveys a lot more information than what we say. Imagine the possibility of measuring the effectiveness of a speaker or of an oncologist delivering critical information to a patient, or even measuring the severity of a patient's Parkinson’s disease, by analyzing their prosody. This project will involve using knowledge from music to inform feature extraction, using machine learning to model those features, and then using cognitive models to explain the outcome.
Title: Augmenting Social-Communicative Behavior
Mentor: Zhen Bai (Computer Science)
Face-to-face interaction is the central part of human nature. Unfortunately, there are immense barriers for people with social-communicative difficulties, for example people with autism and people with hearing deficit, to engage in social activities. In this project, we seek design and technology innovation to create Augmented Reality (AR) technologies that facilitate social-communicative behaviors without interrupting the social norm of face-to-face interaction. We are looking for students with an interest in assistive technology, Augmented Reality, natural language processing and machine vision to take part in the design, interface prototyping, and evaluation of socially-aware AR environments that help people with special needs to navigate their everyday social life.
Title: Reading ancient manuscripts
This REU will develop a combined approach to reading damaged ancient manuscripts. Beginning with multispectral images, we will employ a combination of computer vision and natural language processing to fill in the holes in ancient texts written in Zapotec and Mixtec. The end goal will be to visualize the results with an AR/VR application.
Title: Education Technologies for Artificial Intelligence
Mentor: Zhen Bai (Computer Science)
There is an emerging presence of AI technologies in our everyday life, from voice assistants such as Echo and Google Home to smart life systems such as Fitbit and Spotify music suggestions. It is becoming more and more important for people without an AI background to understand the fundamentals of how a machine thinks and behaves, in order to better interact and collaborate with our increasingly intelligent work and life environment. We are looking for students with an interest in education technology, tangible user interfaces, and intelligent social agents to join our project. The students will take part in the design, interface prototyping and evaluation of physically and socially embodied education technologies that support K-12 AI education in formal and informal learning environments.
Audio-Visual Scene Understanding
Mentor: Chenliang Xu (Computer Science)
Evaluating the role of audio towards comprehensive video understanding - We are interested in measuring the role audio plays in high-level video understanding tasks such as video captioning and spatiotemporal event localization. In this project, students will design novel Amazon Mechanical Turk interfaces to be used to collect audio-oriented annotations for tens of thousands of YouTube videos. They will get hands-on experience training deep learning algorithms to run on large-scale data, with a focus on joint audio-visual modeling.
Assessing the Effectiveness of a Speaker by Analyzing Prosody, Facial Expressions, and Gestures
Mentor: Ehsan Hoque (Computer Science)
Assessing the severity of Parkinson's disease through the analysis of a voice test - This project involves the analysis of two vocal tasks from the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) performed by people both with and without Parkinson's disease (PD). The tests include uttering a sentence and saying ‘uhh’ in front of the computer’s microphone. Our analysis will include extracting and identifying useful features from the audio recording and developing a novel machine learning technique to assess the severity level of Parkinson’s disease.
Computational Methods for Social Networks and Human Mobility
Mentor: Gourab Ghoshal (Physics and Astronomy)
Investigating Human Mobility in Virtual and Physical Space - The student will develop the data analysis skills required to investigate complex system data, including Python coding and statistics. They will then apply these skills to study the unexpected similarities between human mobility in physical and virtual space.
Computational Methods for Audio-based Noninvasive Blood Pressure Estimation
Mentor: Zeljko Ignjatovic (Electrical and Computer Engineering)
Audio Based Non-invasive Blood Pressure Estimation - With cardiovascular disease as the leading cause of death in America, constant blood pressure measurement is imperative to detect early onset symptoms. Piezoelectric sensors can be used in conjunction with a recurrent neural network in a wearable device (such as a smartwatch) to extract pulse wave velocity data and heart rate data to estimate blood pressure. The concept further expands the use of machine learning techniques and applies them to activity trackers. Although related technologies exist in the field, none of them uses a recurrent neural network with a piezoelectric sensor, nor has any of them become the industry standard, as the field is still in its infancy. Continued research is required to develop a smartwatch that can accurately detect blood pressure; however, gathering enough pulse wave velocity, heart rate, and blood pressure data to train the recurrent neural network and developing a working prototype should be sufficient for the end of the summer.
Music and the Processing Programming Language
Mentor: Sreepathi Pai (Computer Science)
A Framework for Developing Music-Generated Games (Erik Azzarano, Rochester Institute of Technology) - Erik is investigating a framework for developing music-generated games based on live or external audio input. He aims to create an intuitive mapping between a game’s mechanics and features of the audio input. For example, features of the audio such as frequency, amplitude, and beats or onsets are extracted and mapped to different game parameters to drive the experience, such as when enemies spawn, their location, and how fast they move. The goal of this project is to have a finished framework with all of the appropriate mappings between game mechanics and audio features. The framework should allow the game to suitably portray any type of music or sound input.
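To make the mapping from audio features to game parameters more concrete, here is a minimal sketch that computes a per-frame amplitude, flags crude onsets, and turns each onset into an enemy spawn event whose speed follows the amplitude. Everything in it is an assumption for illustration: the frame size, the onset rule (a jump in level over the previous frame), the event fields, and the function names are hypothetical and are not drawn from Erik's framework.

```python
import math


def frame_signal(samples, frame_size):
    """Split samples into consecutive non-overlapping frames."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_onsets(levels, ratio=2.0, floor=0.05):
    """Flag frames whose level exceeds `floor` and `ratio` times the previous level."""
    return [i for i in range(1, len(levels))
            if levels[i] > floor and levels[i] > ratio * levels[i - 1]]


def map_to_game(levels, onsets, max_speed=10.0):
    """Turn each onset into a spawn event; louder onsets give faster enemies."""
    return [{"frame": i, "spawn": True, "speed": min(max_speed, levels[i] * max_speed)}
            for i in onsets]


# Toy input (hypothetical): silence, a medium burst, silence, a loud burst.
samples = ([0.0] * 4 + [0.5, -0.5, 0.5, -0.5]
           + [0.0] * 4 + [1.0, -1.0, 1.0, -1.0])

levels = [rms(f) for f in frame_signal(samples, 4)]
onsets = detect_onsets(levels)
events = map_to_game(levels, onsets)
print(events)
```

A fuller version would add a frequency feature, for example to choose where an enemy appears, but the same extract-then-map structure would apply.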
Applying Recurrent Variational Autoencoders to Musical Style Transfer (Adriena Cribb, University of Pittsburgh) - Artistic style transfer refers to taking the style of one piece of art and applying it to another. While this problem has seen great progress in the image domain, it has been largely unexplored in the context of music. Adriena is building a single recurrent variational autoencoder that allows harmonic style to be transferred to any degree directly between two musical pieces. The ultimate aim is to produce deep learning methods for compositional style transfer and tools that allow musicians to explore novel modes of composition through the recombination of stylistic elements in different pieces of music.
Deep Learning of Musical Forms
Reverse-Engineering Recorded Music
Mentors: Professors Mark Bocko and Stephen Roessner (Electrical and Computer Engineering) and Darren Mueller (Eastman School of Music). Using signal processing algorithms to discover how the same recordings were remastered over time.
Web-based Interactive Music Transcription
Mentors: Professors Zhiyao Duan (Electrical and Computer Engineering) and David Temperley (Eastman School of Music). Building an interactive music transcription system that allows a user and the machine to collectively transcribe a piano performance.
The Prosody and Body Language of Effective Public Speaking
Mentors: Professors Ehsan Hoque (Computer Science), Chigusa Kurumada (Brain and Cognitive Science), and Betsy Marvin (Eastman School of Music). Measuring the visual (e.g. smiling) and auditory features (e.g. speaking rate) that cause a speaker to be highly rated by listeners.
Synthesizing Musical Performances
Mentors: Professors Chenliang Xu (Computer Science), Jiebo Luo (Computer Science) and Zhiyao Duan (Electrical and Computer Engineering). Using deep generative learning to synthesize video of a musical performer from audio input.
Reading Ancient Manuscripts
|Anson Jones||Princeton University|
|Chanha Kim||Pomona College|
|Derek Lilienthal||California State University Monterey Bay|
|Jessica Luo||University of Rochester|
|Farrah Pierre-Louis||Simmons University|
|Miles Sigel||Rice University|
|Calli Smith||University of Connecticut|
|Rachael Tovar||Wheaton College|
|Julia Weinstock||University of Rochester|
|Michael Zhou||Cornell University|
|Nick Creel||Marlboro College|
|Matthew DeAngelo||Wheaton College|
|Daniel Dopp||University of Kentucky|
|Alexander Giacobbi||Gonzaga University|
|Allison Lam||Tufts University|
|Chase Mortensen||Utah State University|
|Jung Yun Oh||Rice University|
|Eric Segerstrom||Hudson Valley Community College|
|Spencer Thomas||Brandeis University|
|Katherine Weinschenk||University of Virginia|
|Erik Azzarano||Rochester Institute of Technology (RIT)|
|Alexander Berry||Middlebury College|
|Adriena Cribb||University of Pittsburgh|
|Nicole Gates||Wellesley College|
|Justin Goodman||University of Maryland - College Park|
|Kowe Kadoma||Florida Agricultural and Mechanical University|
|Shiva Lakshmanan||Cornell University|
|Connor Luckett||Austin College|
|Marc Moore||Mississippi State University|
|Michael Peyman||Mesa Community College|
|Jake Altabef||Rensselaer Polytechnic Institute (RPI)|
|Harleigh Awner||Carnegie Mellon University|
|Moses Bug||Brandeis University|
|Ethan Cole||University of Michigan|
|Adrian Eldridge||University of Rochester|
|Arlen Fan||University of Rochester|
|Sarah Field||University of Rochester|
|Lauren Fowler||Mercer University|
|Graham Palmer||University of Michigan|
|Astha Singhal||University of Maryland|
|Wesley Smith||University of Edinburgh (UK)|
|Andrew Smith||University of Central Florida|