We are no longer accepting proposals for the 2023-2024 Goergen Institute for Data Science (GIDS) seed funding program. Please check back in June 2024 for information on the 2024-25 program.
Estimation of Cancer-Relevant Gene Regulatory Networks From Perturb-Seq Data
PI: Matthew McCall
Genes do not act in isolation; rather they act together in complex networks that govern cellular function. Investigation of the interactions between genes (and gene products) is necessary to elucidate cellular mechanisms. Disease progression produces drastic changes in genetic networks critical to normal cellular function. Accurate models of gene regulatory networks (GRNs) present in healthy cells and how these GRNs are altered in disease have the potential to inform both prognosis and treatment.
Perturbation experiments, in which the expression of a given gene is experimentally altered, are the best approach to learning network structure. Alternative approaches based on observational data suffer from the large number of unmeasured technical and biological sources of variation that plague gene expression studies. Historically, large-scale perturbation experiments were extremely time-consuming and often prohibitively expensive because each individual perturbation required infection with one or more specific viral constructs. However, recent technological advances, namely Perturb-seq, have enabled genome-scale perturbation experiments by combining CRISPR interference (CRISPRi) and single-cell RNA sequencing (scRNA-seq). The key advance of Perturb-seq is the simultaneous detection of the CRISPR single-guide RNAs (sgRNAs) that a cell receives and measurement of the transcriptome of that cell.
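As a toy illustration of why perturbations reveal directed structure that observational data cannot, consider a small linear network at steady state: knocking down a gene and recording which genes respond exposes its downstream targets. The network, coefficients, and knockdown model below are purely illustrative, not the modeling approach developed in this project:

```python
import numpy as np

# Toy linear network at steady state: x = W x + b, where W[i, j] != 0 means
# gene j regulates gene i. Structure: 0 -> 1 -> {2, 3}; gene 4 is isolated.
n = 5
W = np.zeros((n, n))
W[1, 0], W[2, 1], W[3, 1] = 0.8, 0.7, 0.6
b = np.ones(n)

def steady_state(W, b, knockdown=None):
    """Solve x = W x + b; a knockdown forces the targeted gene's level to 0."""
    A = np.eye(len(b)) - W
    bk = b.astype(float).copy()
    if knockdown is not None:
        A[knockdown] = 0.0
        A[knockdown, knockdown] = 1.0
        bk[knockdown] = 0.0
    return np.linalg.solve(A, bk)

base = steady_state(W, b)
# effects[j, i] asks: does knocking down gene j change the level of gene i?
effects = np.array([np.abs(steady_state(W, b, knockdown=j) - base) > 1e-6
                    for j in range(n)])
```

Knocking down gene 0 moves genes 1, 2, and 3 (direct and indirect targets), while knocking down the isolated gene 4 moves nothing else; observational correlations alone could not orient these edges.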
While inhibition of targetable oncoproteins has been shown to be an effective approach to cancer treatment, only a small fraction of cancers depend on targetable mutations. A promising alternative is the targeting of non-mutated proteins that represent crucial control points in the molecular circuitry of the cancer cell. We have previously developed and applied a novel GRN modeling algorithm, called TopNet, that pinpoints gene interactions that are crucial to the malignant phenotype. While our initial application of this methodology relied on low-throughput perturbation experiments, the recent development of Perturb-seq provides an opportunity to extend the TopNet algorithm to genome scale single-cell data. By linking genetic perturbation experiments with statistical and computational modeling, we can accelerate the discovery of genetic interactions in cancer-relevant gene networks.
The overall goal of the research is to develop and apply novel methodology to estimate GRNs based on thousands of gene perturbations at single cell resolution. We anticipate that the proposed methodology will improve estimation of cancer cell GRNs, identify aspects of the GRN that are crucial to malignancy, and quantify their variability across subpopulations of cancer cells. This will substantially improve the accuracy and validity of GRN modeling and facilitate the identification of target combinations for next-generation multi-drug cancer interventions.
Energy and Carbon Footprint Modeling of Visual Computing Systems
PI: Yuhao Zhu
As the design and fabrication of modern integrated circuit (IC) technology become ever more complex and expensive, the semiconductor industry is becoming increasingly compartmentalized, with companies or their subdivisions specializing in their own domains of expertise. Such a rigid separation between system layers prevents ideas from percolating easily through the system stack. From a supply-chain-management and national-security perspective, the diminishing role of the U.S. in the vision sensor industry is even more worrisome. State-of-the-art sensor design and fabrication capabilities and talent are increasingly concentrated outside the U.S. None of the top five CMOS image sensor (CIS) vendors is U.S.-based; East Asian companies hold 90% of the global market share and are projected to have the fastest compound annual growth rate worldwide.
In this project, we aim to break the boundaries between the components of an end-to-end visual processing system and propose an end-to-end co-design framework that combines imagers, processors, and algorithms vertically across the system stack. Such modeling can enable significant system-level improvements in energy efficiency and imaging speed. If successful, the outcome will pave the way for sensor-driven Artificial Intelligence (AI) workloads across a broad range of applications, such as AR/VR and autonomous driving.
The objectives of the project are:
- Develop and validate the modeling infrastructure for analyzing, simulating, and eventually synthesizing an analog/digital/algorithm co-designed system.
- Develop, for the first time, an energy and embodied carbon footprint model for intelligent visual processing systems.
- Design algorithms to automatically generate the algorithmic and hardware parameters.
Language Guided Audio Source Separation
PI: Zhiyao Duan
Humans are able to attend to a particular sound source (e.g., a friend’s voice) in a complex auditory scene (e.g., a cocktail party). Audio source separation research aims to equip machines with the same ability, i.e., separating sources of interest from an audio mixture. While deep learning-based methods have significantly improved separation quality, the problem setup remains the same as decades ago: sources to be separated must come from a limited set of predefined categories (e.g., speech, violin, footsteps) on which the separation algorithms are trained. This is not how humans attend to sound sources; indeed, the definition of a sound source changes with the context. The key to achieving this flexibility of source definition is to let users define sources of interest on the fly. A natural interface is text descriptions, which can provide fine-grained information about various acoustic properties of the sources of interest and can also describe relations between the target source and other sound events.
In this project, we aim to develop a language guided audio source separation framework. This is a novel setup of audio source separation, as it goes beyond the limited and pre-defined source categories but allows users to define sources of interest with fine-grained text descriptions. We will address two main challenges. First, we will develop a robust text-audio linking model that can map between sound events and their corresponding text descriptions through a common embedding space. Second, we will develop a source separation algorithm that conditions the separation on a text description.
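The conditioning idea can be sketched in miniature: sound events and text queries share an embedding space, and a soft spectrogram mask is computed from their similarity. Everything below (the embedding dimension, the lookup-table "text encoder", the sigmoid masking rule) is an illustrative stand-in for the learned models the project will develop:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 8  # toy joint embedding dimension

def separate(mixture_spec, event_embs, text_emb, threshold=0.5, temperature=0.1):
    """Soft-mask a (freq, time) spectrogram by cosine similarity between each
    frequency bin's event embedding and the text query embedding."""
    sims = event_embs @ text_emb / (
        np.linalg.norm(event_embs, axis=1) * np.linalg.norm(text_emb) + 1e-9)
    mask = 1.0 / (1.0 + np.exp(-(sims - threshold) / temperature))
    return mask[:, None] * mixture_spec

# Two toy sources occupying disjoint frequency bands of a 16-bin spectrogram.
freqs, frames = 16, 10
src_a = np.zeros((freqs, frames)); src_a[:8] = 1.0   # low band: "violin"
src_b = np.zeros((freqs, frames)); src_b[8:] = 1.0   # high band: "footsteps"
mixture = src_a + src_b

# Shared embedding space: low-band bins align with the "violin" text vector;
# the "footsteps" vector is made orthogonal so the two queries are distinct.
v = rng.normal(size=EMB_DIM)
f = rng.normal(size=EMB_DIM)
f -= (f @ v) / (v @ v) * v
event_embs = np.vstack([np.tile(v, (8, 1)), np.tile(f, (8, 1))])
vocab = {"violin": v, "footsteps": f}        # lookup table as a toy text encoder

est = separate(mixture, event_embs, vocab["violin"])
```

In the real system, both encoders would be trained jointly so that free-form descriptions, not just category names, retrieve the right events.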
The intellectual merit of this project is twofold: 1) allowing users to define sources of interest with natural language is a major shift in the problem formulation of audio source separation; 2) the developed audio-text linking module connects the audio and text modalities, allowing audio understanding tasks to benefit from significant advances in large language models (LLMs). As multimodal learning becomes a main driving force of Artificial Intelligence research, there are increasing funding opportunities for research that connects audio and NLP. Our long-term goal is to investigate audio-language grounding problems and their novel applications. We plan to build on this project to collaborate with NLP researchers and apply for grants from the NSF CISE directorate.
Enhancement of Sensitivity and Yield in Integrated Optical Biosensing By Machine Learning
Co-PIs: Pablo A. Postigo, Benjamin L. Miller
Integrated photonics plays a crucial role in biosensing because it can manipulate and control light within tiny, chip-scale devices. It enables miniaturized, portable, and handheld devices for point-of-care applications; enhances sensitivity, selectivity, multiplexing, and throughput through simultaneous detection of multiple targets on a single chip, saving time and reducing sample volume requirements; provides fast, real-time analysis; and facilitates integration with microfluidics and electronics for fully integrated, multifunctional biosensing platforms. Nevertheless, several challenges must be addressed before such devices can be successfully marketed. First, ensuring standardized fabrication processes for integrated photonic chips is difficult: variations in manufacturing techniques, materials, and designs can hinder scalability and lead to inconsistencies in chip performance and quality. Second, biosensing requires high sensitivity and specificity to improve detection limits, reduce false positives/negatives, and minimize cross-reactivity with interfering substances. Rigorous validation and testing are essential to ensure the accuracy and reliability of the biosensing devices.
We aim to solve both problems, completely or to a significant extent, by applying machine learning (ML) techniques to process the optical signal and improve it amid the surrounding noise. Our project will develop customized ML algorithms that not only enhance the signal-to-noise ratio by at least one order of magnitude but also minimize small inconsistencies in chip performance by automatically correcting for variations between devices.
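To give a rough sense of what "one order of magnitude" means for signal recovery, the sketch below denoises a toy resonance-shift trace with a simple moving-average filter and reports the SNR gain. The trained ML denoisers would replace this fixed filter, and all signal parameters here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def snr_db(signal, residual):
    """Signal-to-noise ratio in dB, given a clean reference and the residual."""
    return 10 * np.log10(np.mean(signal**2) / np.mean(residual**2))

# Toy sensor trace: a slowly varying binding signal buried in white readout noise.
n = 2000
t = np.arange(n)
clean = np.sin(2 * np.pi * t / 400)
noisy = clean + rng.normal(scale=0.5, size=n)

# Simple baseline denoiser: a 16-sample moving average. A learned model would
# replace this kernel; white-noise power drops roughly by the window length.
w = 16
denoised = np.convolve(noisy, np.ones(w) / w, mode="same")

core = slice(100, -100)                       # ignore filter edge effects
snr_before = snr_db(clean[core], (noisy - clean)[core])
snr_after = snr_db(clean[core], (denoised - clean)[core])
print(f"SNR: {snr_before:.1f} dB -> {snr_after:.1f} dB")
```

Here even a fixed linear filter buys roughly 12 dB (about a factor of 16 in power) because the signal is far slower than the noise; learned denoisers aim for similar gains without assuming a known signal bandwidth.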
Nature-Based Dynamical Systems for Energy-Efficient Neural Rendering
Co-PIs: Michael Huang, Tong Geng
Neural Radiance Fields (NeRF) has recently garnered considerable interest due to its unique prowess in facilitating novel view synthesis and cross-scene generalization. These properties serve as the bedrock for AR/VR applications. However, NeRF is still computationally intensive, and order-of-magnitude improvements in energy efficiency are necessary for realistic adoption in AR/VR headsets.
Physical dynamical systems have also recently gained considerable interest. These systems can leverage natural properties (as embodied by their governing differential equations) to evolve quickly toward optimal points in the phase space. This evolution has the effect of solving certain optimization problems, including some in machine learning. Dynamical systems are thus potentially ideal platforms for NeRF.
Having proposed one of the leading dynamical systems and pioneered its use to improve Graph Neural Networks (GNNs), our team is uniquely positioned to explore this cross-layer design opportunity. The near-term objective of this seed project is to conduct a feasibility study and preliminary analysis that will help crystallize our concepts and attract external funding. The work proposed here aligns well with two of the GIDS research priorities: foundations of ML/AI and AR/VR. Through the sustained external funding this project could potentially attract, products and insights from this collaboration have strong potential to reach the marketplace relatively quickly.
Investigating the Power of Partial Learning vs Total Learning
Co-PIs: Kaave Hosseini, Daniel Stefankovic
Arguably the most fundamental learning task in machine learning is learning half-spaces with a margin. Since the 1950s, it has been well understood that half-spaces with bounded margin are learnable by large-margin classifiers such as the perceptron algorithm or support vector machines. Nevertheless, vital questions about this learning class remain unanswered, particularly surrounding the comparative efficiency of "partial" and "total" learning.
In partial learning, the learner may output an arbitrary placeholder where the answer is not crucial, while total learning requires the learner to always produce the correct binary label. The proposed research conjectures that although the class of half-spaces with margin is (partially) learnable, no completion of it to a total binary concept class is learnable. This project aims to delve into this separation, leveraging new techniques from communication complexity, combinatorics, and the theory of pseudorandomness to address this vital aspect of half-spaces with margin.
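The classical margin guarantee underlying this learning class can be illustrated with the perceptron: on separable data with margin γ and radius R, Novikoff's theorem bounds the total number of mistakes by (R/γ)². A minimal sketch, with illustrative dimensions and margin:

```python
import numpy as np

rng = np.random.default_rng(2)

def perceptron(X, y, epochs=200):
    """Classical perceptron; returns the final weights and total mistakes made."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:       # misclassified (or on the boundary)
                w += yi * xi
                mistakes += 1
                errors += 1
        if errors == 0:                  # converged: every point is correct
            break
    return w, mistakes

# Half-space with margin: label by a fixed unit vector w*, keeping only points
# with |w* . x| >= gamma, so the sample has margin gamma and radius <= R.
d, n, gamma = 5, 200, 0.2
w_star = np.zeros(d); w_star[0] = 1.0
X = rng.uniform(-1, 1, size=(4 * n, d))
X = X[np.abs(X @ w_star) >= gamma][:n]
y = np.sign(X @ w_star)
R = np.max(np.linalg.norm(X, axis=1))

w, mistakes = perceptron(X, y)
print(f"mistakes: {mistakes}, Novikoff bound: {(R / gamma) ** 2:.0f}")
```

The bound depends only on R/γ, not on the dimension, which is exactly the margin-based learnability the proposed separation question probes.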
This project could significantly advance our understanding of the fundamentals of machine learning. The comparison between partial and total learning explores previously uncharted territory and could illuminate the genuine capabilities of partial learning in contrast to total learning.
The PI's recent publication on a binary concept class that illustrates a separation of partial and total (online) learning (as acknowledged in ICALP 2023 with the best paper award) further underscores the relevance and timeliness of this project. The PI’s previous construction witnessing a separation in online learning was a pathological concept class relying on sophisticated combinatorial constructions. However, addressing the problem in the context of PAC learning and/or large-margin classifier algorithms requires genuinely new ideas that are outside the current toolset of learning theory.
The outcome of this study could lead to a paradigm shift in how machine learning methodologies are approached. Understanding the comparative efficacy of partial and total learning could lead to more effective and efficient learning algorithms. Furthermore, this work will produce genuinely new lower-bound techniques for learning problems. The insights derived from the research could also lead to better algorithms for finding suitable feature spaces for learning tasks, and could influence real-world applications of large-margin classifiers, improving the predictive capabilities of these technologies across a range of industries, including healthcare, finance, and autonomous systems.
Fairer and More Accurate Large Language Models via Social Choice and Crowdsourcing
Co-PIs: Anson Kahng, Nikhil Garg (Cornell Tech)
How can we apply ideas from the crowdsourcing and social choice literature to improve the outputs of large language models and other generative AI technologies? The goal of the proposed research is to develop methods to make model output of (1) subjective content more representative and (2) objective content more accurate. We propose three high-level directions that we will pursue: one direction leverages tools from the crowdsourcing literature to find more objective facts; the second uses tools from virtual democracy to audit alignment with diverse (and sometimes conflicting) fairness notions; and the last leverages traditional approaches from computational social choice to choose output that better aligns with pre-defined metrics of fairness.
Creating avatars and aggregating their output: This approach leverages the fact that LLMs can be used to generate views from a diverse collection of people (which we call “avatars”), and these generated views can then be aggregated using a social choice mechanism. In particular, we will study how to choose a representative set of avatars, how to check if avatar opinions predict ground truth, and how to aggregate avatar opinions.
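As a minimal illustration of the aggregation step, the sketch below applies the Borda rule, one standard social choice mechanism, to toy avatar rankings of candidate outputs; which mechanism to use is itself a research question of the project:

```python
from collections import defaultdict

def borda(rankings):
    """Borda count: each voter ranks candidates best-to-worst; a candidate in
    position p of a k-candidate ranking earns k - p - 1 points. Highest wins."""
    scores = defaultdict(int)
    k = len(rankings[0])
    for ranking in rankings:
        for pos, cand in enumerate(ranking):
            scores[cand] += k - pos - 1
    winner = max(scores, key=scores.get)
    return winner, dict(scores)

# Three avatar rankings over candidate LLM outputs A, B, C (illustrative data).
avatar_votes = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
winner, scores = borda(avatar_votes)
print(winner, scores)
```

Here output A wins with 5 points; a real pipeline would first validate that avatar rankings track ground truth before trusting any such aggregate.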
Auditing alignment with virtual democracy: In settings where people inevitably disagree about what constitutes “fairness” in the context of a machine learning model, we would like to be able to use an informed ethics board to judge the suitability of output. We propose leveraging the framework of “virtual democracy” in order to automate this ethical review process. We will focus on designing an implementable voting rule that aggregates (virtual) fairness judgments in a provably proportional way.
Aligning output with pre-defined metrics of fairness: In (relatively rare) settings where there is a universally agreed-upon metric of fairness, we study how to leverage tools from computational social choice to choose LLM output that most closely aligns with the fairness metric. Here, we focus on the online setting, where our goal is to design a no-regret mechanism (with respect to the fairness metric) for choosing LLM output.
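A standard starting point for such a no-regret mechanism is the Hedge (multiplicative weights) algorithm, sketched below on toy fairness losses; the loss distributions and learning rate are illustrative, not the project's setting:

```python
import numpy as np

rng = np.random.default_rng(3)

def hedge(loss_matrix, eta=0.1):
    """Hedge: maintain weights over K candidate outputs; each round play the
    current distribution, then exponentially down-weight lossy candidates."""
    T, K = loss_matrix.shape
    w = np.ones(K)
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()
        total_loss += p @ loss_matrix[t]     # expected loss this round
        w *= np.exp(-eta * loss_matrix[t])
    return total_loss

# Toy setting: each round, K candidate LLM outputs incur fairness-metric
# losses in [0, 1]; candidate 0 is consistently better than the rest.
T, K = 2000, 5
losses = rng.uniform(0.4, 1.0, size=(T, K))
losses[:, 0] = rng.uniform(0.0, 0.3, size=T)

alg_loss = hedge(losses)
best_fixed = losses.sum(axis=0).min()
avg_regret = (alg_loss - best_fixed) / T     # vanishes as T grows
print(f"average regret per round: {avg_regret:.3f}")
```

Hedge's regret grows only as O(√(T log K)), so the per-round regret against the best fixed candidate goes to zero, which is the "no-regret" property the proposed mechanism targets.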
Ising Boltzmann Substrate for Energy-Based Models
Co-PIs: Michael Huang, Gonzalo Mateos
The objective of this collaboration is the realization of a non-von Neumann computational system with the potential to excel in efficiency, capability, and applicability. Our ambitious and broad vision is to bring such a system to fruition, which can only be realized via convergent research in circuits and algorithms. A unique and particularly exciting component of the proposed collaboration is to demonstrate the impact of nature-based computation beyond solving combinatorial optimization problems. Indeed, we propose potentially transformative advances in the co-design of computing hardware and machine learning (ML) algorithms for a class of energy-based latent variable models, which could benefit greatly from improved efficiency and speed. We will design and study a non-von Neumann platform that we call the Ising-Boltzmann Substrate (IBS), which can accelerate both Ising-formulation optimization and Boltzmann-machine-style ML algorithms inspired by statistical physics, such as simulated annealing. Co-design of novel ML algorithms that leverage the unique features of our nature-based IBS presents both challenges and new opportunities to fundamentally re-examine the design of ML training and inference algorithms. Accordingly, the work proposed here aligns well with the Goergen Institute for Data Science (GIDS) research priority in foundations of ML and artificial intelligence (AI). Through the sustained external funding this seed program will help attract, products and insights from this collaboration have strong transformative potential to bring nature-based computing to the state of compelling infrastructure and to directly impact the gamut of ML application domains in scientific discovery, industry, assistive technologies, robotics-aided healthcare, and economic development, with consequent improvements in quality of life.
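The Ising-formulation optimization such a substrate would accelerate in hardware can be emulated in software with simulated annealing. The sketch below anneals a small ferromagnetic Ising instance whose ground state (all spins aligned) is known; the problem size and cooling schedule are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def ising_energy(s, J):
    """Ising Hamiltonian H(s) = -1/2 * s^T J s for spins s in {-1, +1}."""
    return -0.5 * s @ J @ s

def anneal(J, steps=20000, t_hot=2.0, t_cold=0.01):
    """Single-spin-flip Metropolis annealing with geometric cooling: a
    software stand-in for the physical dynamics of the proposed substrate."""
    n = J.shape[0]
    s = rng.choice([-1, 1], size=n)
    for k in range(steps):
        temp = t_hot * (t_cold / t_hot) ** (k / steps)
        i = rng.integers(n)
        delta = 2 * s[i] * (J[i] @ s)        # energy change of flipping spin i
        if delta < 0 or rng.random() < np.exp(-delta / temp):
            s[i] = -s[i]
    return s, ising_energy(s, J)

# Ferromagnetic complete graph on 20 spins: ground state is all-aligned,
# with energy -n(n-1)/2 = -190.
n = 20
J = np.ones((n, n)) - np.eye(n)
s, e = anneal(J)
print(f"final energy: {e} (ground state: {-n * (n - 1) / 2})")
```

On hardware, the physical system performs this descent through its natural dynamics instead of a sequential Monte Carlo loop, which is where the projected efficiency gains come from.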
A Data-Driven, Virtual Reality-based Approach to Enhance Deficient Color Vision
Co-PIs: Yuhao Zhu, Gaurav Sharma
Approximately 8% of males and 0.5% of females suffer from some form of color vision deficiency (CVD), totaling 13 million people in the U.S. and 350 million worldwide. People with CVD have restricted career options, limited driving rights in certain countries, and an impaired ability to interact with and sense the world. In a 2020 survey, one-third of students with CVD indicated that color blindness affected their confidence in school, and 30% felt they might be a "slow learner" before finding out that they were color blind.
Existing methods to enhance colors for CVD have two fundamental limitations. First, these techniques are content-agnostic: the transformation is based only on individual pixel colors, ignoring content-specific information such as the spatial/temporal frequency of the content and scene semantics (e.g., flowers vs. people). Content-specific information is important for color enhancement: ideally, color confusion of salient features and objects should be minimized, while color confusion in unimportant background content can reasonably be tolerated.
Second, these approaches provide a "one-off" enhancement: for a given scene, one, and only one, specific transformation is applied. Users are thus left with no control to customize the enhancement, even though such control is essential for personalized color enhancement.
We propose to investigate a data-driven, learning-based system to enhance the visual experience for people with CVD. The system will be deployed as a Mixed Reality application, which intercepts the real-world scene captured by the camera, and dynamically enhances the pixel colors in the image based on user interactions before presenting the image to the users.
Using Virtual Reality technologies not only provides a new means to dynamically correct color for people with CVD, but, more importantly, empowers color-deficient people to interactively control how colors are corrected. Ultimately, our goal is to help people with CVD build the intuition of trichromatic color vision and thereby enable personalized, context-specific color enhancement for color-deficient people.
Audiovisual Integration in Virtual Reality Renderings of Real Physical Spaces
Co-PIs: Duje Tadin, Ming-Lun Lee, Michael Jarvis
The human brain optimizes perceptual behavior by integrating information from multiple sensory modalities. However, multisensory research is typically done with rather simple stimuli. Audiovisual (AV) integration, the focus of this project, is usually studied with flashes and beeps. This raises questions about generalizability to real-world behavior, where stimuli are considerably more complex and exhibit systematic congruencies.
We will address this knowledge gap by taking advantage of sophisticated visual and auditory space mapping techniques to capture real AV spaces and present these stimuli in virtual reality (VR). With VR, we can conduct much-needed naturalistic studies of human perception without giving up experimental control. This approach will allow us to test questions that are difficult, if not impossible, to study with simple stimuli. Specifically, we will test how congruency affects AV integration of rich visual and auditory stimuli. The overall hypothesis is that multisensory integration is optimized for situations with a high degree of congruency among complex sets of cues that follow natural sounds and sights.
Aim 1: To develop novel virtual environments for naturalistic AV experiments. With the dual goals of using naturalistic AV stimuli and retaining a high degree of experimental control, we will develop two VR-based AV testing environments. One will be based on Kodak Hall, a Rochester landmark, and the other on a carpeted hall/hallway. For both, we will perform a laser scan of the space to obtain an accurate (~2 mm) 3D model, along with spatial audio recordings of impulse responses at a range of distances. These locations have been chosen to maximize AV differences. The proximal goal is to support AV experiments in our labs. We also plan to make the outcome of Aim 1 a freely shared resource for visual, auditory, and AV studies.
Aim 2: To test whether AV simultaneity perception scales with (a) AV distance cues and (b) congruency between auditory and visual signals. Work by us and others has shown that AV perception is sensitive to auditory distance cues, but how perception of AV simultaneity depends on the interplay among visual and auditory cues to distance is unknown. Moreover, visual and auditory distance cues depend on the environment. However, it is unknown how the congruency between visual and auditory cues affects AV perception. Here, we will use the experimental environments from Aim 1 to address these unanswered questions. This work will generate key pilot data for an NIH grant application planned for 2023.
Personalized Immersive Spatial Audio with Physics Informed Neural Field
Co-PIs: Zhiyao Duan, Mark Bocko
Spatial audio rendering is critical for providing a sense of space and object location to enhance immersion in augmented reality (AR) and virtual reality (VR). It needs to be personalized because the geometry of the human auditory apertures (i.e., the pinnae, head, and upper torso) strongly affects the timing, intensity, and frequency equalization of the sound we hear, and this geometry varies significantly from one individual to another. These effects are characterized by head-related transfer functions (HRTFs), which are used for spatial audio rendering in state-of-the-art AR/VR devices. However, a user’s HRTFs vary with the location of the sound source relative to the listener, and a good measurement requires dense spatial sampling on a hemisphere around the listener. The measurement of HRTFs is therefore very time-consuming and requires a specialized environment and equipment. As such, HRTFs measured on dummy heads that represent an “average person” are used in practice, but the mismatch with the true HRTFs of a user significantly reduces immersion and often causes localization confusion.
In this project, we propose to develop a physics-informed neural field approach to estimate HRTFs from human physical geometry, which can be easily measured with a camera and off-the-shelf computer vision techniques. The proposed approach will learn a unified and differentiable representation for HRTFs across different datasets that use various spatial sampling schemes, and will train a deep neural network that integrates the physical laws of sound scattering to predict this unified HRTF representation from an ear mesh and anthropometric measurements of the head. Compared to well-established boundary element methods (BEM), the proposed approach relaxes the strong and often unreliable assumptions of the physical model and complements it with data-driven information through deep learning. Compared to existing deep learning methods, the proposed approach learns a neural field that unifies the representations of HRTFs across datasets with different spatial sampling schemes, significantly enlarging the training and test data. In addition, integrating physical laws into the deep learning framework should significantly improve model interpretability and reduce the amount of training data required.
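An example of the kind of closed-form scattering physics such a model can embed as prior knowledge is Woodworth's spherical-head approximation for the interaural time difference (ITD); the head radius and speed of sound below are typical textbook values, not measurements from this project:

```python
import numpy as np

def itd_woodworth(theta_rad, head_radius_m=0.0875, c_m_s=343.0):
    """Woodworth's spherical-head ITD for azimuth theta in [0, pi/2]:
    ITD(theta) = (a / c) * (theta + sin(theta)), combining the arc the wave
    travels around the head with the direct-path difference."""
    return (head_radius_m / c_m_s) * (theta_rad + np.sin(theta_rad))

azimuths = np.deg2rad(np.linspace(0, 90, 7))
itds_us = itd_woodworth(azimuths) * 1e6       # microseconds
for az, itd in zip(np.rad2deg(azimuths), itds_us):
    print(f"azimuth {az:5.1f} deg -> ITD {itd:6.1f} us")
```

For an average head this peaks near 0.66 ms at 90° azimuth; a physics-informed network can use such relations as structure while learning the individual deviations that the spherical model misses.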
The project will be performed by a team of the two PIs and their two PhD students, who have completed preliminary work through an unfunded student practicum course project. The PIs will use the research results of this project to submit an NSF proposal to the CISE or Engineering directorates. Both PIs will also integrate research findings into their courses in the AME program and the AR/VR NRT program, and will recruit undergraduate researchers to assist the PhD students.
Computational Earth Imaging with Machine Learning
Co-PIs: Tolulope Olugboji, Mujdat Cetin
With this project, we aim to support and expand our research efforts in computational Earth imaging in a new direction by developing advanced machine learning (ML) based methodologies. This collaborative project will bridge research across two groups at the University of Rochester, one in Earth & Environmental Sciences (EES) and one in Electrical & Computer Engineering (ECE). The lead PI is an early-career faculty member (Dr. Olugboji, EES) with recent NSF funding (#2102495) to produce high-resolution images of Africa’s subsurface architecture. Dr. Olugboji will partner with Dr. Cetin (ECE) on this topic of mutual interest, which has great potential for external funding from NSF and other agencies. In particular, Dr. Cetin’s work on probabilistic and machine learning based methods for computational imaging is of critical value for the proposed research. The overarching research theme of the collaboration is: how can we apply probabilistic machine learning for robust and efficient subsurface image reconstruction using uncertain and complex ground vibration data? In this project, we elaborate on how this has the potential to expand our ongoing research efforts significantly and lead to new grant proposals.
Africa’s subsurface maps have now been generated using an involved computation that constructs maps and associated uncertainties through a probabilistic inverse modeling approach. Probabilistic inverse modeling solves an adaptive planetary imaging problem for a spatial velocity field using ground vibration data as a constraint. This class of problems is analogous to medical imaging and computer vision. One challenge is computational complexity: an exhaustive, multi-parameter search to obtain the best image and constrain its uncertainties is computationally demanding. For example, the computational task for one map takes 3-4 months using large allocations on the Bluehive HPC system. Another challenge is the ill-posed nature of the imaging problem, necessitating the use of prior information or constraints. Finally, there is a need to efficiently characterize the uncertainty in the solutions obtained. Incorporating state-of-the-art probabilistic machine learning methods can help address these issues. In particular, we aim to develop new computational imaging methods for Earth imaging that involve deep learning architectures to: (1) speed up and improve the imaging process, (2) quantify all forms of uncertainty (measurement and modeling) efficiently, and (3) compactly represent the results of large simulations for efficient dissemination. The resulting methods will lead to publications and serve as preliminary work for external proposals.
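The role of priors and uncertainty in an ill-posed reconstruction can be sketched with a linear-Gaussian toy problem, where the MAP estimate is the Tikhonov (ridge) solution and the posterior covariance quantifies uncertainty. The forward operator, sizes, and noise level below are illustrative, far smaller than a real tomography problem:

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear inverse problem y = A x + noise as a stand-in for tomographic
# imaging: A is a smoothing (ill-conditioned) forward operator and x the
# unknown velocity profile. A Gaussian prior x ~ N(0, tau^2 I) yields a
# closed-form MAP estimate and posterior covariance.
n = 50
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
A = np.array([[np.exp(-0.5 * (i - j) ** 2) for j in range(n)]
              for i in range(n)])             # Gaussian blur forward model
sigma, tau = 0.05, 1.0
y = A @ x_true + rng.normal(scale=sigma, size=n)

# MAP / posterior for the linear-Gaussian model:
#   x_map = (A^T A / sigma^2 + I / tau^2)^{-1} A^T y / sigma^2
precision = A.T @ A / sigma**2 + np.eye(n) / tau**2
x_map = np.linalg.solve(precision, A.T @ y / sigma**2)
posterior_cov = np.linalg.inv(precision)      # per-pixel uncertainty

naive = np.linalg.solve(A, y)                 # unregularized inversion
err_map = np.linalg.norm(x_map - x_true)
err_naive = np.linalg.norm(naive - x_true)
print(f"MAP error: {err_map:.2f}  naive error: {err_naive:.2f}")
```

Direct inversion amplifies noise along the operator's small singular values, while the prior stabilizes the solution; deep learning methods aim to deliver this regularization and uncertainty quantification without months of sampling.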
Improving deconvolution estimates through Bayesian shrinkage
PI: Matthew McCall
The quantification of RNA transcripts within cells allows us to gain insight into cellular function. Tissues are heterogeneous mixtures of component cell types; thus, the total RNA in a sample is the sum of the RNA contributed by each cell type present. Many genes are known to be expressed by distinct cell types, and this specificity of expression can confound functional differences between phenotypes in differential expression analyses. Deconvolution techniques allow researchers to estimate the abundance of each cell type assumed to be present in a tissue. While deconvolution is a useful tool for estimating composition, these methods have been shown to be temperamental, and several crucial considerations must be made when setting up and employing such a workflow in an analysis. We propose an empirical Bayes shrinkage procedure to reduce the variance of cell-type composition estimates obtained from deconvolution methods.
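The deconvolution setup and the effect of shrinkage can be sketched as follows; the shrinkage weight here is a simplified empirical-Bayes-style heuristic for illustration, not the exact procedure proposed:

```python
import numpy as np

rng = np.random.default_rng(6)

# Deconvolution model: bulk expression b = S p + noise, with S a
# genes-by-cell-types signature matrix and p the cell-type proportions.
genes, types, samples = 200, 3, 30
S = rng.gamma(2.0, 1.0, size=(genes, types))
true_p = rng.dirichlet(alpha=[8, 5, 3], size=samples)  # fairly homogeneous cohort
bulk = true_p @ S.T + rng.normal(scale=3.0, size=(samples, genes))

# Per-sample least-squares deconvolution, clipped and renormalized to a simplex.
P, res, _, _ = np.linalg.lstsq(S, bulk.T, rcond=None)
raw = np.clip(P.T, 0, None)
raw /= raw.sum(axis=1, keepdims=True)

# Shrinkage toward the cohort mean: keep the fraction of observed
# between-sample variance not explained by estimation noise.
sigma2_hat = res.mean() / (genes - types)              # residual noise variance
noise_var = sigma2_hat * np.diag(np.linalg.inv(S.T @ S))
lam = np.clip(1 - noise_var / raw.var(axis=0), 0, 1)
cohort_mean = raw.mean(axis=0)
shrunk = cohort_mean + lam * (raw - cohort_mean)

mse_raw = np.mean((raw - true_p) ** 2)
mse_shrunk = np.mean((shrunk - true_p) ** 2)
print(f"MSE raw: {mse_raw:.4f}  MSE shrunk: {mse_shrunk:.4f}")
```

When per-sample estimation noise is comparable to the true between-sample variation, as here, pulling noisy estimates toward the cohort mean reduces mean squared error, which is the intuition behind the proposed procedure.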
Building a Multi-Step Commonsense Reasoning System for Story Understanding
Co-PIs: Zhen Bai, Lenhart Schubert
People know that a wolf is dangerous, but no one will run away if it’s just a toy. Such commonsense reasoning is easy for a 5-year-old child, yet it remains challenging for most of today’s AI systems. Why can’t we teach AI human-level common sense the way we teach our children during bedtime story reading? In this project, we propose to leverage recent advances in deep learning, namely large language models, together with symbolic reasoning techniques, to create a novel multi-step commonsense reasoning system for story understanding. We aim to produce preliminary results to seek external grants such as NSF CISE IIS: Human-Centered Computing (HCC) and Robust Intelligence (RI).
The project includes two main research activities: 1) creating a context-sensitive multi-step commonsense reasoner based on the ROCStories Corpus and the GLUCOSE Dataset — a collection of generalized contextualized commonsense knowledge about everyday situations; 2) conducting crowdsourcing-based survey studies to evaluate the rationality of AI-generated multi-step commonsense reasoning paths along theoretically justified quality dimensions, and collect a dataset of such reasoning examples along with human modifications done in structured English.
The project will integrate PI Bai’s expertise in human-agent interaction, interactive storytelling, and technology-supported theory of mind, and PI Schubert’s extensive experience in knowledge representation, commonsense reasoning, story understanding, and dialogue agents. This complementary expertise will contribute to synergistic innovation in human-centered AI and human-AI collaboration.
We expect the project’s findings to contribute to the broader academic community in three ways: First, we will present a novel system for generating multi-step commonsense reasoning that is context-aware, a major challenge for commonsense reasoning. Second, we will extend our understanding of a new elicitation mechanism for acquiring commonsense knowledge through human feedback on machine-generated commonsense reasoning. Third, we will gather a collection of human-evaluated-and-modified multi-step commonsense reasoning examples contextualized to stories. This novel and contextualized dataset will jumpstart our future investigation into active learning for AI commonsense reasoning through human feedback.
Versatile and Customizable Virtual Patients to Improve Doctor-Patient Communication
Co-PIs: Ehsan Hoque, Ronald Epstein
The doctor-patient relationship is one of the most moving and meaningful experiences shared by human beings. It is built on trust and vulnerability, and effective doctor-patient communication is essential to creating a therapeutic relationship. Many doctors tend to overestimate their ability to communicate effectively, yet strong communication skills remain necessary to gather accurate information, provide appropriate counsel, and establish caring relationships with patients. Despite this importance, practicing with actual patients remains the only option many doctors have to improve these skills.
This proposal builds on our prior work using realistic virtual patients to help doctors practice their communication skills. We propose to leverage the latest advances in natural language generation (GPT-3), game development, and neural speech synthesis to develop a one-size-fits-all virtual patient that can carry on versatile conversations with doctors in a human-like manner with minimal training. Our approach will allow doctors to choose the patient's demographics (age, gender, ethnicity), appearance, medical condition, medical history, and personality, and to create unique narratives and contexts for each conversation. Communicating with realistic virtual patients in such diverse contexts would give doctors a safe environment in which to experiment with their communication skills, empathy, and creativity, without the worry of causing harm to a patient or providing poor care. At the end of each conversation, the doctor will receive automated feedback on their behavior, derived from an analysis of the conversation history in accordance with established medical best practices. This would, in turn, lead to better doctor-patient communication in real-life situations and, ultimately, to better patient outcomes. If successful, our approach could impact the field of “Narrative Medicine.”
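To make the customization concrete, the sketch below assembles a persona prompt for a language model from doctor-selected patient attributes. The attribute names, function name, and prompt wording are illustrative assumptions, not the project's actual design.

```python
# Hypothetical sketch: turning doctor-selected patient attributes into a
# persona prompt for a language model. All names and wording here are
# assumptions for illustration, not the project's actual interface.

def build_patient_prompt(profile: dict, scenario: str) -> str:
    """Render a system prompt describing the virtual patient."""
    traits = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return (
        "You are role-playing a patient in a medical consultation.\n"
        f"Patient profile: {traits}.\n"
        f"Scenario: {scenario}\n"
        "Stay in character; answer only what the doctor asks."
    )

profile = {"age": 58, "gender": "female",
           "condition": "chronic back pain",
           "personality": "anxious, talkative"}
prompt = build_patient_prompt(profile, "follow-up visit about pain management")
```

In practice the rendered prompt would be sent to the generation model once per conversation, so each doctor-chosen profile yields a distinct, consistent patient.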
We will validate the system by recruiting 20 practicing primary care physicians from URMC who actively treat patients using opioids. Our long-time collaborator, Ronald Epstein from URMC, can help with recruitment (a letter of support wasn’t attached due to a last-minute glitch, but can be furnished if needed). Our qualitative experiment will explore the following questions: 1) How effective is the system in helping doctors improve their interpersonal skills? 2) What types of “challenges” help doctors improve the most?
Machine Learning Assisted Femtosecond Laser Fabrication of Efficient Solar Absorbers
Co-PIs: Chunlei Guo, Jiebo Luo
Solar energy is a major source of renewable energy that can decrease carbon emissions. To effectively harvest solar energy, solar-thermal energy devices have been widely used; their key element is a selective solar absorber (SSA). In the past, the Guo lab has performed a series of studies using femtosecond lasers to produce SSAs on different metal substrates, maximizing the SSA’s absorption in the visible spectral range while minimizing its emission in the blackbody (BB) spectral range, so as to harness the maximum amount of solar energy while minimizing heat loss. However, optimizing the fabrication parameters can be time consuming. The Luo lab has extensive experience in applying machine learning across a variety of applications. To minimize the experimental optimization process, the Guo lab will collaborate with the Luo lab to use machine learning algorithms to predict suitable fabrication parameters for producing SSAs. In the first step, we will train a neural network on adequate experimental data obtained by fabricating SSAs with a range of parameter settings. Next, a genetic algorithm will be coupled with the trained neural network to optimize the fabrication parameters. Moreover, we will explore the use of generative adversarial networks to build an end-to-end trainable model. Ultimately, we will obtain an optimal set of fabrication parameters for SSAs that harvest the maximum solar energy, characterized by the SSA’s steady-state temperature under a fixed amount of solar radiation. The seed funding will provide crucial support in preparing the team to apply for larger funding opportunities from agencies such as NSF, DOE, and DOD.
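The surrogate-plus-genetic-algorithm step described above can be sketched as follows. The surrogate function below merely stands in for the trained neural network, and the parameter names (laser fluence, scan speed) and their ranges are assumptions for illustration only.

```python
import random

# Sketch of the optimization step: a genetic algorithm searches fabrication
# parameters against a surrogate that stands in for the trained neural
# network. The surrogate, the parameters (fluence, scan speed), and their
# ranges are illustrative assumptions, not measured values.

def surrogate_absorption(fluence, speed):
    # Placeholder for the trained NN: a smooth score peaking at an
    # assumed optimum (fluence = 2.0, speed = 5.0).
    return -((fluence - 2.0) ** 2 + 0.1 * (speed - 5.0) ** 2)

def genetic_search(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(0, 5), rng.uniform(0, 20)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: surrogate_absorption(*p), reverse=True)
        parents = pop[: pop_size // 2]          # selection: keep the top half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)   # crossover
            child = (child[0] + rng.gauss(0, 0.1),           # mutation
                     child[1] + rng.gauss(0, 0.5))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: surrogate_absorption(*p))

best_fluence, best_speed = genetic_search()
```

In the proposed workflow, the surrogate would be replaced by the network trained on experimental SSA measurements, so each fitness evaluation costs a forward pass rather than a fabrication run.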
Rhythm-Aware and Emotion-Aware Video Background Music Generation
PI: Jiebo Luo
With the popularity of video editing tools and mobile apps, users can conveniently edit short videos (less than a few minutes long) and share them on social media. However, attracting online users and customers requires not only good video editing and visual effects but also appropriate, relevant background music. Video editing tools make it simple for any user to add visual effects, but producing appropriate music requires musical expertise. Searching for suitable background music and modifying it remains challenging and time-consuming, and users must also be cautious about copyright. Automatically generating background music for videos would therefore be very beneficial to video content creators and editors.
Artificial intelligence provides a possible avenue to address this problem. The goal of AI is to “train” a machine to perform and “learn” tasks accurately using computer algorithms, potentially better than a human user could. The aim of this study is therefore to develop a machine learning algorithm that can learn to generate background music for a given input video. There are three primary aspects to generating music that is appropriate for a video: rhythm, emotion (or mood), and instrumentation. Video-music rhythmic consistency is critical to ensure that the pace of the audio matches the video. Beyond rhythm, it is essential to transfer the arousal and valence conveyed by the video to the generated music: one would expect a slow, melodious tune for a romantic film and an intense, ominous score for a horror video. Arousal is the level of autonomic activation that an event creates, ranging from calm to excited. Valence, on the other hand, is the level of pleasantness that an event generates, defined along a continuum from negative to positive emotions. Finally, the algorithm will also be controllable, able to generate music using instruments specified by the user.
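As a toy illustration of how the arousal/valence plane could drive coarse musical attributes, consider the hand-crafted mapping below. The proposed project would learn such a mapping from data; the thresholds, ranges, and attribute choices here are assumptions.

```python
# Illustrative sketch only: a hand-crafted mapping from the arousal/valence
# plane to coarse musical attributes (tempo and mode). The project would
# learn this mapping; the numeric ranges here are assumptions.

def music_attributes(valence: float, arousal: float) -> dict:
    """valence in [-1, 1] (negative to positive), arousal in [0, 1] (calm to excited)."""
    tempo = 60 + int(arousal * 120)   # 60 BPM (calm) up to 180 BPM (excited)
    mode = "major" if valence >= 0 else "minor"
    return {"tempo_bpm": tempo, "mode": mode}

# A horror scene (negative valence, high arousal) yields a fast minor-mode cue.
horror = music_attributes(valence=-0.8, arousal=0.9)
```

A learned model would replace the two rules with a regression from video features to these (and richer) musical attributes.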
The algorithm described, if successfully developed, could generate background music that mimics a video's rhythm and mood while remaining controllable through user-specified instruments. Upon successful completion of this project, we hope to design a system that can generate both music and corresponding lyrics for a given input video.
Physics-aware Learning-based Ultrasound Tumor Ablation Monitoring
Co-PIs: Ajay Anand, Mujdat Cetin, Diane Dalecki
The objective of this project is to develop physics-guided, learning-based medical ultrasound imaging solutions for monitoring and guiding tumor ablations. Tumor ablation, commonly performed by heating tissue to cytotoxic levels for a few minutes using RF, laser, or high-intensity focused ultrasound (HIFU), is an established minimally invasive technique for treating tumors of the liver, kidney, prostate, bone, and lungs. Measuring temperature (thermometry) during thermal therapy is an attractive means of mapping the region of thermal damage. While MR imaging has been shown to be effective for noninvasive thermometry due to its superior accuracy and spatial and temporal resolution, it has several disadvantages: it is expensive, it is not portable, and custom therapy systems must be built to work within the strong magnetic field of the scanner. In contrast, ultrasound image guidance remains particularly attractive for its simplicity, portability, accessibility, and low cost. Ultrasound-based thermometry has previously been proposed as a means of therapy monitoring, but existing methods suffer from significant limitations that impede clinical usability. In particular, existing methods of ultrasound thermometry are ineffective beyond 50°C due to multiple physical limitations: a non-monotonic relationship between temperature and sound speed that plateaus around 60°C, tissue phase transitions and deformation, stochastic generation of cavitation bubbles, and tissue necrosis. Hence, an ultrasound-based technology that can successfully monitor treatment over the entire therapeutic temperature range is highly desirable clinically.
Towards this end, the proposed research uses a hybrid learning-based approach that combines real-time ultrasound thermometry data measured at the periphery of the heating zone (which stays in the favorable sub-ablative temperature range, up to 50°C) with the underlying heat transfer process (modeled via the diffusion equation) to infer real-time temperature maps throughout the treatment zone. The work will also explore tradeoffs among approaches that incorporate more or less information from the physical models. Incorporating a learning-based approach in the inversion process offers significant advantages over relying solely on inline, patient-specific finite-element models, which are cumbersome and computationally inefficient to use in the clinical setting.
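The physics component referenced above can be illustrated with a minimal one-dimensional example: an explicit finite-difference step of the heat diffusion equation, dT/dt = alpha * d2T/dx2. The grid size, time step, and thermal diffusivity value below are illustrative assumptions, and the actual project would work with full 2-D/3-D tissue models.

```python
# Minimal sketch of the heat-transfer constraint: explicit finite-difference
# integration of the 1-D diffusion equation dT/dt = alpha * d2T/dx2.
# Grid spacing, time step, and diffusivity are illustrative assumptions.

def diffuse(temps, alpha=1.4e-7, dx=1e-3, dt=0.1, steps=100):
    """Evolve a 1-D temperature profile; boundary values held fixed."""
    t = list(temps)
    r = alpha * dt / dx**2          # must be <= 0.5 for numerical stability
    assert r <= 0.5, "explicit scheme unstable"
    for _ in range(steps):
        t = ([t[0]]
             + [t[i] + r * (t[i-1] - 2*t[i] + t[i+1])
                for i in range(1, len(t) - 1)]
             + [t[-1]])
    return t

# A hot ablation focus at the center (70 C) relaxing toward 37 C tissue.
profile = [37.0] * 10 + [70.0] + [37.0] * 10
after = diffuse(profile, steps=500)
```

In the hybrid scheme, peripheral sub-50°C measurements would anchor the boundary, and the diffusion model would propagate physically consistent temperatures into the unobservable core of the treatment zone.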
The project brings together a multidisciplinary team of faculty with expertise in computational imaging, data science, biomedical engineering, and ultrasound technology to develop the learning-based therapy monitoring approach and evaluate it in ex-vivo experimental settings. Successful technical feasibility, facilitated by this seed funding, holds promise for translational research collaborations with URMC clinicians and leading biomedical research groups around the country to pursue long-term external federal grant funding opportunities in image-guided surgery.
Automatic Rendering of Augmented Effects in Immersive Concerts
Co-PIs: Zhiyao Duan, Matthew Brown, Raffaella Borasi
The project will develop a computational system to automate the coordination between music and other multimedia during a live performance to provide audiences with a more immersive experience. As such, the project will not only address a specific need identified by a current NSF-funded Future of Work planning grant, but more importantly serve as a pilot study to provide “proof of concept” for work at the core of a planned $2.5M Future of Work research grant proposal in March 2022.
Our preliminary exploration of the “pain points” experienced by artist-technologists (i.e., individuals working at the intersection of arts and technology) revealed the need for new, more user-friendly technical tools that enable musicians to better leverage advanced technology. This is especially true for immersive concerts, where music listening is augmented by other media such as text, lighting, animations, smoke, and sound effects, using AR/VR technologies, to give the audience a more impactful and engaging experience. The key to creating successful immersive experiences is coordinating the music precisely with the other media during a live performance, yet there are currently no good solutions for making this happen smoothly.
The proposed system will follow musical scores in real time and automatically trigger multimedia events that have been annotated on the score, thus replacing or greatly reducing the workload of conductors and operators and making immersive events easier to schedule and present. Developing this system will require adapting score following algorithms to immersive concerts so they can cope with a wide range of musical ensembles, background noise, and the acoustic echo of augmented sound events, which in turn will improve the robustness of state-of-the-art score following algorithms. To render immersive events, the proposed system will connect the score following algorithm to QLab, a widely used multimedia content management application, through the Open Sound Control (OSC) protocol.
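To show what triggering a cue over OSC involves, the sketch below encodes an OSC 1.0 message by hand using only the standard library. The address `/cue/1/start` follows QLab's documented OSC namespace; actually sending the packet over UDP to QLab (port 53000 by default) is omitted here, and the cue number is a placeholder.

```python
import struct

# Sketch of the OSC 1.0 wire format for triggering a cue. "/cue/1/start"
# follows QLab's OSC namespace; the cue number is a placeholder, and the
# UDP send to QLab's default port 53000 is left out.

def osc_pad(b: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, per OSC 1.0."""
    b += b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address: str, *floats: float) -> bytes:
    msg = osc_pad(address.encode())                       # padded address
    msg += osc_pad(("," + "f" * len(floats)).encode())    # type tag string
    for x in floats:
        msg += struct.pack(">f", x)   # OSC arguments are big-endian 32-bit
    return msg

packet = osc_message("/cue/1/start")
```

When a score-following algorithm reaches an annotated score position, it would build the corresponding packet and send it with `socket.sendto`, so multimedia cues fire in time with the live performance.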
The new system will build on preliminary work by PI Duan, who has already developed score following algorithms and employed them in automatic lyric display applications for choral concerts. We will develop and test the system with TableTopOpera, a chamber ensemble from the Eastman School of Music that specializes in multimedia projects. The project will include a rigorous evaluation of these implementations, including interviews with TableTopOpera musicians, to derive broader implications for musicians and technicians working to produce immersive music experiences.
Artificial Intelligence for Effective Communication on Health Effects of Electronic Cigarettes Through Instagram
Co-PIs: Dongmei Li, Chenliang Xu
Social media platforms such as Twitter, Instagram, and YouTube are extremely popular in the United States, especially among youth and young adults. Previous studies have found that vape shops and companies widely use social media platforms to promote electronic cigarette (e-cigarette) products. However, these platforms are under-used by public health authorities for educating the community about the health risks of e-cigarette use (vaping). Social media marketing of e-cigarettes as healthier alternatives to conventional cigarettes has fostered the common perception among youth that vaping is a harmless activity. The National Youth Tobacco Survey showed that e-cigarette use among high school students skyrocketed from 12% in 2017 to 28% in 2019. Thus, it is urgent to communicate effectively with the public about the risks of e-cigarette use.
The purpose of this project is to identify potentially effective ways of communicating with the public about the health risks of e-cigarette use on Instagram, the most popular social media platform among youth. The key to our approach is applying cutting-edge artificial intelligence and statistical learning techniques to help curb the epidemic of e-cigarette use. To achieve our objectives, we will use advanced deep-learning algorithms and statistical learning methods to characterize the essential features of Instagram images that educate about or warn of the risks of e-cigarette use and that attract high social media engagement. Instagram images have been widely used by the vaping industry and vaping proponents to attract Instagram users and promote the sale of e-cigarettes. Using deep learning techniques (such as convolutional neural networks), we will identify the image features most strongly associated with high user engagement (number of likes) in educating or warning the public about the health risks of vaping. Such information can guide the design of effective images that convey the health risks of e-cigarette use to the public. This project will provide much-needed information to the Center for Tobacco Products (CTP) on how to communicate effectively with the public about the potential health effects of e-cigarette use through Instagram. Moreover, it will guide future CTP campaigns addressing the current vaping epidemic, particularly among youth and young adults, to protect public health.
Designing Effective Intervention to Promote Green Products in Online Shopping Platforms
PI: Ehsan Hoque
Although the increasingly devastating effects of climate change have drawn global attention, it is still difficult to motivate people to take action against climate change. In fact, around two-thirds of global greenhouse gas emissions can be attributed to household consumption. Although individuals are concerned about the environment and willing to opt for greener consumption, these intentions often do not translate into action due to a lack of incentive for going green. High prices, the difficulty of identifying green products, insufficient time for research, a lack of environmental information in product descriptions, and distrust of manufacturer-provided eco-friendliness labels have been identified as the major barriers behind this gap between consumer attitudes and behavior. However, the current literature proposes hardly any solutions for overcoming these barriers.
Online shopping has gained massive popularity: in 2020, worldwide e-retail sales exceeded 4.2 trillion USD, with over two billion e-commerce customers. We argue that e-commerce platforms can play a significant role in tackling climate change. We propose a redesign of existing online shopping platforms that adds eco-friendliness ratings (how eco-friendly a product is), environmental impact summaries, and highlighted eco-indicator keywords indicative of environmental impact. The eco-friendliness rating would let users identify greener products quickly and conveniently, as climate-conscious consumers could sort relevant products by eco-friendliness. The environmental impact summary briefly explains a product's impact on the environment, and eco-indicator keywords are the words or phrases in a product description that relate to its environmental impact. These explanations and highlights can justify the eco-friendliness rating and thus increase consumer trust. In addition, they can serve as a continuous reminder of how one's buying choices can make a difference toward a greener earth.
Our hypothesis is that the proposed components, if introduced into existing e-commerce platforms, will significantly reduce the “attitude-action gap” by addressing many of the major barriers identified, including consumer inconvenience, lack of knowledge, and lack of trust. Since billions of people shop online, motivating even a small percentage of consumers can make a massive contribution toward tackling climate change. We aim to design a prototype of the proposed e-commerce platform and run a quasi-randomized case-control study to investigate whether the prototype can significantly influence individual consumption behavior.
Interactive Climate Network Exploration over Real-Time Data
Co-PIs: Fatemeh Nargesian, Gourab Ghoshal
To identify and analyze patterns in the global climate, scientists and climate risk analysts model climate data as complex networks (networks with non-trivial topological properties). A climate network represents the global climate system as a set of anomaly time series (departures from usual behavior) derived from gridded climate data, together with their interactions. Several studies have applied network science to climate data, treating the networks as dynamic. To study the stability of a network over time, scientists compare the similarity between networks constructed in different years from patterns in daily data. For example, networks constructed from temperature measurements at sites around the world have been found to change dramatically, and in similar ways, during El Niño events.
Intellectual Merit: The common approach to network dynamics analysis is to construct a network for each hypothesized time window and analyze the networks separately, which is laborious for data exploration and becomes impractical on real-time data. To bridge the gap between climate data and network analysis, we propose to build a data platform that enables climate scientists and decision-makers to select and filter data and to efficiently and interactively construct and process climate networks over historical and real-time data. The key requirements of this platform are performance (low latency and overhead), data and analytics integration (data access seamlessly integrated into network analysis algorithms), and query workload (supporting the data-serving building blocks necessary for climate network analysis). The main objectives include: 1) real-time network construction over various time windows and resolutions to meet users' selection and visualization needs, 2) detection of changes in a network's topology as the underlying time series change, and 3) real-time clustering of nodes and community detection in a network at user-specified time windows.
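The basic network-construction step can be sketched as follows: grid sites become nodes, and an edge links two sites when the correlation between their anomaly time series exceeds a threshold. The toy series and the 0.7 threshold below are illustrative assumptions; the platform would compute such correlations incrementally over streaming data.

```python
from itertools import combinations
from math import sqrt

# Sketch of standard climate-network construction: nodes are grid sites,
# and an edge links two sites whose anomaly time series are strongly
# correlated. The series and the 0.7 threshold are illustrative.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def build_network(series: dict, threshold=0.7):
    """Return the edge set over sites whose anomaly series co-vary strongly."""
    return {(a, b) for a, b in combinations(sorted(series), 2)
            if abs(pearson(series[a], series[b])) >= threshold}

anomalies = {
    "site_A": [0.1, 0.4, -0.2, 0.5, 0.0, 0.3],
    "site_B": [0.2, 0.5, -0.1, 0.6, 0.1, 0.4],    # tracks site_A closely
    "site_C": [0.3, 0.3, -0.3, -0.3, 0.3, -0.3],  # weakly related pattern
}
edges = build_network(anomalies)
```

Repeating this construction over sliding time windows, and updating the pairwise correlations as new measurements stream in rather than recomputing them from scratch, is exactly the workload the proposed platform targets.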
Expected outcomes of this project include: a suite of algorithms for efficiently sketching and analyzing massive, frequently updated time series to enable climate network analytics (particularly real-time network construction, clustering, and community detection); a software library providing a lightweight data layer on top of existing open-source streaming engines (such as Trill) that bridges the storage and analytics layers by implementing the building blocks of climate data processing (sketching and correlation calculation on uni- and multivariate time series); and a climate network analytics dashboard for visualization and analysis of real-time data.