February 21, 2020
‘Real world’ data science projects benefit sponsors, students
Companies, academic researchers, and non-profit organizations can find solutions for their data problems by sponsoring student practicum projects through the Goergen Institute for Data Science (GIDS) at the University of Rochester.
For example, the Wegmans Food Markets Inc. supermarket chain wanted to better anticipate which items it needs to stock up on when severe storms are forecasted in the areas served by its stores.
GIDS master’s students Kefu Zhu, Seda Ozturk, Ella Wan, and Zhou Xu gained experience in the various steps typical of an end-to-end data science project as they analyzed 117 million sales transactions at 100 Wegmans stores in six different states during a two-year period.
As part of the analysis, they:
- Pre-processed and aggregated the sales data by store, department, and category.
- Created a time series for each item, such as white bread, to visualize longitudinal trends.
- Applied differencing methods to remove seasonality.
- Imported weather data to correlate with the sales information.
- Grouped stores impacted by the same severe weather event to find regional variations in items with increased demand.
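The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the students' actual code: the function names, the `(store, item, day, units)` record layout, the seven-day differencing lag, and the "mean on storm days minus mean on other days" comparison are all assumptions made for the example. It uses only the standard library so it runs anywhere; a real project at this scale would more likely use a data-frame library.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def daily_totals(transactions):
    """Aggregate raw transactions into one total per (store, item, day).

    `transactions` is an iterable of (store, item, day, units) tuples;
    this record layout is hypothetical.
    """
    totals = defaultdict(int)
    for store, item, day, units in transactions:
        totals[(store, item, day)] += units
    return dict(totals)


def seasonal_difference(totals, lag_days=7):
    """Subtract the value observed `lag_days` earlier from each daily total.

    With lag_days=7 this removes a repeating weekly pattern. Days with no
    observation `lag_days` earlier are skipped.
    """
    lag = timedelta(days=lag_days)
    return {
        (store, item, day): units - totals[(store, item, day - lag)]
        for (store, item, day), units in totals.items()
        if (store, item, day - lag) in totals
    }


def storm_lift(diffed, storm_days):
    """Compare differenced sales on storm days with all other days, per item.

    `storm_days` is a set of (store, day) pairs flagged from weather data
    (hypothetical format). Returns {item: storm mean minus non-storm mean}
    for items that have observations in both groups.
    """
    storm, calm = defaultdict(list), defaultdict(list)
    for (store, item, day), value in diffed.items():
        bucket = storm if (store, day) in storm_days else calm
        bucket[item].append(value)
    return {
        item: mean(storm[item]) - mean(calm[item])
        for item in storm
        if item in calm
    }
```

Items with a large positive value from `storm_lift` are candidates for stocking up ahead of a storm. Passing only the stores in one region would give a rough way to look for the regional variation described above.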
They found that, though customers in all regions are more likely to stock up in preparation for winter weather events, they are even more likely to do so in certain regions. They were also able to identify food and non-food categories people were most likely to stock up on in advance of severe weather.
For example, sales of sleep aids increased 183 percent in Virginia Wegmans stores in advance of significant thunderstorms.
Matthew Wallace, the business analytics administrator and architect at Wegmans, was impressed when the students presented their findings to the company.
“You did an excellent job presenting and I heard many positive comments coming from the audience after you left,” he wrote. “You were articulate, well prepared and confident in your results and everyone learned more about the positives of data science. I believe this effort went a long way to building a partnership between our two organizations.”
‘Real world problems, real world data sets’
Capstone projects are an opportunity for data science students to apply what they’ve learned in the classroom to a “real world problem, working with real world data sets,” says Ajay Anand, the deputy director of GIDS who teaches the practicum courses.
Other companies and agencies sponsoring projects have included the New York State Attorney General’s Office, Origent Data Sciences, Brand Networks, Arable Labs, Flynn, and Visual DX. You can read more about those projects in our 2018 capstone article.
“The problems that a data science student can tackle range over a variety of application domains,” Anand says. “In the past we’ve had projects from health care, retail, and financial services.”
“We particularly encourage sponsors to come forward with a problem that can be seen more as a backburner problem, i.e., not yet on a critical path,” Anand says. “This provides opportunities for students to come up with feasibility solutions that can in turn lend value to a company as it is developed further. Many times these are problems they don’t have the resources to devote to right away, but they need to get something off the ground.”
There is “absolutely no cost to the sponsoring organization,” he adds. “We see this as a win-win opportunity for both the students and the sponsors.”
Kevin Mille, Sr. Director Product Support at KLA, says the project his company sponsored with a GIDS student team was “an overall excellent experience. The students were very self-sufficient throughout the project.”
The project not only drove collaboration and innovation between his team and the students, but “reinforced the value of data science experts within my organization. Based on this experience we expanded our internship program and focused more on students with data science backgrounds when identifying candidates. I look forward to sponsoring another capstone!”
Academic projects also welcomed
Anand also welcomes projects from researchers and from administrative, academic and clinical departments at the University and neighboring institutions.
“I would love to get more of them excited and on board,” Anand says.
One capstone project, sponsored by Erika Ramsdale, an assistant professor of medicine (hematology/oncology) at the University of Rochester Medical Center, involved using new machine learning techniques to improve the ability to identify aging cancer patients at high risk of accidental falls, which are a major cause of death in that population.
Sixu Meng ’19 says his team “used new machine learning techniques, including resampling; a feature selection process to process a geriatric assessment of clinical data; and constructed binary classifiers for fall prediction using different machine learning algorithms.
“We also reviewed the performance of different machine learning algorithms in the recent oncology literature and compared their pros and cons in clinical settings.”
Meng says the project was “truly a priceless experience for me, revealing the unparalleled efficiency and reliability of the data-driven approach in real-world settings. . . It was the first time that my work has been closely related to the well-being of people in clinical settings. I had never envisaged turning my knowledge and skill into something that brings such a tangible benefit to others.”
Ramsdale says the project was an “exceptional experience” for members of her team as well. “We found the team of (data science) students to be highly engaged, easy to work with, very responsive, and extremely capable and bright. They brought a new set of skills to our team, allowing us to substantially improve and accelerate our data cleaning processes, and they taught us new techniques for analysis of patient-reported outcomes data.”
Among the other “happy offshoots” of the collaboration:
- Sixu Meng was hired to work part-time with Ramsdale’s team while completing his Take Five year at the University, and this year began working full time.
For more information about capstones or any other program offered by the Goergen Institute for Data Science, contact Ajay Anand.
What students say
“I learned so much from sharing and exchanging ideas with my teammates about different aspects of the analysis, such as differencing methods in time series, methods to extract weather data, methods to identify outliers, etc. Those real business problems I have not learned from the classroom or other classes.” – Ella Wan
“The best thing that we learned from this project was the experience of working on a real-world problem, using the skills and knowledge that we learned from school and applying those to solve problems. We not only needed to strive for better models and results, but also to keep business values in mind: what is the most cost-effective solution and can we reuse the model and make it transferable on other applications? We also practiced our communication skills, learning how to prepare different materials and report our progress in the most suitable ways for different groups of people. With these experiences, I believe each of our team members can now more comfortably dive into industrial settings after graduation and be better prepared in our data science careers.” – Zhou (Joe) Xu
“At the beginning, we may have unconsciously tried to explain the results based on our own purchasing experience with Wegmans. It turned out that some insights we found interesting were already known facts for Wegmans. But having active and regular communication with Katie (Snyder), Scott (Root) and Matt (Wallace) from Wegmans helped us a lot in explaining our results.” – Kefu Zhu