Chowdhury wins NSF CAREER award for making memory cheaper, more efficient in big data centers
Chowdhury connects all unused memory in a data cluster and treats it as a single unit.
Memory-intensive computing is increasingly important for practical and research applications alike. Services that depend on low latency and massive amounts of data, such as cloud computing, big data, and artificial intelligence, suffer huge performance drops when they reach the limits of available memory.
To avoid these issues, modern data centers tend to overcompensate by installing more memory than they need. The result is memory that sits unused, memory fragmentation, and higher overall cost.
A developing solution to this problem, called memory disaggregation, takes advantage of new technologies in Remote Direct Memory Access (RDMA) to connect all the unused memory in a data cluster and treat it as a single, massive memory unit. Prof. Mosharaf Chowdhury has previously published important findings in this area. Together with Michigan collaborators, his group developed Infiniswap, which tackled the problem of exposing remote memory to unmodified applications, and DSLR, which tackled the problem of granting many users concurrent access to a rack’s shared memory.
Now Chowdhury has been awarded an NSF CAREER grant to overcome new challenges in this area. His research project, “End-to-End Network Design for Unified Memory Disaggregation,” seeks to present all the unused, stranded memory on the different machines in a data center as a pool of available memory for applications that need more space to run. In the end, the team intends to make memory disaggregation practical.
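To make the idea of pooling stranded memory concrete, here is a minimal toy sketch in Python. It is not the project's design or the Infiniswap implementation; the `Machine` class, the function names, the greedy placement rule, and the example numbers are all assumptions made for illustration. It simply adds up the free memory across machines and plans how one machine's request for extra memory could be spread over its neighbors.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical model of one server's memory, in gigabytes."""
    name: str
    total_gb: int
    used_gb: int

    @property
    def free_gb(self) -> int:
        return self.total_gb - self.used_gb


def pooled_free_gb(machines):
    """Total unused memory across the cluster, viewed as one pool."""
    return sum(m.free_gb for m in machines)


def place_remote(machines, requester, need_gb):
    """Plan how to borrow need_gb from machines other than the requester.

    Greedy rule (an assumption for this sketch): take from the machine
    with the most free memory first. Returns {machine_name: gb_borrowed}
    and does not modify the machines.
    """
    donors = sorted(
        (m for m in machines if m.name != requester),
        key=lambda m: m.free_gb,
        reverse=True,
    )
    plan = {}
    remaining = need_gb
    for donor in donors:
        if remaining == 0:
            break
        take = min(donor.free_gb, remaining)
        if take > 0:
            plan[donor.name] = take
            remaining -= take
    if remaining > 0:
        raise MemoryError(f"pool is short by {remaining} GB")
    return plan


if __name__ == "__main__":
    # Made-up numbers: machine "a" is nearly full, "b" and "c" have slack.
    cluster = [
        Machine("a", total_gb=64, used_gb=60),
        Machine("b", total_gb=64, used_gb=20),
        Machine("c", total_gb=64, used_gb=40),
    ]
    print(pooled_free_gb(cluster))
    print(place_remote(cluster, requester="a", need_gb=50))
```

A real system must also handle the issues the project names, such as RDMA latency, fault tolerance, load balance, and performance isolation, none of which this sketch attempts.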
Chowdhury’s new project will tackle several unsolved problems facing the technology at the host level, network level, and end-to-end. The key challenges include bridging the latency gaps between RDMA and local memory access; addressing network-wide fault-tolerance, load imbalance, and performance isolation issues; scaling with the size of data centers and number of applications being run; and coexisting with other infrastructures.
The immediate benefits, described in the group’s proposal, will include performance improvements for memory-intensive applications and higher data center memory utilization. These, in turn, will translate to cost-effective services for end users.
The outcomes of the work will be tested on a number of memory-intensive applications from key areas such as analytics, databases, deep learning, distributed caching, and graph processing using industry-standard benchmarks.
The team plans to release software and test harnesses that come from the project under flexible open-source licenses to enable other researchers and educators to explore the work, and will continue their ongoing work with cloud providers and hardware vendors to transition the findings into practice.
Software artifacts from Chowdhury’s research have been deployed in Microsoft and Facebook data centers. He is also a co-creator of Apache Spark. Chowdhury has received the 2015 ACM SIGCOMM doctoral dissertation award, a Google faculty research award (2016), two Alibaba innovation research awards (2018), an NSDI best paper award (2012), a Facebook Fellowship (2012), and a Cheriton Scholarship (2009), and he was nominated for an NSDI community award (2012) as well as a University of Waterloo alumni gold medal (2009).
About the NSF CAREER Award
The CAREER grant is one of the National Science Foundation’s most prestigious awards, conferred for “the early career-development activities of those teacher-scholars who most effectively integrate research and education within the context of the mission of their organization.”