Replication infrastructure: Reliability in software

30 March 2016

Professor Heming Cui is currently an assistant professor in Computer Science at the University of Hong Kong. His research focuses on building software systems that improve the reliability and security of real-world software. He recently received the Croucher Foundation Innovation Award to support his research.

Online software services, including social networks, e-business services, and stock exchange platforms, have become increasingly pervasive and important. Unfortunately, the computers that host these services can suffer software or hardware errors, which hurt the reliability of these services and can cause severe disasters.

For instance, computer errors at the Nasdaq stock exchange delayed the start of trading in the 2012 Facebook IPO, resulting in losses of tens of millions of U.S. dollars. A replication infrastructure that automatically runs multiple copies of the exchange's trading services on different computers could tolerate such failures, even if some of the copies encounter computer errors.

Structuring security

Cui’s research project focuses on building a general software replication infrastructure that improves the reliability of today’s online services. Large companies such as Facebook, Google, or Amazon often have the resources to develop software infrastructure tailored specifically to building secure and reliable products for their own needs. Smaller or less technically focused companies, however, often lack the resources or expertise to build such custom infrastructure.

To address this issue, one of Cui’s research goals is to develop a software infrastructure that is both general and efficient, so that it can support a wide range of real-world applications.

“The main challenge is how to develop a general yet efficient system, as we cannot include special handling to deal with situations that are specific to any particular application,” Cui said.

To address this challenge, Cui proposes a new software protocol that efficiently ensures that different copies of the same service receive the same inputs, for general types of input. Although work on this project started only a year ago, Cui and his collaborators have already built a preliminary working system, and its results were published at SOSP 2015 (the ACM Symposium on Operating Systems Principles), one of the world’s premier systems software conferences.

In addition, all the source code and evaluation results are publicly available for industrial deployment. Cui believes that by building practical systems that help general services tolerate computer errors, this project has the potential to be applied broadly to many reliability and security problems in real-world software.

For this work, Cui received the 2016 Croucher Foundation Innovation Award, worth five million Hong Kong dollars. Cui is grateful for Croucher’s generosity: “Croucher’s Innovation Award has greatly helped the development of junior faculty. Besides the financial support, it is also a great opportunity for networking and intellectual exchange, not only with people within my field but with top researchers from many different areas in Hong Kong.”

Professor Heming Cui is an assistant professor in Computer Science at the University of Hong Kong (HKU). His recent research has led to several U.S. patents, open source projects, and publications in premier systems software and programming languages conferences (e.g., SOSP, OSDI, PLDI, and ASPLOS). Cui’s previous systems have detected several new security errors in real-world software, and some of his systems have been used by researchers worldwide. Before joining HKU, he received his bachelor’s and master’s degrees in Computer Science from Tsinghua University in Beijing, and his PhD in Computer Science from Columbia University in New York.

