Building bridges to deeper machine learning
Improving the performance of the computers used in machine learning is important for many data-driven applications, including advanced drone navigation, self-driving cars, and credit card fraud detection.
This is the field that Bowen Kwan (Croucher Scholarship 2018) is focused on as a PhD student at the Centre for Doctoral Training in High Performance Embedded and Distributed Systems at Imperial College London.
It was Kwan’s father who triggered his early interest in technology by explaining the principles behind everyday gadgets and appliances. “If we entered an elevator, my dad would explain how it all worked. He started with the simple mechanics, then as I got older, he explained the electronics and the control algorithms in the software. Maths and puzzles were always part of family life,” Kwan said.
That early fascination with technology eventually led to Kwan’s study at Imperial College. Kwan designs software and hardware architecture for reconfigurable hardware circuits known as field-programmable gate arrays (FPGAs).
“For me it has always been digital system design which has been exciting, and I am most passionate about designing architecture and modifying existing algorithms to construct a bridge between hardware and software,” he said.
Deep learning models are generally memory and computationally intensive. Accelerating these operations helps to reduce energy consumption and allows these models to run on smaller devices.
“FPGA has been used in areas such as electronic trading and data centres, and with neural networks and machine learning. It can also be applied on time-series predictions, such as stock market price prediction and influenza trend prediction,” Kwan said.
Since processing tasks one after another is time-consuming, Kwan modifies algorithms and architectures so that many tasks can run at once. “FPGA excels in doing multiple simple computations at the same time,” he said.
FPGAs work particularly well with logic that can be both parallelised and pipelined. Parallelised logic refers to steps that can happen at the same time because they do not depend on each other. Pipelined logic is a sequence of dependent steps arranged like an assembly line: each stage hands its result to the next, and can immediately start on a new piece of data, so the whole sequence keeps producing results continuously.
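The distinction can be sketched in a few lines of Python. This is an illustrative model only, not FPGA code: independent tasks map to separate logic blocks and conceptually finish together, while dependent stages form a pipeline that each input flows through in order.

```python
# Illustrative sketch (not from Kwan's work): parallel vs pipelined logic,
# as an FPGA might map each pattern onto hardware.

def parallel_schedule(tasks):
    """Independent tasks: on an FPGA each gets its own logic block,
    so the whole batch conceptually completes in a single cycle."""
    return [task() for task in tasks]

def pipelined_schedule(stages, inputs):
    """Dependent stages: each input passes through every stage in order.
    In hardware, a new input can enter the first stage while earlier
    inputs occupy later stages, giving one result per cycle once the
    pipeline is full."""
    results = []
    for x in inputs:
        for stage in stages:
            x = stage(x)
        results.append(x)
    return results

# Three dependent stages of a toy computation: scale, offset, square.
stages = [lambda v: v * 2, lambda v: v + 1, lambda v: v ** 2]
print(pipelined_schedule(stages, [1, 2, 3]))  # [9, 25, 49]
```

The software loop serialises everything; the point of an FPGA is that the hardware version of `pipelined_schedule` overlaps the stages in time.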
One of Kwan’s current projects explores efficient data augmentation for convolutional neural networks using permutation. By generating a restricted subset of local permutations, the method produces extra samples cheaply for both training and inference, allowing multiple images to be processed and improving accuracy and efficiency in image classification and time-series prediction.
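As a rough illustration of the idea (the details of Kwan’s actual method, including the window size and sampling scheme, are assumptions here), a local permutation shuffles values only within small neighbourhoods, so global structure is preserved while new variants of a sample are generated cheaply:

```python
import random

def local_permutation_augment(seq, window=4, seed=None):
    """Illustrative sketch of permutation-based augmentation:
    shuffle values only inside small local windows, preserving the
    coarse structure of the sequence while creating a new sample."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(seq), window):
        chunk = list(seq[i:i + window])
        rng.shuffle(chunk)  # permute within this window only
        out.extend(chunk)
    return out

series = list(range(8))
augmented = local_permutation_augment(series, window=4, seed=0)
# Each window keeps its members ({0..3} and {4..7}), just reordered.
```

Restricting the permutations to a small subset keeps the augmentation cheap enough to run alongside training, which is what makes it attractive for hardware acceleration.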
As part of Kwan’s PhD project, Kwan is applying this method to DeepVariant, which is a Google application that identifies genetic variants from next-generation DNA sequencing data.
DeepVariant detects genetic mutations from massive sets of DNA data and Kwan’s model speeds up the process. Ultimately, the ambition is for a patient’s DNA anomalies to be detected while waiting to see the clinician rather than after several weeks of data processing.
“I thought this application was really interesting so I also developed some library tools and a data augmenter,” he said.
FPGAs offer excellent reliability without requiring a high level of resources. In contrast to a traditional CPU, an FPGA can carry out many basic operations at the same time without competing for shared resources: each task is allocated its own independent block within the chip and executes autonomously, without affecting the other blocks. This independence is what gives FPGAs their performance advantage.
While an FPGA allows multiple tasks to be handled simultaneously, it is often hindered by the unpredictable latency of accesses to random-access memory (RAM). However, some imperfections in memory behaviour are tolerated to improve efficiency.
Kwan also introduced a novel multiport memory capable of high memory bandwidth. The proposed architecture can be scaled up to 64 parallel read/write ports and beyond, which outperforms most existing designs.
“We break memory into smaller chunks to make multiple requests on those chunks – multiple requests to the same chunk of memory get bottlenecked in a queue so we store these until they can be processed,” he explained.
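The scheme Kwan describes can be modelled as banked memory with per-bank request queues. This is a minimal sketch under our own naming, not his design: requests to different banks complete in the same cycle, while requests that collide on one bank queue up and drain one per cycle.

```python
from collections import deque

class BankedMemory:
    """Toy model of a multiport memory split into independent banks.
    Conflicting requests to the same bank are queued and served
    one per cycle, as described in the article."""

    def __init__(self, num_banks, words_per_bank):
        self.num_banks = num_banks
        self.banks = [[0] * words_per_bank for _ in range(num_banks)]

    def read_cycle(self, addresses):
        """Serve at most one read per bank per cycle; return the values
        and the number of cycles the whole batch took."""
        queues = [deque() for _ in range(self.num_banks)]
        for addr in addresses:
            queues[addr % self.num_banks].append(addr)  # route to bank
        results, cycles = {}, 0
        while any(queues):
            cycles += 1  # each cycle drains at most one request per bank
            for q in queues:
                if q:
                    addr = q.popleft()
                    bank, offset = addr % self.num_banks, addr // self.num_banks
                    results[addr] = self.banks[bank][offset]
        return results, cycles

mem = BankedMemory(num_banks=4, words_per_bank=16)
_, c1 = mem.read_cycle([0, 1, 2, 3])    # four different banks: 1 cycle
_, c2 = mem.read_cycle([0, 4, 8, 12])   # all hit bank 0: 4 cycles
```

In the conflict-free case the banks act like independent ports, which is how such a design scales to many parallel read/write ports.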
In the future, Kwan would like to explore FPGA architectures for accelerating neural network training, so that a product or process could undergo continuous training rather than being confined to a fixed phase of learning. Keep an eye on this research: your autonomous taxi may one day be driven by an FPGA.
Bowen Kwan Pok Yee received his MEng degree in Electrical and Electronic Engineering from Imperial College London in 2016, graduating with first class honours and winning the Governor’s Prize. In 2017, he joined the Department of Electronic Engineering, City University of Hong Kong, as a research associate, and returned to Imperial College London in 2018 to begin his PhD at the Engineering and Physical Sciences Research Council (EPSRC) Centre for Doctoral Training in High Performance Embedded and Distributed Systems (HiPEDS). He received a Croucher Scholarship in 2018.