Workshop on next-generation hardware for high-performance computing (NG-HPC 2018)

Next-gen hardware for performance and energy efficiency

Part of the Marionet UK Many-core Research Network


The program for this one-day workshop is based on invited talks, and there is no official call for papers or proceedings. We hope to put the presentations online after the event, provided the speakers are happy with this. The program is ready, but if you would like to contribute to it, please let us know your ideas at


9:00 Welcome. Jose Nunez-Yanez. Bristol

9:15 Processing for Machine Intelligence. Carlo Luschi. Director of Research. Graphcore.

This talk will introduce the IPU (Intelligence Processing Unit), Graphcore’s ground-breaking processor based on massively parallel computing resources and large amounts of on-chip memory, with synchronised execution within an IPU or across multiple IPUs. The IPU architecture provides an unprecedented boost in training and inference performance over a wide range of machine learning algorithms. This is also achieved through efficient numerical representations and algorithmic innovation, which allow very complex models to be held entirely on chip.

10:00-10:30 coffee

10:30 Inside Isambard — the World’s First Production ARM Supercomputer.

The Isambard Project, a GW4 Alliance initiative, recently disclosed the latest results from its benchmarking of Arm-based processors for HPC, the first such results for dual-socket Cavium ThunderX2 nodes. This talk will discuss the results presented at the Cray User Group (CUG) Conference in Stockholm on May 23, 2018, detailing the performance comparison between Cavium™ ThunderX2® Arm®-based CPUs and the latest state-of-the-art Intel Skylake x86 processors. The results focused on the HPC codes that are most heavily used on the UK’s national supercomputer, ARCHER, and showed that for these kinds of workloads, ThunderX2 is competitive with the best x86 CPUs available today, but with a significant cost advantage.

11:15 It's not about the core, it’s about the system. Gajinder Panesar. CTO at UltraSoC

This talk will discuss UltraSoC’s semiconductor IP, which puts an intelligent analytics infrastructure into the hardware of an SoC, providing intimate visibility of real-world system behavior. It brings intelligent self-analytic capabilities to the SoCs at the heart of today’s consumer electronics, computing and communications products. UltraSoC embedded analytics technology helps solve pressing problems including cybersecurity, functional safety, and the management of complexity. UltraSoC solutions also allow designers to develop SoCs more quickly and cost-effectively. Benefits include robustness against malicious intrusions; enhanced product safety; reduced system power consumption; and better performance, with fine-tuning of end products even after they are deployed in the field.

12:00-13:00 lunch

13:00 Accelerating Machine Learning Using Intel FPGAs. Jahanzeb Ahmad. Vision Design Engineer at Intel Corporation

This talk will discuss how Intel® FPGAs provide flexibility for artificial intelligence (AI) system architects searching for competitive deep learning accelerators that also support differentiating customization. The ability to tune the underlying hardware architecture, including variable data precision, together with software-defined processing, allows the FPGA to deploy state-of-the-art innovations as they emerge. Underlying application uses include in-line image and data processing, front-end signal processing, network ingest, and I/O aggregation. The Intel® FPGA Deep Learning Acceleration Suite accesses Intel FPGAs for real-time AI by enabling a complete top-to-bottom customizable AI inference solution.

13:45 fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs. Christos Bouganis. Imperial College

In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly, from high-throughput video surveillance to the very low-latency requirements of autonomous cars. In this context, FPGAs offer a platform that can be configured to match these different performance needs. However, the complexity of ConvNet models keeps increasing, making their mapping to an FPGA device a challenging task. This work presents fpgaConvNet, an end-to-end framework for mapping ConvNets on FPGAs. The proposed framework employs an automated design methodology based on the Synchronous Dataflow (SDF) paradigm and defines a set of SDF transformations in order to efficiently explore the architectural design space.

14:30 GPU optimisation of gridding for Square Kilometre Array data. Anna Brown. Oxford e-Research Centre

The w-projection gridding algorithm involves convolving a stream of data points with convolution kernels and gridding the result, and has applications in the Science Data Processor component of the Square Kilometre Array. The algorithm introduces significant memory bandwidth requirements in accessing the convolution kernels and output grid, and has irregular, non-deterministic memory access patterns driven by the distribution of the input data. This talk will cover optimisation of this algorithm for GPU, including presorting data to allow for additional parallelism without race conditions, and localising data as much as possible to make use of higher-bandwidth local memory.
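
As a rough illustration of the presorting idea in this abstract, the sketch below grids a few samples in two ways: a plain scatter, where every sample adds its kernel footprint to the grid, and a presorted version, where samples are first binned by the output tile they touch so that each tile is written by exactly one worker. This is not the speaker's implementation. The function names, the tile size, the single fixed real-valued kernel and the integer sample coordinates are all assumptions made for the example; real w-projection uses complex data and selects a kernel per sample.

```python
from collections import defaultdict


def grid_scatter(samples, kernel, size):
    """Reference version: each sample scatters its kernel footprint.

    Run in parallel over samples, the += below would race whenever two
    footprints overlap.
    """
    half = len(kernel) // 2
    grid = [[0.0] * size for _ in range(size)]
    for u, v, value in samples:
        for dy, row in enumerate(kernel):
            for dx, weight in enumerate(row):
                y, x = v + dy - half, u + dx - half
                if 0 <= y < size and 0 <= x < size:
                    grid[y][x] += weight * value
    return grid


def grid_presorted(samples, kernel, size, tile=4):
    """Presorted version (hypothetical): bin samples by output tile first.

    Each tile is then accumulated independently and only writes cells it
    owns, so tiles could be processed concurrently without conflicts.
    """
    half = len(kernel) // 2
    bins = defaultdict(list)
    for u, v, value in samples:
        # Range of tiles overlapped by this sample's footprint.
        ty0, ty1 = max(v - half, 0) // tile, min(v + half, size - 1) // tile
        tx0, tx1 = max(u - half, 0) // tile, min(u + half, size - 1) // tile
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(ty, tx)].append((u, v, value))

    grid = [[0.0] * size for _ in range(size)]
    for (ty, tx), members in bins.items():  # one independent unit of work per tile
        y_lo, y_hi = ty * tile, min((ty + 1) * tile, size)
        x_lo, x_hi = tx * tile, min((tx + 1) * tile, size)
        for u, v, value in members:
            for dy, row in enumerate(kernel):
                for dx, weight in enumerate(row):
                    y, x = v + dy - half, u + dx - half
                    if y_lo <= y < y_hi and x_lo <= x < x_hi:
                        grid[y][x] += weight * value
    return grid


KERNEL = [[0.25, 0.5, 0.25],
          [0.5, 1.0, 0.5],
          [0.25, 0.5, 0.25]]
SAMPLES = [(2, 2, 1.0), (3, 2, 2.0), (7, 7, 4.0), (0, 0, 1.0)]
```

Calling `grid_scatter(SAMPLES, KERNEL, 8)` and `grid_presorted(SAMPLES, KERNEL, 8)` gives the same grid. The presorted form trades extra binning work, and some duplicated samples at tile borders, for conflict-free writes, and a tile small enough to fit in fast local memory is one way the localisation mentioned in the abstract might be realised.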

15:15 Coffee and panel 

16:00 End