Current projects
Research Projects
Projects supported by industry partners
Towards Optimal Next-Generation Heterogeneous Interconnects for High-Performance Irregular Workloads [Prof. Torsten Hoefler]
In this project, we aim to develop next-generation massively parallel and heterogeneous interconnects, as well as efficient algorithms and paradigms for solving challenging classes of unstructured, irregular workloads such as graph computations.
Key achievements:
- Design of novel paradigms & architectures for scalable and high-accuracy irregular AI applications: graph of thoughts (published in AAAI'24), HOT (published in LoG'23), attentional graph neural networks with global tensor formulations (published in Supercomputing'23), and cached operator reordering.
- Design of novel paradigms & architectures for scalable irregular graph applications: the graph database interface (published in Supercomputing'23, Best Paper Finalist), ProbGraph (Best Paper Award at Supercomputing'22), Neural Graph Databases (published in LoG'22), and harnessing graph neural networks for motif prediction (published in KDD'22).
- Design of scalable interconnects: Sparse Hamming Graph, a customizable network-on-chip topology (published in DAC'23), and HexaMesh, a topology that enables scaling to hundreds of chiplets with an optimized chiplet arrangement (published in DAC'23).
- Analysis and taxonomy of data organization, system designs, and graph queries (published in ACM Computing Surveys (CSUR)) and an in-depth concurrency analysis of parallel and distributed graph neural networks (published in IEEE TPAMI).
Contact: Maciej Besta
Cross-layer Hardware/Software Techniques to Enable Powerful Computation and Memory Optimizations [Prof. Onur Mutlu]

Cross-layer techniques provide expressive interfaces for transferring semantic information about applications in order to improve performance, energy efficiency, security, QoS, and many other properties of computing systems.
Key achievements:
- Leveraging underutilized cache resources to accelerate address translation
- Employing hybrid address mappings for efficient address translation
- Development of an open-source simulation framework for memory management and virtual memory research, which includes a cross-layer interface to associate application metadata with memory regions
- Accelerating graph applications using SpMV and SpMSpV implementations in PIM (see the SpMV sketch after this list)
- Architectural support for fine-grained metadata (Address Scaling)
- A cross-layer solution to provide conflict-free accesses in modern SSDs
- Accelerating metagenomics via a hardware/software co-design approach inside the storage system
- Experimental demonstration that COTS DRAM chips are capable of performing functionally-complete Boolean operations and many-input AND/OR operations
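For reference, the sparse matrix-vector multiplication (SpMV) kernel named in the list above, in its generic compressed-sparse-row (CSR) textbook form; this is only an illustration of the operation, not our PIM implementation:

```python
# Generic CSR SpMV kernel, shown to make the SpMV bullet concrete.
# This is the textbook formulation, not the project's PIM design.

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row+1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y

# A = [[2, 0, 1],
#      [0, 3, 0]]
values, col_idx, row_ptr = [2.0, 1.0, 3.0], [0, 2, 1], [0, 2, 3]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```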
SoftHier: Exploring Explicitly Managed Memory Hierarchy Architectures for Scalable Acceleration [Torsten Hoefler/Luca Benini]

The objective of SoftHier is to drastically reduce hardware inefficiency by replacing caches with software-managed, efficient scratchpad memories (SPMs) and DMAs, while keeping programming effort low for the application developer. This goal will be achieved through: (i) an increased level of automation in the programming tool-flow; (ii) the use of domain-specific languages and abstractions; (iii) hardware enhancements in the memory hierarchy that facilitate automatic instantiation of accelerated data-motion primitives from abstract, streamlined software APIs.
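To make the software-managed approach concrete, here is a minimal sketch of the double-buffering pattern that SPMs and DMAs enable: the DMA fills one buffer while the compute unit processes the other. The function names are illustrative stand-ins, not SoftHier's actual API; the DMA is simulated by a plain copy.

```python
# Double-buffering sketch (illustrative only; not SoftHier's real API).
# On real hardware, dma_start() would run asynchronously, overlapping
# the next transfer with compute_tile() on the current buffer.

def dma_start(src):
    """Start a (simulated) DMA transfer of one tile into the SPM."""
    return list(src)

def dma_wait(handle):
    """Wait until the transfer has completed; trivial in simulation."""
    return handle

def compute_tile(tile):
    """Placeholder compute kernel: reduce one SPM-resident tile."""
    return sum(tile)

def process_tiles(tiles):
    """Alternate between two SPM buffers to hide transfer latency."""
    buffers = [dma_start(tiles[0]), None]   # prefetch the first tile
    results = []
    for i in range(len(tiles)):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(tiles):              # fill the other buffer early
            buffers[nxt] = dma_start(tiles[i + 1])
        results.append(compute_tile(dma_wait(buffers[cur])))
    return results

print(process_tiles([[1, 2], [3, 4], [5, 6]]))  # [3, 7, 11]
```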
Key achievements:
- Defined a baseline template architecture for SoftHier; explored GEMM dataflow mappings, the architecture design space, and PPA (power, performance, area)
- Developed a high-level simulation model with an open-source release
- PPA estimation for the SoftHier architecture
Research Grants
Internal research projects supported by an EFCL grant
Proving Properties to Improve HLS-Produced Circuits [Prof. L. Josipovic]

High-level synthesis (HLS) tools generate RTL designs from high-level programming languages like C/C++ and promise to liberate designers from low-level hardware details. HLS-produced dataflow circuits have performance merits over traditional circuits when accelerating workloads with unpredictable control flow or memory accesses. Yet, this gain does not come for free: the bidirectional handshake communication signals and flexible dataflow mechanisms (e.g., memory interfaces) that enable dataflow circuits to excel also cause a significant and often unacceptable resource cost. Thus, there is a clear need to remove or simplify these constructs whenever their flexibility is not needed.
In this project, we are developing a formal verification framework targeting HLS-produced dataflow circuits. We aim to identify and formally prove that some circuit behaviors are impossible and safely remove the associated dataflow mechanisms, without compromising performance or functional correctness. In the past year, we tackled dataflow handshake signal verification and optimization: our circuits are significantly cheaper than, and equally performant as, prior dataflow solutions. We will next tackle more complex dataflow mechanisms, such as memory interfaces and resource sharing logic; by formally proving the correctness of their simplifications, we will reduce the resource requirements of large dataflow designs and make this HLS paradigm more affordable and widely applicable.
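As a toy illustration of the inductive reasoning involved (a simplified sketch, not our actual framework, which operates on the dataflow circuits themselves), the following checks a buffer-occupancy invariant inductively: the invariant holds initially, and every handshake input from a state satisfying it leads to a state that also satisfies it, so the corresponding overflow behavior can never occur and the logic guarding against it is provably redundant.

```python
# Toy inductive-invariant check for a 2-slot elastic buffer.
CAPACITY = 2

def step(occupancy, upstream_valid, downstream_ready):
    """One clock cycle of a buffer driven by handshake signals."""
    push = upstream_valid and occupancy < CAPACITY
    pop = downstream_ready and occupancy > 0
    return occupancy + int(push) - int(pop)

def invariant(occupancy):
    """Candidate invariant: the buffer never over- or underflows."""
    return 0 <= occupancy <= CAPACITY

# Base case: the invariant holds in the initial (empty) state.
assert invariant(0)

# Inductive step: from every state satisfying the invariant, all
# input combinations lead to states that satisfy it as well.
for occ in range(CAPACITY + 1):
    for valid in (False, True):
        for ready in (False, True):
            assert invariant(step(occ, valid, ready))

print("Invariant proven inductively for all reachable states.")
```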
Key achievements:
- Developed a strategy to systematically generate inductive invariants for fast and scalable verification of dataflow circuits generated from high-level programs.
- Developed a circuit optimization strategy that balances circuit latency and occupancy to suppress spurious dynamism and achieve area-efficient circuits.
Practical energy savings with rate adaptation in today’s computer networks [Prof. L. Vanbever]

We have known for years that modulating the available capacity of a computer network (the number of Gbps it can transport) can save a lot of energy… in theory. This project aims to realize such savings by closing the gap between theory and practice. This requires accounting for the limitations of today's networking hardware and designing rate adaptation controllers that integrate smoothly with existing networking protocols.
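As a minimal illustration of rate adaptation (a sketch with assumed link rates and power draws, not our controller design), a simple policy selects the lowest link rate that covers the observed utilization plus a safety headroom:

```python
# Rate-adaptation sketch; the rates and power numbers below are
# assumptions for illustration, not measurements from this project.

RATES_GBPS = [1, 10, 25, 40, 100]                        # supported rates
POWER_W = {1: 0.5, 10: 2.0, 25: 3.5, 40: 5.0, 100: 9.0}  # assumed draw

def pick_rate(utilization_gbps, headroom=0.2):
    """Lowest rate covering current traffic plus a safety margin."""
    needed = utilization_gbps * (1 + headroom)
    for rate in RATES_GBPS:
        if rate >= needed:
            return rate
    return RATES_GBPS[-1]                   # saturated: keep the max rate

traffic = 6.3  # Gbps observed on the link
rate = pick_rate(traffic)
print(f"{traffic} Gbps -> run link at {rate} Gbps ({POWER_W[rate]} W)")
```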
Key achievements:
- We developed and implemented a complete methodology for deriving precise power models for routers (see the sketch after this list). We validated some of those models on production routers: we demonstrated for the first time that we can precisely predict the power impact of network management actions such as turning ports on/off or redirecting traffic.
- We proposed a practical protocol that turns off underutilized network links. Applied to real-world data from two Internet Service Providers, we found that about one third of links can be turned off without creating congestion.
- By leveraging the power models mentioned above, we estimate that the resulting energy savings would be negligible (<1%). This appears to be due, in part, to issues in router firmware: concretely, hardware components that are “turned off” are not actually “powered off.”
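For intuition, router power models of the kind derived above are often affine: a large constant chassis term plus per-port terms that depend on the configured rate and, weakly, on traffic. The coefficients below are invented purely for illustration; real models are fit to per-device measurements.

```python
# Sketch of an affine router power model with made-up coefficients.

CHASSIS_W = 350.0                     # constant draw of the chassis
PORT_BASE_W = {10: 2.0, 100: 8.0}     # per-port draw by rate (Gbps)
TRAFFIC_W_PER_GBPS = 0.02             # weak traffic-dependent term

def router_power(ports):
    """ports: list of (rate_gbps, enabled, traffic_gbps) tuples."""
    total = CHASSIS_W
    for rate, enabled, traffic in ports:
        if enabled:
            total += PORT_BASE_W[rate] + TRAFFIC_W_PER_GBPS * traffic
    return total

# Predict the impact of disabling one underutilized 100G port.
before = router_power([(100, True, 40.0), (100, True, 0.5)])
after = router_power([(100, True, 40.0), (100, False, 0.0)])
print(f"{before:.1f} W -> {after:.1f} W (saves {before - after:.1f} W)")
```

Note how the chassis term dominates: disabling a port saves little relative to total draw, which is consistent with the small savings estimated above.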
Contact: Romain Jacob
Revolutionizing Assisted Living with Robotic Dogs: Embedding AI-Based On-Board Processing and Novel Sensors [Michele Magno]
Our project aims to revolutionize assisted living through the integration of advanced technology into robotic dog companions. By leveraging novel sensors and embedding AI-based algorithms directly onto the robotic dogs, we aim to significantly enhance their autonomy and capabilities. This innovation will enable the robotic dogs to provide safer guidance and navigation for humans in assisted living environments. Our primary focus lies in improving localization and perception capabilities, allowing the robotic dogs to navigate complex environments with ease while ensuring the safety of their human companions. Additionally, we will investigate compression and low-latency AI techniques (including LLM algorithms) tailored for embedded, low-power platforms, ensuring efficient operation of the onboard processing. This approach will not only improve the capabilities of the robotic dogs but also extend their operational lifespan. Through this project, we envision a future where robotic dogs play a pivotal role in providing assistance and companionship to those in need, ultimately improving their quality of life. To further advance this research, we will collaborate with Bachelor's and Master's students through thesis projects, fostering academic involvement and innovation in this cutting-edge field.
Contact: Michele Magno
Foundation Models for Biosignal Analysis [Luca Benini]
Biosignal analysis is a field of growing interest, thanks in part to the increased availability of wearable devices for continuous data acquisition and monitoring. Among biosignals, electroencephalography (EEG) is of particular interest as it offers essential insights into the operation of the brain, aiding in the diagnosis and treatment of various diseases and playing a key role in brain-machine interfaces. However, analyzing EEG signals presents considerable challenges due to their complexity and the need for precise differentiation between background activity and the brain activity of interest. This project aims to advance the analysis of EEG signals by employing foundation models and innovative deep learning techniques. Recent developments applied Transformer architectures to EEG data for superior representation learning. These approaches highlighted the potential of AI in improving EEG signal analysis, with promising applications in various classification tasks such as sleep stage and seizure classification, emotion recognition, and motor imagery classification. However, the exploration of foundation models for biosignal analysis is still in its infancy. In this project, our aim is to develop foundation models for EEG signal analysis by investigating data augmentation strategies, tokenization methods, self-supervised pre-training strategies, and model architecture design (Transformers or Mamba). Leveraging lessons learned from these developments on EEG signals, the project will also explore multimodality and foundation models for alternative physiological signals (such as PPG, EMG, or ultrasound).
Key achievements:
- Designed an EEG foundation model based on alternating attention along the channel and patch dimensions (see the sketch after this list).
- Developed the "TimeFM" repository for pretraining and finetuning foundation models for biosignals (EEG, EMG, ultrasound).
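As a simplified sketch of the alternating-attention idea (a generic stand-in, not the model's actual implementation), EEG tokens laid out as (batch, channels, patches, dim) are attended first across channels, then across time patches:

```python
import torch
import torch.nn as nn

class AlternatingAttentionBlock(nn.Module):
    """Attend along the channel axis, then along the patch axis.
    A generic sketch of alternating attention, not the actual model."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.channel_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.patch_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, p, d = x.shape                      # (batch, chan, patch, dim)
        # Channel attention: fold patches into the batch dimension.
        xc = x.permute(0, 2, 1, 3).reshape(b * p, c, d)
        xc = self.norm1(xc + self.channel_attn(xc, xc, xc, need_weights=False)[0])
        x = xc.reshape(b, p, c, d).permute(0, 2, 1, 3)
        # Patch attention: fold channels into the batch dimension.
        xp = x.reshape(b * c, p, d)
        xp = self.norm2(xp + self.patch_attn(xp, xp, xp, need_weights=False)[0])
        return xp.reshape(b, c, p, d)

# Example: 32 EEG channels, 16 time patches, 64-dimensional tokens.
tokens = torch.randn(2, 32, 16, 64)
print(AlternatingAttentionBlock(dim=64)(tokens).shape)  # (2, 32, 16, 64)
```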
Contact: Yawei Li
Fall Injury Classification [Torsten Hoefler]
Our objective is to decrease the rate of hospital admissions due to fall-related injuries while also minimizing the complications that arise when such injuries do occur.
The injury risk to the hip when falling can be assessed by using medical imaging techniques to build a three-dimensional model of a person’s hip, which is then used as input for finite element simulations of different fall scenarios. Such simulations are computationally expensive, and their inputs involve confidential patient data, so they cannot be performed in the cloud.
The key to this project is to reduce the computational cost of fall-related hip injury prediction using AI methods. Since the most effective way to do this is not yet known, we will investigate multiple approaches, such as training a GNN on voxel data (which completely replaces the existing simulation pipeline) and using automatic differentiation to accelerate the existing simulation.
We will train our models using federated learning, enabling decentralized training while preserving data privacy. This approach allows collaboration without sharing sensitive data, improving model accuracy with a diverse dataset. We aim to train a large model on the server and then scale it down for use on client devices like those in hospitals, making it efficient for hardware with limited computational resources.
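As a minimal sketch of the federated averaging step we have in mind (standard FedAvg on a toy linear model; the sites and data are stand-ins, not our actual setup):

```python
import numpy as np

# FedAvg sketch: clients train locally on private data; the server only
# sees parameters, averaged with weights proportional to dataset size.

def local_step(w, X, y, lr=0.1):
    """One gradient step of least-squares regression on local data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg(weights, sizes):
    """Server aggregation: dataset-size-weighted parameter average."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(weights, sizes))

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])     # ground truth, unknown to server
clients = []
for n in (50, 80, 20):                  # three sites with different sizes
    X = rng.normal(size=(n, 3))
    clients.append((X, X @ true_w))     # raw data never leaves the site

w_global = np.zeros(3)
for _ in range(100):                    # federated rounds
    local = [local_step(w_global.copy(), X, y) for X, y in clients]
    w_global = fedavg(local, [len(y) for _, y in clients])

print(np.round(w_global, 2))            # converges to [ 1. -2.  0.5]
```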
To summarize, our model will enable healthcare providers and individuals to take proactive steps to reduce fall-related injury risks, improving patient outcomes and lowering healthcare costs. With a privacy-focused, data-driven approach, we aim to advance fall prevention and intervention strategies.
Contact: Timo Schneider
Metagenomic Analysis on Near-Data-Processing Platforms [Onur Mutlu]
Metagenomics applications monitor the diversity of organisms in different environments and are critical in clinical practice and public health. State-of-the-art analyses rely on high-throughput sequencing for microorganism identification but suffer from a huge memory footprint, frequent data movement, and a high dependency on internet access. This causes intense network congestion and high power consumption, and raises privacy concerns. To this end, it is paramount to perform high-speed genome detection locally for screening and diagnostic purposes. General genomic accelerator designs are suboptimal for metagenomics acceleration: they are not tailored to the specific pipeline steps, they neglect end-to-end acceleration, and they do not consider the significantly larger amounts of data. The goal of this project is to accelerate metagenomics on small edge devices by leveraging the near-data-processing (NDP) paradigm, as it can effectively address the huge data sizes, massive data movement, and parallel computation requirements of metagenomics. We aim to perform a holistic study of metagenomic algorithms and exploit processing-near-memory or in-storage processing (PNM) and processing-using-memory (PUM) to accelerate metagenomics steps. The result will be an end-to-end NDP system that combines the proposed PNM and PUM accelerators in a co-designed approach that seamlessly integrates within a metagenomics pipeline.
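To ground the discussion, the inner loop of many microorganism-identification steps is k-mer matching against a reference database, a memory-bound computation and thus a natural NDP target. A minimal sketch with toy data (real pipelines use far larger databases and hardware-specific layouts):

```python
# Toy k-mer classification sketch: the memory-bound lookup loop that
# NDP would accelerate. Reference genomes and the read are made up.

K = 5

def kmers(seq, k=K):
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

# Build a k-mer -> organisms index from (tiny) reference genomes.
references = {
    "E. coli":   "ATGGCTAGCTAGGATCCGATCG",
    "S. aureus": "TTGACCGGTACGTTAGCATGCA",
}
index = {}
for organism, genome in references.items():
    for km in kmers(genome):
        index.setdefault(km, set()).add(organism)

def classify(read):
    """Vote per organism by counting the read's matching k-mers."""
    votes = {}
    for km in kmers(read):
        for organism in index.get(km, ()):
            votes[organism] = votes.get(organism, 0) + 1
    return max(votes, key=votes.get) if votes else "unclassified"

print(classify("GCTAGCTAGGATC"))   # -> E. coli
```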
Student projects
Smaller projects based on pre-PhD research supported by an EFCL grant
Integrating a PULP SoC as a CubeSat Test Article [Luca Benini]
CubeSats, tiny, cost-effective satellites, have opened up new possibilities for exploring space, making it easier for researchers and educators to launch their projects beyond Earth. However, these miniature satellites face tough challenges in space, such as harmful radiation and extreme temperatures, which can damage their electronic parts and affect their performance.
The Trikarenos System on Chip (SoC) is a research prototype system designed with standard technologies to operate in environments affected by radiation. It includes smart features that help it correct errors caused by radiation, ensuring it remains reliable and keeps functioning correctly.
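For intuition, a standard way radiation-tolerant designs mask such errors is triple modular redundancy (TMR): compute a result three times and take a majority vote, so a single upset in one copy cannot corrupt the output. The sketch below shows the generic idea only; it is not a description of Trikarenos's actual error-correction logic.

```python
# Generic TMR majority vote (illustrative; not Trikarenos's design).

def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote across three redundant copies."""
    return (a & b) | (a & c) | (b & c)

# A radiation-induced upset flips a bit in one of the three copies;
# the vote restores the correct value.
copies = [0b1011, 0b1011, 0b0011]     # third copy has a flipped bit
print(bin(majority(*copies)))         # -> 0b1011
```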
The project's main goal is to test Trikarenos by integrating it into a CubeSat mission called ARIS SAGE. This test will show how well the Trikarenos SoC works when exposed to radiation and will help improve how CubeSats operate in future missions.
This effort aligns with the interests of the ETH Future Computing Laboratory in making computing devices more reliable. By successfully completing this project, we hope to gain valuable insights that could help in designing future technology that can perform well in extreme conditions, both in space and on Earth.
Contact: Michael Rogenmoser
Revisiting Memory Performance Attacks in the Era of RowHammer Defenses [Onur Mutlu]
RowHammer is a major DRAM read disturbance mechanism, where repeatedly accessing (hammering) a row of DRAM cells (a DRAM row) induces bitflips in other physically nearby DRAM rows. RowHammer solutions perform preventive actions (e.g., refreshing the neighbor rows of the hammered row) that mitigate such bitflips to preserve memory isolation, a fundamental building block of security and privacy in modern computing systems. However, preventive actions induce non-negligible memory request latency and system performance overheads as they interfere with memory requests. As shrinking technology node sizes over DRAM chip generations exacerbate RowHammer, the overheads of RowHammer solutions become prohibitively expensive. As a result, a malicious program can effectively hog the memory system and deny service to benign applications by causing many RowHammer-preventive actions.
We plan to tackle the performance overheads of RowHammer solutions by tracking and throttling the generators of memory accesses that trigger RowHammer solutions. To this end, we propose a research plan to investigate novel methods of identifying threads that reduce memory throughput, and new mechanisms to reduce the negative effects of such threads on system performance and data availability. We hope and expect that our novel techniques and mechanisms will significantly reduce the number of RowHammer-preventive actions performed, thereby 1) improving system performance and DRAM energy efficiency and 2) reducing the maximum slowdown induced on a benign application.
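To make the idea concrete, here is a minimal sketch of counter-based tracking with per-thread throttling; the thresholds and the policy are illustrative assumptions, not this project's proposed mechanism.

```python
# Sketch: count activations per row, attribute preventive actions to
# the threads that trigger them, and throttle repeat offenders.
# Thresholds and policy are illustrative assumptions only.

ROW_THRESHOLD = 1000      # activations before a preventive refresh
THROTTLE_LIMIT = 5        # preventive actions a thread may trigger

row_counts = {}           # row address -> activation count
thread_triggers = {}      # thread id   -> preventive actions caused
throttled = set()

def on_activate(thread_id, row):
    """Called by the memory controller on each row activation."""
    if thread_id in throttled:
        return "delayed"                    # deprioritize the aggressor
    row_counts[row] = row_counts.get(row, 0) + 1
    if row_counts[row] >= ROW_THRESHOLD:
        row_counts[row] = 0                 # refresh neighbors, reset
        thread_triggers[thread_id] = thread_triggers.get(thread_id, 0) + 1
        if thread_triggers[thread_id] > THROTTLE_LIMIT:
            throttled.add(thread_id)        # identified as an aggressor
        return "preventive_refresh"
    return "served"

# A hammering thread eventually gets throttled; benign ones do not.
for _ in range(ROW_THRESHOLD * (THROTTLE_LIMIT + 1)):
    on_activate(thread_id=7, row=0x1A2B)
print(7 in throttled)     # -> True
```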