Past projects
Research Projects
Projects supported by industry partners
A RISC-V Vector-Processor for High-throughput Multidimensional Sensor Data Processing [Prof. Luca Benini]

The project gives life to the new scalable Ara vector processor, compliant with the latest RISC-V vector specifications and tuned to process parallel workloads with a particular focus on high energy efficiency and performance.
Key achievements:
- Expanded benchmark pool with heterogeneous applications for high-dimensional workloads
- Updated Ara from RISC-V V v0.5 to the most recent RISC-V Vector ISA (v0.9)
- Microarchitecture improvements for IPC, performance, and energy efficiency
- Addressed coherence problem between Ara and Ariane
- HW/SW interface mixing intrinsic, hand-optimized macros, vector compilation
MmS: MemPool meets Systolic [Prof. Luca Benini]

MemPool meets Systolic brings flexible and efficient systolic computation to MemPool, a large shared-memory manycore system, through lightweight hardware extensions enabling fast inter-core communication.
Key achievements:
- Implemented a Hybrid systolic shared-memory system
- Explored hybrid systolic topologies
- Implemented dedicated ISA extensions to support systolic configurations (23X speedup)
- Outperformed pure shared-memory architecture implementations (17% improvement)
Integrity and Access Control in Distributed Memory Systems [Prof. S. Capkun, Prof. S. Shinde]
Highly distributed memory systems have become one of the main enablers of modern computing platforms. In this project, we investigate how to design and build appropriate access control and integrity mechanisms for such systems.
Key achievements:
- Designed a data-center scale confidential computing architecture
- Added TEE support to two accelerators (AI and storage)
- Marginal overhead (0.42-8%)
Contact
Supraja Sridhara
DaFlEx: Performance portability through dataflow extraction [Prof. Torsten Hoefler, Prof. Luca Benini]
The goal of this project is to extract dataflow information from programs written in imperative programming languages and create efficient versions of these for multiple hardware platforms. We leverage and extend the powerful DaCe framework to expose parallelism and improve application performance.
Key achievements:
- Allow dataflow extraction from MLIR programs (Bridging control-centric and data-centric optimization)
- Extraction of dataflow representations of C programs (Lifting C Semantics for Dataflow Optimization)
- Extraction of dataflow representations of Fortran programs with a focus on climate and weather models
- Improvements to the translation of pointer iteration patterns in C towards data-centric representations
Contact
Alexandru Calotoiu
A New Methodology and Open-Source Benchmark Suite for Evaluating Data Movement Bottlenecks: A Processing-in-Memory Case Study [Prof. Onur Mutlu]

Our methodology to characterize data movement bottlenecks can enable the adoption of processing-in-memory in real-world computing systems.
Key achievements:
- Programming a Real-World Processing-in-Memory Architecture
- System Support for Processing-using-Memory Architectures
- Accelerating Data-Intensive Stencil Applications with Processing-near-Memory Architectures
- Designing an End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing
- Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System
- Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems
Contact
Juan Gomez Luna
Machine-Learning-Assisted Intelligent Micro-architectures to Reduce Memory Access Latency [Prof. Onur Mutlu]

Machine-learning-based control and speculation policies help us create intelligent microarchitectures for next-generation processors.
Key achievements:
- Data-driven and HW/SW co-designed techniques for prefetching (presented at PACT SRC’23)
- Data-driven approaches for managing hybrid memory/store systems (presented in ISCA’23)
Contact
Rahul Bera
Sensor Fusion [Prof. Luc Van Gool]

In the Sensor Fusion project, run at the Computer Vision Lab, sensor fusion network architectures are developed for semantic understanding of driving scenes under varying and adverse visual conditions. In particular, complementary sensor information is fused adaptively to recognize the content of each scene, depending on the visual conditions at hand and the robustness of each sensor to them.
Key achievements:
- Introduce a transformer-based sensor fusion architecture for dense 2D semantic perception which effectively fuses large numbers of input modalities with minimal computational overhead compared to its unimodal counterpart
- Design weakly supervised domain adaptation methods for semantic segmentation based on cross-domain image-level correspondences by adaptively refining pseudolabels and contrastively aligning features
- Construct the first large-scale multimodal driving dataset for dense semantic perception under diverse visual conditions, including different combinations of time of day, visibility, and type of precipitation, and featuring a frame camera, an event camera, a lidar, a radar, and an IMU/GNSS sensor
See also: https://muses.vision.ee.ethz.ch/
Contact
Christos Sakaridis
Application-Specific Architectures [Prof. Torsten Hoefler]
Spatial (or dataflow) devices are a viable and interesting alternative to classical computer organizations. They offer massive parallelism, with hundreds (or even thousands) of processing elements that communicate through a fast network on chip. Many spatial architectures are offered today as ML accelerators, but how to map and schedule applications on these devices is still an open challenge. With this project, we investigate methodologies and tools for the rapid development and evaluation of Domain-Specific Architectures (DSAs) or Domain-Specific Systems on Chip (DSSoCs), to be able to adapt to the rapid evolution of algorithms in a cost-effective way.
Key achievements:
- Proposed a computational model to reason about application scheduling for spatial accelerators
- Proposed scheduling solutions that leverage the unique characteristics of these devices
- Proposed a proof-of-concept framework for Application-Specific Architecture design. The framework takes a user-provided application as input and performs a Design Space Exploration phase. The goal of this exploration is to return one or more macro-level architecture descriptions of a System on Chip (SoC) that can execute the application with good performance/power/area trade-offs. The framework uses DaCe as its frontend.
Contact
Tiziano De Matteis
Design of ExG-glasses for Brain-Computer-Interfaces [Prof. Luca Benini]
The project targets the development of inconspicuous smart glasses for recording of EOG (ocular) and EEG (brain) signals with a fully-dry setup, coupled with onboard ultra low-power processing capabilities.
Key achievements:
- first prototype designed and presented at Tokyo Wearable Expo
- demo of EOG-based speller
Contact
Andrea Cossettini
Research Grants
Internal research projects supported by an EFCL grant
Quadrupedal Robot for Visually Impaired People Assistance [Dr. M. Magno]
The project will be based on the Unitree A1 quadrupedal robot, involving both the aspect of autonomous robot navigation and Human Robot Interaction with the visually impaired person. The goal is to reach autonomous navigation in a dynamic indoor environment, with the possibility of extending the operating range to outdoor environments.
Key achievements:
- created a new Bachelor Course: 227-0085-58L Projekte & Seminare: Autonomous Cars and Robots
- Various interviews, including international exposure (Reuters, ETH news, RSI https://www.youtube.com/watch?v=oyYWoCH7ij0)
- Participation at Scientifica 2023
- First complete autonomous navigation assistance demo
Unified Management of Address Spaces and Files and Its Implications on Security and Performance [Prof. K. Razavi]
This project looks at improving the security of the memory subsystem through better isolation of data on persistent memory without sacrificing performance. It further investigates the implications of certain request ordering optimizations on DRAM security when considering recent attacks. The first direction led to the design of a persistent memory file system with improved security, and the second direction concluded with new attacks that compromise browsers through the exploitation of certain optimizations in the DRAM subsystem.
Key achievements:
- A new file system that shows maintaining data isolation does not require sacrificing the high performance that persistent memory file systems provide
- Showing that Rowhammer attacks can be made versatile in browser settings due to certain optimization features in the standard
Contact
Prof. Kaveh Razavi
Scalabel: Distributed Human Machine Collaboration System for Visual Data Annotation [Prof. F. Yu]
Since 2023, we have witnessed significant breakthroughs in Artificial Intelligence, notably GPT-4 from OpenAI in the realm of NLP and SAM from Meta AI in the vision domain. These advancements brought the concept of foundational models to the forefront, leading to a growing demand in both industry and academia for solutions to utilize and/or build foundational models. This involves a wide range of capabilities, such as constructing large datasets from human-machine interactions and training models with large-scale GPU clusters. Facing these new challenges, the Scalabel project embraced the ambition to provide a comprehensive solution. We built an efficient system from scratch, including a user-friendly web interface, an efficient backend, and a flexible model server, so that anyone can create their own system for processing large-scale data from generative and discriminative models.
Key achievements:
- Redesigned the whole system framework and user interfaces based on the latest technologies. New interface elements such as brush and label cards were added to improve the flexibility and efficiency of the user interface.
- On the backend, the system supports the latest segmentation and tracking models to accelerate the visual data labeling process. Users of the system can also plug in their own models.
- Released a working system with a user-friendly interface and integration with various deep learning models such as SAM at https://github.com/SysCV/nutsh.
- Well-maintained documentation is available at https://nutsh.ai/docs.
Contact
Prof. Fisher Yu
Blended Projects
Internal research projects with multiple PIs supported by an EFCL grant
PIM Acceleration of Nanopore Raw-signal-based Genome Analysis [Prof. O. Mutlu & Prof. T. Jang]

Nanopore sequencing is a widely utilized, high-throughput, and low-cost genome sequencing technology, which is capable of sequencing long genome fragments into raw electrical signals. Our goal for this project is to enable real-time analysis for the sequencing of multiple nanopores by using energy-efficient and highly-parallel processing-in-memory techniques.
Key achievements:
- We proposed GenSig, the first in-memory processing system for raw-signal genome analysis. In our evaluations, we observe that the seeding steps in the state-of-the-art software solutions are either not PIM-friendly or imprecise.
- We proposed a new PIM-friendly algorithm, HashVote, that integrates voting into hash-based seeding to filter out impossible mapping positions, which reduces the workload for the subsequent chaining step.
- We designed a heterogeneous PIM, PIMVote, that leverages the advantages of (1) the flexibility of PNM and (2) the high performance and low energy consumption of PUM.
- We implemented a software tool for HashVote, which will be open-sourced later. We also evaluated GenSig and demonstrated that it provides significant performance and energy benefits over the state-of-the-art methods.
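The idea of combining voting with hash-based seeding can be illustrated with a toy sketch. This is not the HashVote implementation, which the project has not yet released; the function names, the use of exact k-mer matches over integer-quantized signal values, and the `min_votes` threshold are all assumptions made for illustration. Each seed hit votes for the reference start position it implies, and positions with too few votes are discarded before any chaining step would run.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Hash every length-k window of the reference to its start positions.

    `reference` is a list of integers standing in for quantized signal
    values (an assumption of this sketch).
    """
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[tuple(reference[i:i + k])].append(i)
    return index


def vote_start_positions(query, index, k, min_votes):
    """Let each seed hit vote for the reference start position it implies.

    A hit of query window q at reference position r implies that the
    query starts at r - q. Candidates with fewer than `min_votes` votes
    are filtered out, shrinking the input to a later chaining step.
    """
    votes = Counter()
    for q in range(len(query) - k + 1):
        for r in index.get(tuple(query[q:q + k]), ()):
            votes[r - q] += 1
    return {start: n for start, n in votes.items() if n >= min_votes}


reference = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
index = build_index(reference, k=3)

# A clean query taken from reference[4:10]: all four seeds agree on start 4.
clean = vote_start_positions(reference[4:10], index, k=3, min_votes=2)

# A query with one corrupted value: only one seed hits, so it is filtered out.
noisy = vote_start_positions([5, 9, 2, 0, 5, 3], index, k=3, min_votes=2)
```

In this toy setting the threshold trades sensitivity for workload: a higher `min_votes` removes more spurious candidates but may also drop reads with many noisy seeds.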
Contact
Haiyu Mao
Ultrasound Image Data Recycler [Prof. L. Benini & Prof. L. Van Gool]

Medical ultrasound (US) imaging is a vital diagnostic tool and has many areas of application. The raw data from US imaging, known as radio-frequency (RF) data, contains more information than US images and has valuable use cases. Although large datasets of processed US images are widely available, raw RF data remains scarce. The project aims to develop a physically informed neural network architecture that can convert ultrasound images back into raw RF data. To provide doctors with a human-readable image for their diagnoses, the conversion is usually carried out in the opposite direction, from raw data to image, and the more comprehensive raw data is lost. However, networks for novel wearable ultrasound devices need to be trained on raw datasets so that they can operate at the extreme edge. Therefore, the project aims to develop a model that can perform robust back conversion regardless of vendor type, anatomical structure, or subject.
Key achievements:
- Developed and validated a method to create virtual image phantoms that, when fed into a numerical simulator, provide RF and image results comparable to real-world ultrasound data. Specifically, the project analyzed the parameter mapping of four data maps (density, speed of sound, attenuation, and scattering) for use in an ultrasound simulator (k-wave).
- Optimized the computational effort for numerical data generation and developed two machine learning models (Unet, TransUnet) to convert ultrasound images into the four input maps for the numerical simulator.
Student projects
Smaller projects based on pre-PhD research supported by an EFCL grant
Efficient Smart Edge Computing for Controlling Unmanned Aerial Vehicles using a Brain–Machine Interface [Prof. L. Benini]
In this project, we develop miniaturized, comfortable, and non-stigmatizing BMIs based on dry EEG electrodes to decode users’ intentions in real time. We use the BMI paradigm of motor movement and/or imagery, i.e., the subject moves or imagines the movement of a body part while the BMI device collects EEG data and decodes the subject’s intention. Moreover, the BMI device features on-board processing capabilities using an open-source Parallel Ultra-Low-Power platform based on the RISC-V instruction set architecture. The goal is to acquire EEG data, process it locally at the edge in real time, and finally send out the command to control a flying drone.
Key achievements:
- Proposed a comfortable and non-stigmatizing EEG headband featuring eight soft elastic elastomer-based dry active electrodes and BioWolf, a miniaturized acquisition device capable of onboard processing
- Collected several sessions of EEG data and demonstrated that the proposed Transfer-Learning techniques can improve the inter-session accuracy by up to 30% compared to the baseline.
- Profiled MI-BMInet on the parallel ultra-low power (PULP) RISC-V microprocessor (Mr. Wolf, within BioWolf), showing an execution time of approximately 6 ms and an energy consumption of up to 30 µJ per inference. This system can operate for more than 30 h, with an inference time of 100 ms.
Contact
Xiaying Wang
Improving multi-row activation in off-the-shelf DRAM chips for in-memory operations [Prof. O. Mutlu]

We observe that off-the-shelf DRAM chips are capable of simultaneously activating up to 32 rows, which enables us to achieve high reliability and performance in Processing-using-DRAM operations.
Key achievements:
- Demonstrated, through an extensive experimental characterization of 120 modern DRAM chips from two major manufacturers, that modern DRAM chips can simultaneously activate up to 32 DRAM rows.
- Demonstrated a proof-of-concept that off-the-shelf DRAM chips are capable of 1) increasing the success rate of MAJ by replicating the input of MAJ operations and 2) executing MAJ5, MAJ7, MAJ9, and Multi-RowInit (i.e., copying one row’s content to multiple DRAM rows) operations.
- Showed the effect of DRAM operating parameters (i.e., timing delays between DRAM commands, data pattern, temperature, and voltage) on simultaneous many-row activation, MAJ, and Multi-RowInit operations.
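The logical behavior of the MAJ and Multi-RowInit operations named above can be modeled in a few lines. The sketch below is a software model only, with hypothetical function names, and rows represented as Python integers used as bit vectors. It does not model the analog effects inside a DRAM chip, so it shows that replicating inputs leaves the logical result unchanged but says nothing about why replication improves the success rate in real hardware.

```python
def bitwise_majority(rows):
    """Return the bitwise majority of an odd number of rows (MAJ3, MAJ5, ...).

    Each row is an int used as a bit vector; an output bit is 1 when more
    than half of the rows have a 1 in that bit position.
    """
    if len(rows) % 2 == 0:
        raise ValueError("majority needs an odd number of rows")
    width = max(row.bit_length() for row in rows)
    threshold = len(rows) // 2 + 1
    result = 0
    for bit in range(width):
        ones = sum((row >> bit) & 1 for row in rows)
        if ones >= threshold:
            result |= 1 << bit
    return result


def replicate(inputs, copies):
    """Place `copies` identical rows for each input (input replication)."""
    return [row for row in inputs for _ in range(copies)]


def multi_row_init(source_row, count):
    """Model Multi-RowInit: copy one row's content to `count` rows."""
    return [source_row] * count


a, b, c = 0b1100, 0b1010, 0b0110
maj3 = bitwise_majority([a, b, c])

# Three copies of each operand occupy nine rows, within the 32-row limit
# reported above, and give the same logical MAJ3 result.
maj3_replicated = bitwise_majority(replicate([a, b, c], copies=3))

maj5 = bitwise_majority([a, b, c, 0b1111, 0b0000])
```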
Contact
Haiyu Mao
Visitor projects
Female-Oriented Computing: Differential Harms, Standardization Gaps and Solutions [Maryam Mehrnezhad]
In this project, we aim to critically and rigorously study the existing technological standards concerning the security and privacy of fertility data. Our study covers a wide range of global policies, guidelines, standards, and regulations, including but not limited to the GDPR, the guidelines produced by the European Medicines Agency (EMA), and various US privacy standards and policies. Through our study we hope to identify the (lack of) mandated security and privacy protections surrounding fertility data. In combination with previous work by Dr Mehrnezhad in the fields of cyber security and privacy practices of online and modern technologies across different demographics and marginalised user groups, as well as Dr van der Merwe’s expertise in cyber security and privacy standardisation, we will produce the first SoK paper in this area. This will be of benefit to various communities: academia, industry, policy makers, human rights activists, and end users.
Processing in DRAM and Rowhammer [Oguz Ergin]
The work focused on reducing the performance impact of Rowhammer mitigation techniques. For this purpose, we investigated two ideas: (1) characterizing DRAM cells and performing partial charge restoration, and (2) implementing a throttling mechanism.
We thoroughly characterized the relation between refresh, data retention time, and Rowhammer, and proposed a scheme that tunes refresh latency dynamically. Our second idea, which we call “BreakHammer”, limits the number of on-the-fly requests a thread can inject into the memory system based on the thread’s Rowhammer likelihood.
These two studies significantly reduced the performance degradation associated with Rowhammer mitigation techniques. They are part of an early wave of research that, distinctively, does not aim to introduce new Rowhammer mitigation methods. Instead, these works offer innovative approaches to lessen the performance overhead introduced by existing and forthcoming Rowhammer mitigation strategies.
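The throttling idea can be pictured as a per-thread cap on outstanding memory requests that shrinks as a thread looks more likely to be hammering. The sketch below is a hypothetical software model, not the BreakHammer design: the class name, the linear scaling of the quota, and the representation of Rowhammer likelihood as a score between 0 and 1 are all assumptions made for illustration.

```python
class InFlightThrottle:
    """Toy model: cap each thread's in-flight requests by a suspicion score."""

    def __init__(self, base_quota, min_quota=1):
        self.base_quota = base_quota  # cap for a thread with likelihood 0
        self.min_quota = min_quota    # floor so no thread is starved completely
        self.in_flight = {}           # thread id -> outstanding requests

    def quota(self, likelihood):
        """Scale the cap down linearly with likelihood (assumed policy)."""
        likelihood = min(max(likelihood, 0.0), 1.0)
        return max(self.min_quota, int(self.base_quota * (1.0 - likelihood)))

    def try_issue(self, thread_id, likelihood):
        """Admit a request only if the thread is below its current cap."""
        outstanding = self.in_flight.get(thread_id, 0)
        if outstanding >= self.quota(likelihood):
            return False
        self.in_flight[thread_id] = outstanding + 1
        return True

    def complete(self, thread_id):
        """Record that one of the thread's requests has finished."""
        if self.in_flight.get(thread_id, 0) > 0:
            self.in_flight[thread_id] -= 1


throttle = InFlightThrottle(base_quota=8)

# A benign thread may keep up to 8 requests in flight; a thread with
# likelihood 0.75 is admitted only until it reaches its reduced cap.
benign_admitted = sum(throttle.try_issue("benign", 0.0) for _ in range(10))
suspect_admitted = sum(throttle.try_issue("suspect", 0.75) for _ in range(10))
```

Under this model a benign thread keeps its full quota, so the cost of throttling falls mainly on threads whose access pattern resembles an attack.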
Design exploration on multisensory resource constrained platforms for computationally intensive tasks [Simone Benatti]
This project focused on the exploration of energy efficient algorithms for Human-Machine Interfaces based on biosignal processing.
First, the work focused on requirements analysis of complex biopotential processing algorithms, with special focus on Blind Source Separation (BSS) and on the sensor fusion between EMG/EEG and ultrasound. Subsequently, the work focused on developing more sophisticated algorithms that improve the accuracy and efficiency of biopotential signal processing. One of the leitmotivs of our research was to solve the problems that stand in the way of bringing advanced algorithm processing “out of the lab”, coping with the non-idealities of user conditions in daily life applications.
Regarding BSS, we demonstrate that it is possible to adapt ICA dynamically to cope with the length variability of the muscles during a contraction. Regarding EEG processing, we demonstrate effective and accurate seizure detection performance on heavily imbalanced datasets, with an approach suited for implementation on energy-constrained platforms.