Current projects

Research Projects

Projects supported by industry partners


Semantics and Implementation of ACID properties in modern hardware [Gustavo Alonso]




LLM Agent Firewall [Srdjan Capkun]


Large Language Model (LLM) agents offer increasingly rich functionalities and capabilities to act on users' inputs and data. However, it remains unclear how to safely restrict LLM agents to the desired data and functionalities, and what their implications are for the privacy and integrity of the systems in which they operate.

In this project, we investigate how LLM agents need to be restricted in terms of the system data and resources they can access and the actions they are allowed to take, and which access control models and mechanisms should be deployed to achieve the desired access control and isolation properties. It is not yet clear whether existing solutions can be leveraged in an agentic setting, or what the trade-offs of different solutions are. For example, in a corporate setting where different roles have access to different projects, a single assistant trained on all the company's data is efficient to train but provides insufficient access control due to threats such as memorization and membership inference attacks. Conversely, training one model for each permission level can quickly require an exponential number of models.

We aim to analyze combinations of novel LLM environments that restrict the agents, novel system access control policies, and LLM training procedures, which together limit what LLM agents can access and do. This is a challenging problem, as solutions must (i) guarantee information isolation between different groups, while (ii) allowing computationally reasonable training, fine-tuning, and updating, and (iii) preserving functionality and delivering high-quality results.
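As an illustration of the system-level direction, the sketch below shows a hypothetical policy-enforced retriever that filters which documents an agent may see based on the requesting user's roles, so restricted content never reaches the model. All names and the keyword-matching retriever are placeholders for illustration, not the project's actual design.

```python
# Minimal sketch (hypothetical names): a policy layer that filters which
# documents an LLM agent may retrieve, based on the requesting user's roles.
# The access control check happens outside the model, so the agent never
# sees restricted content and cannot leak or memorize it here.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    content: str
    allowed_roles: frozenset  # roles permitted to read this document


@dataclass
class PolicyEnforcedRetriever:
    corpus: list = field(default_factory=list)

    def retrieve(self, query: str, user_roles: set) -> list:
        """Return only documents the user is cleared for."""
        visible = [d for d in self.corpus if d.allowed_roles & user_roles]
        # Naive relevance: keyword overlap (a placeholder for a real retriever).
        return [d for d in visible
                if any(w in d.content.lower() for w in query.lower().split())]


if __name__ == "__main__":
    corpus = [
        Document("d1", "Project Alpha budget forecast", frozenset({"finance"})),
        Document("d2", "Project Alpha engineering design notes", frozenset({"eng"})),
    ]
    retriever = PolicyEnforcedRetriever(corpus)
    print(retriever.retrieve("project alpha", {"eng"}))  # only the engineering doc
```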


Graph Computations and LLMs: A Synergy [Torsten Hoefler]

In this project, we will explore the synergy between graph computing and LLMs in order to improve the efficiency and effectiveness of both classes of workloads. On the one hand, we will work on enhancing the LLM ecosystem by exploring, for example, how to harness the graph abstraction for more powerful LLM inference and agent architectures, how to enhance the design of Retrieval-Augmented Generation (RAG), or how to extend the fine-tuning pipeline of LLMs with graph tasks for more powerful LLM reasoning. Here, we will build upon, among others, our recent LLM outcomes such as Graph of Thoughts or Topologies of Reasoning. One example approach is enhancing RAG, which allows models to offload part of the memory burden by retrieving relevant external information dynamically. However, RAG alone may not address the full scope of long-term memory needs, especially when managing long-term dependencies or multitasking over extended contexts.
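To make the graph abstraction concrete, here is a minimal sketch in the spirit of Graph of Thoughts, in which intermediate "thoughts" are nodes and edges record which earlier thoughts were refined or merged to produce a new one. The `call_llm` argument is a hypothetical stand-in for a real model invocation, not part of any existing framework.

```python
# Minimal sketch (hypothetical helper names) of a graph abstraction for LLM
# reasoning: thoughts are nodes, edges point from parent thoughts to the
# thoughts derived from them by refinement or aggregation.
import itertools


class ThoughtGraph:
    def __init__(self):
        self._ids = itertools.count()
        self.nodes = {}    # node id -> thought text
        self.parents = {}  # node id -> list of parent node ids

    def add(self, text, parents=()):
        nid = next(self._ids)
        self.nodes[nid] = text
        self.parents[nid] = list(parents)
        return nid

    def refine(self, nid, call_llm):
        """Ask the model to improve one thought; the result is a child node."""
        improved = call_llm(f"Improve this step: {self.nodes[nid]}")
        return self.add(improved, parents=[nid])

    def merge(self, nids, call_llm):
        """Aggregate several thoughts into one, an operation a tree of
        reasoning cannot express but a graph can."""
        joined = "\n".join(self.nodes[n] for n in nids)
        merged = call_llm(f"Combine these partial results:\n{joined}")
        return self.add(merged, parents=list(nids))


if __name__ == "__main__":
    fake_llm = lambda prompt: f"[model output for: {prompt[:40]}...]"
    g = ThoughtGraph()
    a = g.add("Sort the first half of the list")
    b = g.add("Sort the second half of the list")
    final = g.merge([a, b], fake_llm)
    print(g.nodes[final], g.parents[final])
```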

Similarly, we will explore the potential of harnessing graphs for more effective Transformers. Transformers have demonstrated remarkable success in various tasks, but one area that remains under-explored is the integration of graph structures to enhance their capacity and efficiency. A promising avenue is reorganizing Mixture of Experts (MoE) layers using a graph-based structure. Instead of treating experts as independent units that are selectively activated, a graph topology can be employed, where each expert becomes a node and the edges represent pathways of knowledge transfer or communication between experts. Finally, fine-tuning LLMs on graph-based tasks presents a unique opportunity to enhance the model's performance on graph tasks such as node classification or link prediction. If time allows, we will also explore this direction.
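The following sketch illustrates one possible reading of a graph-structured MoE (not the project's actual design): a toy PyTorch layer in which a token routed to one expert also receives a weighted contribution from that expert's graph neighbours, modelling knowledge transfer along edges.

```python
# Toy graph-structured Mixture-of-Experts layer (illustrative sketch only).
# Each token is routed to a top-1 expert; the selected expert's neighbours in
# the expert graph also contribute a smaller, fixed share of the output.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphMoE(nn.Module):
    def __init__(self, dim, num_experts, adjacency, neighbor_weight=0.25):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)
        # adjacency[i][j] = 1 if there is an edge between experts i and j
        self.register_buffer("adjacency", adjacency.float())
        self.neighbor_weight = neighbor_weight

    def forward(self, x):  # x: (tokens, dim)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)  # (T, E, D)
        top1 = self.router(x).argmax(dim=-1)                            # (T,)
        # Mixing weights: 1 on the selected expert, neighbor_weight on its neighbours.
        mix = F.one_hot(top1, num_classes=len(self.experts)).float()
        mix = mix + self.neighbor_weight * self.adjacency[top1]
        mix = mix / mix.sum(dim=-1, keepdim=True)
        return (mix.unsqueeze(-1) * expert_outs).sum(dim=1)             # (T, D)


if __name__ == "__main__":
    adj = torch.tensor([[0, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 1, 0, 1],
                        [0, 0, 1, 0]])  # a simple chain of 4 experts
    layer = GraphMoE(dim=16, num_experts=4, adjacency=adj)
    print(layer(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```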


Unifying High-Performance Automated Differentiation for Machine Learning and Scientific Computing [Torsten Hoefler]


In this project, we aim to develop a high-performance automatic differentiation (AD) framework capable of supporting both machine learning models (from PyTorch/ONNX) and complex scientific computing programs (Python, Fortran, etc.). This will bridge the gap between the two fields and enable the exploration of hybrid learning techniques that outperform classical scientific approaches. To enable the differentiation of larger, memory-intensive programs, we will explore novel techniques for automatic checkpointing. We leverage the power of the stateful data-flow graph (SDFG) representation: we apply AD on the SDFG, statically analyze the graph to determine which activations to recompute or store, and utilize data-centric optimizations to enhance the performance of the gradient calculation.
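The store-versus-recompute decision can be illustrated with PyTorch's built-in gradient checkpointing. The snippet below is only a didactic sketch of that trade-off, not the SDFG-based framework developed in this project: checkpointed segments discard their activations in the forward pass and recompute them during backward, trading extra computation for lower peak memory.

```python
# Illustrative sketch of the store-vs-recompute trade-off using PyTorch's
# gradient checkpointing (not the SDFG-based AD framework described above).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Sequential):
    def __init__(self, dim):
        super().__init__(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))


dim = 256
blocks = [Block(dim) for _ in range(4)]
x = torch.randn(32, dim, requires_grad=True)

# Store everything: all intermediate activations stay alive until backward.
y = x
for b in blocks:
    y = b(y)
y.sum().backward()

# Recompute: each block's activations are dropped and rebuilt during backward,
# reducing peak memory at the cost of a second forward pass per block.
x.grad = None
y = x
for b in blocks:
    y = checkpoint(b, y, use_reentrant=False)
y.sum().backward()
```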

Key Achievements:

  • State-of-the-art performance for gradient calculation on NPBench kernels, outperforming JAX
  • ILP-based solution for the store-recompute problem in automatic differentiation

Processing-in-Memory Architectures for Data-Intensive Applications [Onur Mutlu]

Data movement between the memory units and the compute units of current computing systems is a major performance and energy bottleneck. From large-scale servers to mobile devices, data movement costs dominate computation costs in terms of both performance and energy consumption. For example, data movement between the main memory and the processing cores accounts for 62% of the total system energy in popular consumer applications run on mobile systems, including web browsing, video processing, and machine learning (ML) inference, as we analyzed and demonstrated in our ASPLOS 2018 paper. Our more recent study in PACT 2021 shows that more than 90% of the total system energy is spent on memory in large edge ML models. As a result, the data movement bottleneck is a huge burden that greatly limits the efficiency and performance of modern computing systems.

Many modern and important workloads, such as ML models (including large language models (LLMs)), graph processing, rendering, databases, video analytics, real-time data analytics, and computational biology, suffer greatly from the data movement bottleneck. These workloads are characterized by irregular memory accesses, relatively low data reuse, low cache utilization, low arithmetic intensity (i.e., the ratio of operations per accessed byte), and large datasets that greatly exceed the main memory size. The amount of computation and data locality in these workloads is usually not enough to amortize the data movement costs. In order to alleviate this data movement bottleneck, we need a paradigm shift from the traditional processor-centric design, where all computation takes place in the compute units of the processor, to a more data-centric design, where processing elements are placed closer to where the data resides. This paradigm of computing is known as Near-Data Processing (NDP) or Processing-in-Memory (PIM). PIM architectures can be classified into two categories: 1) Processing-near-Memory (PnM), where computation takes place in dedicated processing elements (e.g., accelerators, processing cores, reconfigurable logic) placed near the memory array, and 2) Processing-using-Memory (PuM), where computation takes place inside the memory array by exploiting intrinsic analog operational properties of the memory device.
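A back-of-the-envelope roofline sketch (with purely illustrative, hypothetical numbers) shows why low arithmetic intensity leaves such workloads bound by memory bandwidth, and why moving compute closer to memory raises the attainable performance.

```python
# Simple roofline estimate: attainable throughput is limited either by peak
# compute or by (bandwidth * arithmetic intensity). All numbers are
# hypothetical and only illustrate the trend, not measured results.
def attainable_gflops(intensity_flops_per_byte, peak_gflops, bandwidth_gbs):
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)


peak_gflops = 2000.0   # hypothetical processor peak compute
dram_bw = 50.0         # hypothetical off-chip DRAM bandwidth, GB/s
pim_bw = 1000.0        # hypothetical in/near-memory bandwidth, GB/s

# e.g., a streaming graph-style kernel with ~0.25 FLOP per accessed byte
intensity = 0.25
print("processor-centric:", attainable_gflops(intensity, peak_gflops, dram_bw), "GFLOP/s")
print("PIM-style        :", attainable_gflops(intensity, peak_gflops, pim_bw), "GFLOP/s")
```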

In this project, we aim to fundamentally alleviate the data movement bottleneck and thus significantly accelerate modern data-intensive applications, such as large-scale ML models, LLMs, graph processing, rendering, databases, video analytics, real-time data analytics, and computational biology, using PIM architectures.

Research Grants

Internal research projects supported by an EFCL grant






Student projects

Smaller projects based on pre-PhD research supported by an EFCL grant

