MSc thesis 25/26

CXL-Aware Memory Allocation

Compute Express Link (CXL) is a novel memory interconnect technology that allows multiple hosts to access each other’s memory in a direct and transparent manner. CXL enables novel designs that recent applications in LLM inference, databases, and HPC have started to exploit. CXL promises the best of both worlds: applications retain a single-machine programming model (as in vertical scaling) but can now scale memory across hosts (as in horizontal scaling).

However, as is common with novel technologies, fully exploiting CXL’s potential requires full application redesigns and expert programming knowledge. CXL introduces differentiated memory accesses, both in terms of performance and correctness semantics, as well as partial failures. In particular, existing memory allocators are unaware of the CXL topology and make poor placement decisions, leading to performance degradation and correctness issues.

Goals

This thesis offers the opportunity to work directly with the cutting-edge hardware available at our lab. The goal is to design, implement, and evaluate a CXL-aware memory allocator that presents applications with a general, performant, and correct view of the underlying memory.
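
To make the placement problem concrete, here is a minimal sketch of a topology-aware placement policy. It is only an illustration under stated assumptions, not the design the thesis is expected to produce: the `Tier`, `Region`, and `place` names are hypothetical, and the latency values a caller supplies are placeholders rather than measurements.

```rust
/// Kinds of memory a host might see (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    LocalDram,
    CxlRemote,
}

/// One allocatable region together with the properties the policy looks at.
pub struct Region {
    pub tier: Tier,
    /// Assumed access latency; values passed by callers are placeholders.
    pub latency_ns: u64,
    pub free_bytes: usize,
}

/// Picks the lowest-latency region that can still hold `size` bytes,
/// reserves the space there, and reports which tier was chosen.
/// Returns `None` when no region has enough free space.
pub fn place(regions: &mut [Region], size: usize) -> Option<Tier> {
    let best = regions
        .iter_mut()
        .filter(|r| r.free_bytes >= size)
        .min_by_key(|r| r.latency_ns)?;
    best.free_bytes -= size;
    Some(best.tier)
}
```

A real allocator would also have to account for the correctness semantics and partial failures mentioned above, which this latency-only policy deliberately ignores.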

Bibliography (starting point)

Requirements

CXL-Aware Process Life Cycle Management

Compute Express Link (CXL) is a novel memory interconnect technology that allows multiple hosts to access each other’s memory in a direct and transparent manner. CXL enables novel designs that recent applications in LLM inference, databases, and HPC have started to exploit. CXL promises the best of both worlds: applications retain a single-machine programming model (as in vertical scaling) but can now scale memory across hosts (as in horizontal scaling).

However, as is common with novel technologies, fully exploiting CXL’s potential requires full application redesigns and expert programming knowledge. CXL introduces differentiated memory accesses, both in terms of performance and correctness semantics, as well as partial failures. In particular, multi-threaded and multi-process applications cannot exploit CXL’s distributed nature, since there are no built-in mechanisms to manage computation (for instance, creating a thread on a remote host).

Goals

This thesis offers the opportunity to work directly with the cutting-edge hardware available at our lab. The goal is to design, implement, and evaluate a CXL-aware process life cycle manager that allows applications to scale out computation in a transparent manner.

Bibliography (starting point)

Requirements

CXL-Aware Fault Injection

Compute Express Link (CXL) is a novel memory interconnect technology that allows multiple hosts to access each other’s memory in a direct and transparent manner. CXL enables novel application designs, with recent examples in LLM inference, databases, and HPC. CXL promises the best of both worlds: applications retain a single-machine programming model (as in vertical scaling) but can now scale memory across hosts (as in horizontal scaling).

However, as is common with novel technologies, fully exploiting CXL’s potential requires full application redesigns and expert programming knowledge. CXL introduces differentiated memory accesses, both in terms of performance and correctness semantics, as well as partial failures. In particular, memory accesses that were previously infallible can now fail if the corresponding memory resides on a remote machine.

Goals

This thesis offers the opportunity to work directly with the cutting-edge hardware available at our lab. The goal is to design, implement, and evaluate a CXL-aware fault injector that makes it possible to assess the correctness of CXL applications under faults.
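
As a rough illustration of what injecting a fault into a memory access could look like, the sketch below wraps a buffer and makes every N-th load fail deterministically. It is a software-only toy under stated assumptions: the `FaultyMemory` and `Fault` names and the `fail_every` parameter are hypothetical, and a real injector would presumably act on actual CXL-attached memory rather than on a `Vec<u8>`.

```rust
/// Ways a load can fail in this toy model (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum Fault {
    /// The injector decided that this access should fail.
    Injected,
    /// The index is outside the wrapped buffer.
    OutOfBounds,
}

/// A buffer whose loads fail on a fixed schedule.
pub struct FaultyMemory {
    data: Vec<u8>,
    accesses: u64,
    /// Every `fail_every`-th load fails; 0 disables injection.
    fail_every: u64,
}

impl FaultyMemory {
    pub fn new(data: Vec<u8>, fail_every: u64) -> Self {
        FaultyMemory { data, accesses: 0, fail_every }
    }

    /// Counts the access, then either injects a fault or reads the byte.
    pub fn load(&mut self, idx: usize) -> Result<u8, Fault> {
        self.accesses += 1;
        if self.fail_every != 0 && self.accesses % self.fail_every == 0 {
            return Err(Fault::Injected);
        }
        self.data.get(idx).copied().ok_or(Fault::OutOfBounds)
    }
}
```

A deterministic schedule keeps failing runs reproducible; a randomized or trace-driven schedule would be an equally plausible choice.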

Bibliography (starting point)

Requirements

CXL-Aware Programming Abstractions

Compute Express Link (CXL) is a novel memory interconnect technology that allows multiple hosts to access each other’s memory in a direct and transparent manner. CXL enables novel application designs, with recent examples in LLM inference, databases, and HPC. CXL promises the best of both worlds: applications retain a single-machine programming model (as in vertical scaling) but can now scale memory across hosts (as in horizontal scaling).

However, as is common with novel technologies, fully exploiting CXL’s potential requires full application redesigns and expert programming knowledge. CXL introduces differentiated memory accesses, both in terms of performance and correctness semantics, as well as partial failures. In particular, there is currently no programming language support to expose these CXL characteristics to the programmer.

Goals

This thesis offers the opportunity to work directly with the cutting-edge hardware available at our lab. The goal is to design, implement, and evaluate a CXL-aware type system in Rust that allows programmers to explicitly reason about access locality and access failures.
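
One way to picture locality and failure in the type system is with marker types, as in the minimal sketch below. Everything in it is an assumption made for illustration: `Local`, `Remote`, `Placed`, and `AccessError` are hypothetical names, and the `reachable` flag merely stands in for the state of a remote host.

```rust
use std::marker::PhantomData;

/// Marker types recording where a value lives (hypothetical names).
pub struct Local;
pub struct Remote;

/// Error surfaced when a remote access cannot complete.
#[derive(Debug, PartialEq, Eq)]
pub enum AccessError {
    HostUnreachable,
}

/// A value tagged at the type level with its placement `L`.
pub struct Placed<T, L> {
    value: T,
    /// Stand-in for the liveness of the host that owns the memory.
    reachable: bool,
    _loc: PhantomData<L>,
}

impl<T: Copy> Placed<T, Local> {
    pub fn new_local(value: T) -> Self {
        Placed { value, reachable: true, _loc: PhantomData }
    }

    /// Local reads are infallible, so they return the value directly.
    pub fn read(&self) -> T {
        self.value
    }
}

impl<T: Copy> Placed<T, Remote> {
    pub fn new_remote(value: T, reachable: bool) -> Self {
        Placed { value, reachable, _loc: PhantomData }
    }

    /// Remote reads may fail, so the caller is forced to handle a `Result`.
    pub fn read(&self) -> Result<T, AccessError> {
        if self.reachable {
            Ok(self.value)
        } else {
            Err(AccessError::HostUnreachable)
        }
    }
}
```

Because `read` has a different return type for each placement, code that treats a remote value as if it were local fails to compile instead of failing at run time.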

Bibliography (starting point)

Requirements