Operating Systems

📄 paper

Guidelines for Building Indexes on Partially Cache-Coherent CXL Shared Memory

The \emph{Partial Cache-Coherence (PCC)} model maintains hardware cache coherence only within subsets of cores, enabling large-scale memory sharing with emerging memory interconnect technologies like Compute Express Link (CXL). However, PCC's relaxation of global cache coherence compromises the correctness of existing single-machine software. This paper focuses on building consistent and efficient indexes on PCC platforms. We present that existing indexes designed for cache-coherent platforms can be made consistent on PCC platforms following SP guidelines, i.e., we identify \emph{sync-data} and \emph{protected-data} according to the index's concurrency control mechanisms, and synchronize them accordingly. However, conversion with SP guidelines introduces performance overhead. To mitigate the overhead, we identify several unique performance bottlenecks on PCC platforms, and propose P$^3$ guidelines (i.e., using Out-of-\underline{P}lace update, Re\underline{P}licated shared variable, S\underline{P}eculative Reading) to improve the efficiency of converted indexes on PCC platforms. With SP and P$^3$ guidelines, we convert and optimize two indexes (CLevelHash and BwTree) for PCC platforms. Evaluation shows that converted indexes' throughput improves up to 16$\times$ following P$^3$ guidelines, and the optimized indexes outperform their message-passing-based and disaggregated-memory-based counterparts by up to 16$\times$ and 19$\times$.

Operating_Systems advanced

By: Fangnuo Wu, Mingkai Dong, Wenjun Cai +2 more

Source: arXiv Nov 9, 2025

0.0

5 min read

0

Quality

📄 paper

Preemption-Enhanced Benchmark Suite for FPGAs

Field-Programmable Gate Arrays (FPGAs) have become essential in cloud computing due to their reconfigurability, energy efficiency, and ability to accelerate domain-specific workloads. As FPGA adoption grows, research into task scheduling and preemption techniques has intensified. However, the field lacks a standardized benchmarking framework for consistent and reproducible evaluation. Many existing studies propose innovative scheduling or preemption mechanisms but often rely on proprietary or synthetic benchmarks, limiting generalizability and making comparison difficult. This methodical fragmentation hinders effective evaluation of scheduling strategies and preemption in multi-tenant FPGA environments. This paper presents the first open-source preemption-enabled benchmark suite for evaluating FPGA preemption strategies and testing new scheduling algorithms, without requiring users to create preemption workloads from scratch. The suite includes 27 diverse applications spanning cryptography, AI/ML, computation-intensive workloads, communication systems, and multimedia processing. Each benchmark integrates comprehensive context-saving and restoration mechanisms, facilitating reproducible research and consistent comparisons. Our suite not only simplifies testing FPGA scheduling policies but also benefits OS research by enabling the evaluation of scheduling fairness, resource allocation efficiency, and context-switching performance in multi-tenant FPGA systems, ultimately supporting the development of better operating systems and scheduling policies for FPGA-based environments. We also provide guidelines for adding new benchmarks, enabling future research to expand and refine FPGA preemption and scheduling evaluation.

Operating_Systems advanced

By: Arsalan Ali Malik, John Buchanan, Aydin Aysu

Source: arXiv Nov 9, 2025

0.0

10 min read

0

Quality

📄 paper

Work-in-Progress: Function-as-Subtask API Replacing Publish/Subscribe for OS-Native DAG Scheduling

The Directed Acyclic Graph (DAG) task model for real-time scheduling finds its primary practical target in Robot Operating System 2 (ROS 2). However, ROS 2's publish/subscribe API leaves DAG precedence constraints unenforced: a callback may publish mid-execution, and multi-input callbacks let developers choose topic-matching policies. Thus preserving DAG semantics relies on conventions; once violated, the model collapses. We propose the Function-as-Subtask (FasS) API, which expresses each subtask as a function whose arguments/return values are the subtask's incoming/outgoing edges. By minimizing description freedom, DAG semantics is guaranteed at the API rather than by programmer discipline. We implement a DAG-native scheduler using FasS on a Rust-based experimental kernel and evaluate its semantic fidelity, and we outline design guidelines for applying FasS to Linux Linux sched_ext.

Operating_Systems advanced Robotics

By: Takahiro Ishikawa-Aso, Atsushi Yano, Yutaro Kobayashi +3 more

Source: arXiv Nov 11, 2025

0.0

5 min read

0

Quality

📄 paper

Taiji: A DPU Memory Elasticity Solution for In-production Cloud Environments

The growth of cloud computing drives data centers toward higher density and efficiency. Data processing units (DPUs) enhance server network and storage performance but face challenges such as long hardware upgrade cycles and limited resources. To address these, we propose Taiji, a resource-elasticity architecture for DPUs. Combining hybrid virtualization with parallel memory swapping, Taiji switches the DPU's operating system (OS) into a guest OS and inserts a lightweight virtualization layer, making nearly all DPU memory swappable. It achieves memory overcommitment for the switched guest OS via high-performance memory elasticity, fully transparent to upper-layer applications, and supports hot-switch and hot-upgrade to meet in-production cloud requirements. Experiments show that Taiji expands DPU memory resources by over 50%, maintains virtualization overhead around 5%, and ensures 90% of swap-ins complete within 10 microseconds. Taiji delivers an efficient, reliable, low-overhead elasticity solution for DPUs and is deployed in large-scale production systems across more than 30,000 servers.

Operating_Systems advanced

By: Hao Zheng, Longxiang Wang, Yun Xu +22 more

Source: arXiv Nov 12, 2025

0.0

5 min read

0

Quality

📄 paper

Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction

This paper shows that cache-based optimizations are often ineffective in cloud virtual machines (VMs) due to limited visibility into and control over provisioned caches. In public clouds, CPU caches can be partitioned or shared among VMs, but a VM is unaware of cache provisioning details. Moreover, a VM cannot influence cache usage via page placement policies, as memory-to-cache mappings are hidden. The paper proposes a novel solution, CacheX, which probes accurate and fine-grained cache abstraction within VMs using eviction sets without requiring hardware or hypervisor support, and showcases the utility of the probed information with two new techniques: LLC contention-aware task scheduling and virtual color-aware page cache management. Our evaluation of CacheX's implementation in x86 Linux kernel demonstrates that it can effectively improve cache utilization for various workloads in public cloud VMs.

Operating_Systems advanced

By: Mani Tofigh, Edward Guo, Weiwei Jia +2 more

Source: arXiv Nov 12, 2025

0.0

5 min read

0

Quality

📄 paper

Vmem: A Lightweight Hot-Upgradable Memory Management for In-production Cloud Environment

Traditional memory management suffers from metadata overhead, architectural complexity, and stability degradation, problems intensified in cloud environments. Existing software/hardware optimizations are insufficient for cloud computing's dual demands of flexibility and low overhead. This paper presents Vmem, a memory management architecture for in-production cloud environments that enables flexible, efficient cloud server memory utilization through lightweight reserved memory management. Vmem is the first such architecture to support online upgrades, meeting cloud requirements for high stability and rapid iterative evolution. Experiments show Vmem increases sellable memory rate by about 2%, delivers extreme elasticity and performance, achieves over 3x faster boot time for VFIO-based virtual machines (VMs), and improves network performance by about 10% for DPU-accelerated VMs. Vmem has been deployed at large scale for seven years, demonstrating efficiency and stability on over 300,000 cloud servers supporting hundreds of millions of VMs.

Operating_Systems advanced

By: Hao Zheng, Qiang Wang, Longxiang Wang +16 more

Source: arXiv Nov 12, 2025

0.0

5 min read

0

Quality

📝 blog

Haiku gets new guarded heap for the kernel

Another month, another Haiku activity report, and this time we’ve got a major change under the hood: a brand new guarded heap. The old guarded heap was suboptimal and had started to lag behind, so the new one attempts to rectify some of these shortcomings. So, to rectify these limitations, I rewrote the kernel guarded heap more or less from scratch, taking the old code into account where it made sense but otherwise creating entirely new bookkeeping structures, interacting directly with ...

Operating_Systems intermediate Operating SystemsKernel Development

By: Thom Holwerda

Source: OSnews Nov 14, 2025

0.0

1 min read

0

Quality

Sat, Nov 15

Guidelines for Building Indexes on Partially Cache-Coherent CXL Shared Memory

Preemption-Enhanced Benchmark Suite for FPGAs

Work-in-Progress: Function-as-Subtask API Replacing Publish/Subscribe for OS-Native DAG Scheduling

Taiji: A DPU Memory Elasticity Solution for In-production Cloud Environments

Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction

Vmem: A Lightweight Hot-Upgradable Memory Management for In-production Cloud Environment

Haiku gets new guarded heap for the kernel