Hardware-accelerated LDPC in OCUDU: ACC100 offload now available
Modern RAN stacks are expected to offload their heaviest DSP workloads, LDPC channel coding chief among them, to dedicated accelerators. Look-aside FEC cards, inline SoCs, and SmartNICs have been standard integration points in commercial vRAN for several years, and the open-source ecosystem has largely followed suit.
Until this release, OCUDU did not. The hardware-abstraction layer, the upper-PHY factory hooks, and the metric plumbing were present in the codebase, but the BBDEV backend required to dispatch LDPC operations to an Intel ACC100 was not implemented. Every deployment therefore executed LDPC encode and decode on the host CPU, consuming cycles that would otherwise be available to scheduling, MAC processing, or additional cells.
This release closes that gap.
What the release contains
A complete DPDK BBDEV backend for the Intel ACC100, integrated end-to-end with OCUDU’s upper-PHY:
- LDPC encode (PDSCH) and LDPC decode (PUSCH) dispatched to the accelerator when configured.
- On-chip HARQ with code-block-indexed offset addressing, including the `INTERNAL_HARQ_MEMORY_IN`/`OUT` capability flags required by the ACC100 PMD.
- Multi-VF scaling via round-robin dispatch across additional BBDEV accelerators, allowing the stack to saturate the device rather than serialise on a single queue.
- Batched enqueue and dequeue to amortise MMIO and descriptor-ring overhead.
- Runtime path selection. A single YAML field determines whether LDPC runs on the AVX-512 software path or on ACC100. The same binary supports both, so A/B comparison under identical conditions is straightforward.
- HWACC metric decorator. The hardware path emits the same `ldpc_enc_*` and `ldpc_dec_*` counters (latency, throughput, iteration count, CRC status) that the software path already populates, so operator dashboards and benchmarking remain directly comparable.
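As a sketch of what the runtime path selection described above could look like in practice, the fragment below shows a single backend switch in the cell configuration. The key names here are illustrative assumptions, not OCUDU's documented schema; consult the integration document for the actual field.

```yaml
# Hypothetical sketch: key names are assumptions, not OCUDU's actual schema.
upper_phy:
  ldpc_backend: acc100   # set to "avx512" to keep LDPC on the software path
```

Because both paths live in the same binary, flipping this one field is all that an A/B comparison requires.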
Measured impact
The changes were validated on a 100 MHz TDD cell with a live UE over a 7.2-split RU, with identical gNB configuration aside from the LDPC backend. Representative results:
- PDSCH processor throughput: ~57 % higher on offload.
- LDPC decode rate: ~2.9× higher on offload.
- Uplink upper-PHY CPU utilisation: ~38 % lower, equivalent to approximately one core freed under sustained traffic.
- Rate-match and rate-dematch CPU time: reduced to zero (both stages execute on the accelerator).
Full A/B tables, the test environment, and the measurement methodology are documented at docs.ocuduindia.org → Intel ACC100 (LDPC).
Why this matters for the project
LDPC offload is a prerequisite for running OCUDU at realistic cell densities on production-grade hardware. Without it, the stack is constrained either to over-provisioned CPU budgets or to a small number of cells per host, a limitation that has been one of the clearest differentiators between OCUDU and mature commercial RAN stacks.
Equally important, the work exercises the existing hardware-abstraction layer end-to-end for the first time. The same interfaces are now load-bearing for any future accelerator backend, whether that is Intel ACC200, Nvidia Aerial, or a custom SoC, and the integration patterns established here (queue reservation, HARQ context management, metric decoration, factory-level pool sharing) carry over directly.
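The multi-VF round-robin dispatch mentioned above reduces to a small queue-selection pattern. The sketch below is hypothetical: the struct and function names are invented for illustration, and a real backend would hand back BBDEV queue handles rather than bare device ids.

```c
#include <stddef.h>

/* Hypothetical sketch of round-robin dispatch across BBDEV virtual
 * functions. Names are illustrative, not OCUDU's actual interfaces. */
struct vf_pool {
    const int *dev_ids;  /* BBDEV device ids, one per virtual function */
    size_t     n_devs;   /* number of VFs in the pool */
    size_t     next;     /* cursor for round-robin selection */
};

/* Pick the next VF so consecutive batches land on different queues
 * instead of serialising on a single one. */
static int vf_pool_next(struct vf_pool *p)
{
    int dev = p->dev_ids[p->next];
    p->next = (p->next + 1) % p->n_devs;
    return dev;
}
```

Each enqueue batch would call `vf_pool_next()` once, spreading descriptor-ring pressure evenly across the configured VFs.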
Availability and next steps
The implementation is available on the OCUDU repository. Configuration details, prerequisites (DPDK, pf_bb_config, VFIO binding, hugepages), and a step-by-step build guide are included in the integration document referenced above.
Feedback, benchmark results from other hardware configurations, and contributions toward additional accelerator backends are welcome via the project’s issue tracker.
OCUDU India