Architecture
High-level architectural design of the OCUDU 5G gNB: layer responsibilities, inter-layer correlation, deployment topology, and the execution and async fabric that binds them.
Target audience: RAN engineers. Deep algorithmic details (scheduling policies, PHY math, coroutine internals) are deliberately out of scope.
1. What OCUDU is
OCUDU is a fully 3GPP/O-RAN-compliant 5G NR gNB implemented in C++17/C++20.
It terminates every standardized RAN interface: Uu toward the UE (via
PHY/OFH/RU), F1-C/F1-U between CU and DU, E1 between CU-CP and
CU-UP, N2/N3 toward the 5G Core (AMF/UPF), Xn between peer gNBs,
and E2 toward the near-RT RIC. The codebase is functionally
disaggregated so the same binaries can run co-located (gnb) or split
across machines (cu_cp + cu_up + du, with optional du_low for an
O-RAN Split-6 PHY).
Three architectural ideas run top-to-bottom through the code:
- Single logical entity → independent functional units. Every layer (CU-CP, CU-UP, DU-High, DU-Low, RU) is an owned object tree with a well-defined public interface and internal adapter notifiers. Layers never call each other directly; they call adapters, which the assembly code wires to concrete callees at construction time. This is why the same DU-High object can be wired to a local in-process CU-CP (via f1c_local_connector) or to a remote CU-CP (via SCTP) without recompilation.
- Async-procedure first. All multi-step control-plane flows (UE setup, handover, PDU session setup, E1/F1 bearer ops) are modeled as async_task<R> C++ coroutines composed with CORO_AWAIT_VALUE(...). The procedure classes live under routines/ and procedures/ subdirectories of every protocol layer.
- Per-entity executors. Concurrency is expressed as task dispatch to named task_executor instances: per-cell, per-UE-UL, per-UE-DL, per-crypto-worker, per-gateway-IO. Serialization is achieved either by a single-threaded worker or a strand over a shared pool, never by coarse locks.
2. High-level topology
```mermaid
flowchart LR
UE((UE))
AMF[/AMF/]
UPF[/UPF/]
RIC[/near-RT RIC/]
PEER[/Peer gNB/]
subgraph gNB
direction LR
subgraph CU[Centralized Unit]
CUCP[CU-CP<br/>RRC · NGAP · XnAP · NRPPa]
CUUP[CU-UP<br/>PDCP · SDAP · GTP-U]
end
subgraph DU[Distributed Unit]
DUH[DU-High<br/>F1AP · MAC · RLC · Scheduler]
DUL[DU-Low<br/>Upper PHY]
end
RU[Radio Unit<br/>Lower PHY / OFH-WG4]
end
UE <-- Uu --> RU
RU <-- OFH/SDR --> DUL
DUL <-- FAPI-like PDU API --> DUH
DUH <-- F1-C sig --> CUCP
DUH <-- F1-U data --> CUUP
CUCP <-- E1 --> CUUP
CUCP <-- N2 SCTP --> AMF
CUUP <-- N3 GTP-U --> UPF
CUCP <-- Xn --> PEER
CUCP <-- E2 --> RIC
DUH <-- E2 --> RIC
CUUP <-- E2 --> RIC
```

Each functional block is a standalone compilation unit under lib/, wrapped by an application unit under apps/units/ that adds YAML config, logging registration, metrics, and PCAP plumbing. The gnb binary composes all three app units in one process; the split binaries compose only their own unit and use SCTP/UDP gateways for cross-entity links.
3. CU-CP: Centralized Unit, Control Plane
Source: lib/cu_cp/, lib/ngap/, lib/f1ap/cu_cp/, lib/e1ap/cu_cp/,
lib/rrc/, lib/xnap/, lib/nrppa/.
CU-CP is the RRC/NGAP termination point and the orchestrator of every
per-UE control procedure. The owning class is cu_cp_impl
(lib/cu_cp/cu_cp_impl.h); it aggregates four repositories (one each
for DUs, CU-UPs, AMFs, and Xn peers) plus a ue_manager, a
mobility_manager, a cell_meas_manager, and an nrppa_entity.
```mermaid
graph TD
CUCP[cu_cp_impl]
UEM[ue_manager<br/>cu_cp_ue per ue_index]
DUR[du_processor_repository]
CUR[cu_up_processor_repository]
NGR[ngap_repository]
XNR[xnap_repository]
MOB[mobility_manager]
MEAS[cell_meas_manager]
CUCP --> UEM
CUCP --> DUR --> DUP[du_processor_impl<br/>owns F1AP + RRC-DU]
CUCP --> CUR --> CUUPP[cu_up_processor_impl<br/>owns E1AP]
CUCP --> NGR --> NGAP[ngap_impl<br/>per AMF]
CUCP --> XNR
CUCP --> MOB
CUCP --> MEAS
```

Interfaces terminated. NGAP (TS 38.413) on N2, F1AP (TS 38.473) on
F1-C, E1AP (TS 38.463) on E1, XnAP (TS 38.423 / 37.483) on Xn-C, plus
NRPPa for positioning and E2AP for near-RT RIC. Each protocol has its
own state machine in its lib/ directory and exposes an adapter
interface back into cu_cp_impl.
UE lifecycle. The canonical Initial UE Message flow runs like this:
- The DU sends an F1AP Initial UL RRC Message Transfer → du_processor_impl allocates a cu_cp_ue via ue_manager::add_ue() and binds F1AP/RRC adapters.
- NGAP forwards a NAS Initial UE Message to the AMF and establishes an NGAP UE context.
- The AMF responds with Initial Context Setup Request, which launches initial_context_setup_routine, a coroutine that sequentially awaits: Security Mode Command on RRC → F1AP UE Context Setup → UE Capability Transfer → nested pdu_session_resource_setup_routine (E1AP Bearer Context Setup → F1AP UE Context Modification → RRC Reconfiguration).
Every step is a CORO_AWAIT_VALUE on the next async sub-procedure, so
the routine reads like synchronous pseudocode but never blocks a thread.
Concurrency model. CU-CP runs on a single cu_cp_executor. A
CU-CP-wide FIFO (cu_cp_common_task_scheduler) orders global tasks;
each UE has its own FIFO (ue_task_scheduler_impl) so per-UE procedures
serialize without blocking unrelated UEs. AMF connections get their own
FIFO per NGAP instance. The result is fine-grained serialization without
a single global lock.
Mobility. mobility_manager inspects measurement reports from
cell_meas_manager and dispatches to one of three paths: intra-CU,
inter-CU via Xn, or inter-CU via NG (AMF-routed). Conditional Handover
has its own state machine in cu_cp_ue_cho_context. All three paths
share a common coroutine skeleton under lib/cu_cp/routines/.
4. CU-UP: Centralized Unit, User Plane
Source: lib/cu_up/, lib/pdcp/, lib/sdap/, lib/gtpu/,
lib/f1u/cu_up/, lib/e1ap/cu_up/.
CU-UP terminates N3 (GTP-U to UPF) and F1-U (NR-U to DU) and implements the PDCP/SDAP layers in between. The E1AP interface receives Bearer Context Setup/Modify/Release from CU-CP and materializes the per-UE object tree.
```mermaid
flowchart LR
subgraph UE_CTX[Per-UE context]
direction TB
PDUs[pdu_session] --> DRB[drb_context]
DRB --> QF[qos_flow_context]
end
subgraph N3[N3 UPF]
NG[gtpu_tunnel_ngu_rx/tx]
end
subgraph F1U[F1-U DU]
FB[f1u_bearer_impl<br/>NR-U DDDS]
end
NG -->|TEID demux| SDAPT[sdap_entity_tx<br/>QFI mark]
SDAPT --> PDCPT[pdcp_entity_tx<br/>cipher · integrity · SN]
PDCPT --> FB
FB --> PDCPR[pdcp_entity_rx<br/>decipher · reorder]
PDCPR --> SDAPR[sdap_entity_rx<br/>QFI strip]
SDAPR --> NG
```

Object hierarchy. A pdu_session owns its N3 GTP-U tunnel, an SDAP
entity, and a map of drb_context entries. Each DRB owns a PDCP entity,
an F1-U CU-UP bearer, and a map of qos_flow_context entries. TEIDs are
allocated from n3_teid_allocator and f1u_teid_allocator pools.
PDCP. TX maintains TX_NEXT, TX_TRANS_CRYPTO, TX_REORD_CRYPTO,
TX_TRANS, TX_NEXT_ACK (TS 38.323 §7.1). Ciphering and integrity run
on a parallel crypto_executor pool; custom state variables track
in-flight crypto operations so PDUs can be emitted in the correct order
even when parallel workers finish out-of-sequence. RX implements the
reordering window and the t-Reordering timer per TS 38.323 §5.2.2.2.
F1-U / NR-U. f1u_bearer_impl consumes NR-U data delivery status
messages from the DU and feeds handle_transmit_notification() /
handle_delivery_notification() into PDCP TX, which advances
TX_NEXT_ACK and releases discard-timer slots. The DU also reports
desired buffer size; PDCP TX uses it as a back-pressure signal for
early drop.
Concurrency. Every UE is assigned four executors by the
ue_executor_mapper: ctrl (E1AP), ul_pdu (F1-U RX), dl_pdu (N3
RX), crypto (pooled). GTP-U demux per-TEID dispatches PDUs onto the
owning session’s dl_pdu executor in batches, so a single UE’s data
path is serialized while different UEs run in parallel on different
workers.
5. DU-High: MAC, RLC, F1AP, DU manager
Source: lib/du/du_high/, lib/mac/, lib/rlc/, lib/f1ap/du/.
DU-High is orchestrated by du_manager_impl, which owns the cell and UE
context repositories and reacts to three event streams: F1AP procedures
from CU-CP (UE Context Setup/Modify/Release per TS 38.473), MAC
indications (UL-CCCH from Msg3, C-RNTI CE on handover access), and
operator reconfig from the app-level configurator.
```mermaid
flowchart TB
F1AP[f1ap_du_impl<br/>TS 38.473 procedures] --> DMGR[du_manager_impl<br/>UE lifecycle orchestration]
DMGR --> MACC[mac_controller<br/>UE ctx · RNTI]
DMGR --> RLCF[rlc factory]
MACC --> MDL[mac_dl_processor<br/>PDSCH / SIB / RAR / Paging assembler]
MACC --> MUL[mac_ul_processor<br/>PUSCH demux · BSR · PHR · CRC]
MDL <-->|slot_result| SCHED[MAC Scheduler]
MUL --> SCHED
MDL -->|pull_pdu LCID| RLCTX[rlc_tx]
MUL -->|handle_pdu LCID| RLCRX[rlc_rx]
```

MAC split. mac_dl_processor per cell runs on a high-priority
slot_ind_executor; on each slot it calls the scheduler’s
get_slot_result() and then pulls PDUs from RLC TX entities for each
granted logical channel. mac_ul_processor receives Rx_Data
indications, routes by C-RNTI via rnti_manager (lock-free atomic
allocator starting at MIN_CRNTI = 0x4601, TS 38.321 §7.1),
demultiplexes MAC subPDUs, feeds BSR/PHR to the scheduler, and hands
LCID payloads to RLC RX.
RLC. Three modes exist: TM (SRB0, passthrough), UM (DRBs, no ARQ),
AM (SRB1/SRB2 and DRBs requiring ARQ). AM TX tracks TX_NEXT_ACK,
TX_NEXT, POLL_SN plus byte/PDU poll counters; AM RX runs a reassembly
window keyed on RX_NEXT with a t-Reassembly timer that generates STATUS
PDUs on gap or timeout (TS 38.322 §5.2, §5.3.3). The SDU queue between
PDCP and RLC is a lock-free SPSC; this is why the slot-indication hot
path can pull PDUs without blocking on RLC state updates happening on
the UE executor.
F1AP-DU. f1ap_du_impl decodes F1AP ASN.1, dispatches to
per-procedure coroutines (F1 Setup, UE Context Setup/Modify/Release,
DL/UL RRC Message Transfer, Paging), and forwards RRC containers to the
right RLC SRB via adapters in
lib/du/du_high/du_manager/du_ue/du_ue_adapters.h.
Adapter pattern. DU-High never calls F1AP or PDCP directly; it holds
a set of small adapter classes (f1c_rx_sdu_rlc_adapter,
rlc_rx_rrc_sdu_adapter, mac_sdu_tx_builder, mac_sdu_rx_notifier)
whose targets are set at UE-creation time. This is what lets the same
DU-High binary work with an in-process F1-C connector or a remote SCTP
F1-C without code changes.
6. MAC scheduler
Source: lib/scheduler/.
The scheduler is the most intricate subsystem. scheduler_impl owns one
cell_scheduler per cell and one ue_scheduler per carrier-aggregation
cell group. This split is deliberate: cell-wide resources (SSB,
PRACH, SI, CSI-RS, PUCCH format resources) are per-cell state, while UE
data state must be shared across CA component carriers.
```mermaid
flowchart TD
SI[scheduler_impl::slot_indication] --> CS[cell_scheduler::run_slot]
CS --> RG[cell_resource_allocator<br/>ring buffer, ~16 slots]
CS --> SSB[ssb_sch]
CS --> CSIRS[csi_rs_sch]
CS --> SIS[si_sch<br/>SIB1 + SI msgs]
CS --> PR[prach_sch]
CS --> RA[ra_scheduler<br/>RAR · Msg3 · Msg4]
CS --> PG[paging_sch]
CS --> US[ue_scheduler::run_slot]
US --> EV[event_manager<br/>config · feedback]
US --> UCI[uci_scheduler<br/>SR · CSI PUCCH]
US --> SRS[srs_scheduler]
US --> FB[fallback_sched<br/>SRB0]
US --> INTER[inter_slice_scheduler]
INTER --> INTRA[intra_slice_scheduler]
INTRA --> POL[scheduler_policy<br/>time_rr · time_qos]
```

Resource grid. cell_resource_allocator is a circular buffer of
per-slot cell_slot_resource_allocator entries, sized for
SCHEDULER_MAX_K0 / K1 / K2 look-ahead. Each entry contains symbol
× CRB bitmaps for DL and UL and the accumulated sched_result handed
to MAC.
Per-UE state split. ue is cell-group-wide (logical channels, DRX,
timing advance). ue_cell is per-cell (active BWP, HARQ entities, MCS
calculator, power controllers, fallback flag). A UE in CA has one ue
and several ue_cell views indexed by serving cell index (PCell = 0).
Slicing. Two layers: inter_slice_scheduler ranks RAN slices by
SLA/min-PRB/max-PRB each slot and produces DL/UL candidates;
intra_slice_scheduler then applies a pluggable scheduler_policy
(time-domain Round-Robin or Proportional-Fair implemented in
lib/scheduler/policy/) to rank UEs within a slice and allocate
PDSCH/PUSCH. Fallback UEs on SRB0 use a dedicated
ue_fallback_scheduler instead.
HARQ. Each UE has 8 DL + 8 UL HARQ processes per serving cell
(cell_harq_manager), tracked by NDI toggling, a bounded
max_nof_harq_retxs, and a slot_timeout for missed ACKs.
Config safety. sched_config_manager converts add/update/remove UE
requests into ue_config_update_event objects applied at slot
boundaries by the event manager. Config never changes mid-slot.
Concurrency. One slot runs on one thread per cell; in CA,
cell_group_mutex is taken only when the cell group has more than one
cell, so single-cell deployments pay zero lock cost.
7. DU-Low: PHY, RU, Open Fronthaul
Source: lib/du/du_low/, lib/phy/, lib/ru/, lib/ofh/,
lib/radio/.
DU-Low implements only the Upper PHY; lib/du/du_low/README.md
states that DU-Low is O-RAN Split 7.2x aligned, with the Lower PHY pushed
into the RU. The radio_unit interface (include/ocudu/ru/ru.h) has
three concrete implementations: OFH-RU (O-RAN fronthaul,
production), SDR-RU (direct baseband via UHD/ZMQ, which pulls Lower
PHY back into the host), and Dummy-RU (loopback for testing).
```mermaid
flowchart LR
MAC[MAC / Scheduler]
subgraph DU_LOW[DU-Low · Upper PHY]
DLPOOL[downlink_processor_pool]
ULPOOL[uplink_processor_pool]
DLPOOL --> PDSCH[PDSCH proc<br/>LDPC · mod]
DLPOOL --> PDCCH[PDCCH proc<br/>polar]
DLPOOL --> SSB[SSB proc]
ULPOOL --> PUSCH[PUSCH proc]
ULPOOL --> PUCCH[PUCCH proc]
ULPOOL --> PRACH[PRACH det]
end
subgraph RU[Radio Unit]
LP[Lower PHY<br/>OFDM · FFT · CP]
OFH[OFH tx/rx<br/>eCPRI · BFP · WG4]
end
MAC -->|PDU API| DLPOOL
ULPOOL -->|UL ind| MAC
DLPOOL -->|DL grid| LP
LP -->|UL grid| ULPOOL
LP <--> OFH
OFH <--> WIRE((Fronthaul Ethernet))
```

Upper PHY. Drives LDPC (base-graph 1/2 per TS 38.212 §5.3.2, with
AVX2/AVX512/NEON kernels), polar coding for control, CRC (LUT or
CLMUL), scrambling (Gold sequence per TS 38.211 §5.2.1), modulation
mapping up to 256QAM. Channels are objects: pdsch_processor,
pdcch_processor, ssb_processor, csi_rs_generator,
pusch_processor, pucch_processor, prach_detector. For
hardware-offload of LDPC, see
Hardware Acceleration → Intel ACC100 (LDPC).
Lower PHY (when present via the SDR path). OFDM modulator/demodulator with pluggable DFT backends (FFTW, AMD FFTZ/AOCL, Arm Performance Libraries, generic Cooley-Tukey). CP length selection follows TS 38.211; phase compensation is precomputed via LUT.
Open Fronthaul (WG4 CUS). ofh_sector encapsulates one OFH logical
antenna array. The transmitter encodes C-plane section types (1 DL/UL
data, 3 PRACH), the U-plane packer compresses IQ via Block Floating
Point (O-RAN.WG4.CUS Annex A.1.2, with SIMD kernels), and eCPRI framing
produces Ethernet-ready packets. The receiver reverses this, with an
rx_window_checker that rejects packets outside the RX window and a
symbol reorderer that re-sequences out-of-order U-plane traffic. Timing
is driven by realtime_timing_worker against CLOCK_REALTIME
(PTP-disciplined in production); it emits OTA symbol boundaries to
which transmitter, receiver, and DU-Low subscribe.
DPDK integration. Ethernet TX/RX under lib/ofh/ethernet/dpdk/ uses
busy-polling on dedicated lcores, selected by CPU affinity in the
worker manager.
8. Cross-cutting infrastructure
Source: include/ocudu/support/, include/ocudu/adt/, lib/gateways/,
lib/ocudulog/.
Async. async_task<R> is a C++20 stackless coroutine;
async_procedure<R> is a non-coroutine fallback with the same awaitable
shape. event_signal and manual_event are the awaitable primitives
used to park a coroutine until PHY/peer response arrives.
protocol_transaction_manager wraps the transaction-ID + timeout
pattern every ASN.1 protocol needs.
Executors. The task_executor interface has a zoo of
implementations: inline_task_executor (test),
general_task_worker_executor (one thread, policy-driven queue),
priority_task_worker_executor (multi-priority), strand_executor
(serialize over a shared pool using atomic job count),
sync_task_executor (block until done). All tasks are
unique_function<void(), 64>, a 64-byte small-buffer-optimized
closure type that avoids heap allocation for typical lambdas.
Queues. Lock-free SPSC (rigtorp) and MPMC (rigtorp) underpin the data path; locking MPSC/MPMC variants exist for cold paths. The SPSC RLC SDU queue is the reason the slot-indication pull path is non-blocking.
byte_buffer. A segmented, reference-counted zero-copy buffer backed by a thread-local segment pool. Slicing produces views without copying; every data-path handoff moves buffers by reference-count bump.
Timers. timer_manager is a flat tick-driven timer service: tick()
is called once per ms, and expired callbacks are dispatched to
the per-timer executor. async_wait_for() wraps a timer as an
awaitable, which is how PDCP t-Reordering, RLC t-PollRetransmit, and RA
contention-resolution timers integrate with the coroutine model.
Gateways. sctp_network_server_impl / sctp_network_client_impl
for N2/F1-C/E1/Xn; udp_network_gateway_impl for N3/F1-U; io_broker
(epoll) manages socket FDs and dispatches events to executors. Every
gateway takes an executor reference so RX callbacks run where the
protocol layer expects them.
Logging & tracing. ocudulog is an async log framework with
per-channel levels, pluggable sinks (file/stdout/syslog/UDP), and
formatter classes. l1_dl_tracer, l1_ul_tracer, l2_tracer emit
compile-time-gated binary trace events for latency analysis; Tracy
integration is optional.
9. Deployment topologies and wiring
OCUDU compiles four binaries (gnb, cu_cp, cu_up, du) plus
du_low for Split-6. The same lib/ code powers all of them; the
difference is which app units the binary composes and which gateway
factories it picks.
```mermaid
flowchart LR
subgraph gnb[Co-located gnb binary]
CC1[CU-CP] -- local --- CU1[CU-UP]
CC1 -- local --- D1[DU]
CU1 -- local --- D1
end
subgraph split[Split CU/DU]
CC2[cu_cp binary] -- SCTP/F1-C --- D2[du binary]
CC2 -- SCTP/E1 --- CU2[cu_up binary]
CU2 -- UDP/F1-U --- D2
end
```

Selection rule. gnb.cpp instantiates f1c_local_connector,
e1_local_connector, f1u_local_connector (zero-copy in-process
queues). The split binaries instantiate SCTP servers/clients and a UDP
gateway instead. The application units and lib/ code are identical in
both paths; only the connector factory differs. Recent commits added
full SCTP socket-parameter plumbing (RTO, heartbeat, retransmission)
into F1 and E1 config so operators can tune transport per deployment.
App units. Under apps/units/, o_cu_cp, o_cu_up, and flexible_o_du
provide a uniform interface (application_unit) covering YAML schema
registration, logger setup, worker-manager contribution (CPU affinity,
NUMA, pool sizes), PCAP plumbing, and metrics aggregation. This is how
a single gnb binary cleanly composes three functional entities with
shared workers and a single buffer pool.
Worker manager and buffer pool. One worker_manager sizes and pins
every executor thread per YAML-declared affinities. One
buffer_pool_manager provides the byte_buffer segment pool that every
layer uses; no layer allocates its own heap in the data path.
Remote control and metrics. An optional uWebSockets-backed
remote_server exposes JSON commands (UE dump, cell start/stop,
metrics query). A central metrics_manager aggregates producers from
every layer and fans them out to configurable sinks (log, stdout, JSON,
file) on a periodic tick.
10. Layer correlation summary
| Plane | CU-CP | CU-UP | DU-High | DU-Low / RU |
|---|---|---|---|---|
| L3 / NAS | RRC, NGAP, XnAP, NRPPa | | | |
| L2 Ctrl | F1AP-CU, E1AP-CU | E1AP-CU-UP | F1AP-DU, MAC Ctrl | |
| L2 Data | | PDCP, SDAP, GTP-U, F1-U CU | RLC, MAC DL/UL, Sched | |
| L1 | | | FAPI-like PDU API | Upper PHY (DU-Low), Lower PHY + OFH (RU) |
| Transport | SCTP (N2/F1-C/E1/Xn) | UDP (N3/F1-U) | SCTP (F1-C), UDP (F1-U) | eCPRI/Ethernet (OFH) or UHD/ZMQ (SDR) |
The control plane forms a chain (CU-CP → DU-High via F1-C), with E1 as a side-link to CU-UP for bearer context. The user plane forms an independent pipe (UPF ↔ CU-UP ↔ DU-High ↔ RU) that stays entirely outside CU-CP's hot path. The coroutine-based procedure framework cuts across every control-plane layer uniformly, so a flow like PDU Session Setup reads as a single linear routine even though it straddles NGAP, E1AP, F1AP, and RRC.
Further reading
- Concrete operator guide that exercises Upper PHY end-to-end: Hardware Acceleration → Intel ACC100 (LDPC).
- Deployment blueprints (single edge site, multi-DU aggregation, regional GitOps): Deployment.
- How to propose an architectural change via the RFC process: Contributing.