eBPF - Understanding How It Works

Abstract

How are eBPF programs written?

In a lot of scenarios, eBPF is not used directly but indirectly via projects like Cilium, bcc, or bpftrace, which provide an abstraction on top of eBPF. These projects do not require writing programs directly; instead, they let users specify intent-based definitions which are then implemented with eBPF.


If no higher-level abstraction exists, programs need to be written directly. The Linux kernel expects eBPF programs to be loaded in the form of bytecode. While it is of course possible to write bytecode directly, the more common development practice is to leverage a compiler suite like LLVM to compile pseudo-C code into eBPF bytecode.
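To make "bytecode" concrete: every eBPF instruction is a fixed 8-byte word holding an 8-bit opcode, two 4-bit register fields (destination and source), a 16-bit signed offset and a 32-bit signed immediate, mirroring the kernel's struct bpf_insn. The Python sketch below hand-assembles the smallest useful program, "r0 = 0; exit". It is an illustration only, not part of any eBPF toolchain. encode_insn is a hypothetical helper, and the layout assumes a little-endian host, where the destination register sits in the low nibble of the second byte.

```python
import struct

# Opcode bytes for the two instructions used below (eBPF instruction set):
# BPF_ALU64 | BPF_MOV | BPF_K -> 0x07 | 0xb0 | 0x00 = 0xb7  ("dst = imm")
# BPF_JMP   | BPF_EXIT        -> 0x05 | 0x90        = 0x95  ("return r0")
BPF_MOV64_IMM = 0xB7
BPF_EXIT = 0x95


def encode_insn(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 8-byte eBPF instruction (little-endian layout of struct bpf_insn)."""
    if not (0 <= dst < 16 and 0 <= src < 16):
        raise ValueError("register numbers are 4-bit fields")
    regs = (src << 4) | dst  # dst_reg in the low nibble, src_reg in the high nibble
    # B: opcode, B: registers, h: signed 16-bit offset, i: signed 32-bit immediate
    return struct.pack("<BBhi", opcode, regs, off, imm)


# "r0 = 0; exit": a program that simply returns 0
program = encode_insn(BPF_MOV64_IMM, dst=0, imm=0) + encode_insn(BPF_EXIT)
print(program.hex())  # b7000000000000009500000000000000
```

An LLVM-compiled program is a sequence of these same 8-byte instructions, stored in an ELF object file.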

Mechanism of eBPF

The eBPF infrastructure was described earlier. These capabilities are implemented collaboratively by multiple components, each with its own complexity.

Anatomy of an eBPF Program

Events and Hooks


eBPF programs are event-driven: they run when the kernel or an application reaches a particular hook point. When execution reaches the hook, the attached eBPF program runs and can capture or manipulate the data available there. The wide range of available hook points is one of eBPF's strengths. Examples include:

  • System calls: when a user-space program requests a kernel service through a system call.
  • Function entry and exit: intercept a function when it is entered and when it returns.
  • Network events: for example, when a packet is received.
  • kprobes and uprobes: hook into (almost) any kernel or user-space function.
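The relationship between hooks and programs can be modelled as a table from hook points to attached programs: nothing runs until execution reaches a hook, and then every attached program runs with that event's context. The Python sketch below is a toy model under that assumption, not how the kernel implements attachment. HookTable and its methods are hypothetical, the PID is made up, and the hook names are only labels borrowed from bpftrace's probe syntax.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# In this toy, a "program" is any callable that receives the event context.
Program = Callable[[dict], None]


class HookTable:
    """Hypothetical registry mapping hook names to the programs attached to them."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Program]] = defaultdict(list)

    def attach(self, hook: str, prog: Program) -> None:
        self._hooks[hook].append(prog)

    def fire(self, hook: str, ctx: dict) -> int:
        """Called when execution reaches the hook; runs every attached program."""
        progs = self._hooks.get(hook, [])
        for prog in progs:
            prog(ctx)
        return len(progs)  # number of programs that ran


hooks = HookTable()
seen = []
hooks.attach("kprobe:tcp_v4_connect", lambda ctx: seen.append(ctx["pid"]))
hooks.fire("kprobe:tcp_v4_connect", {"pid": 1234})                # runs the program
hooks.fire("tracepoint:syscalls:sys_enter_openat", {"pid": 99})   # nothing attached
print(seen)  # [1234]
```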


Helper

Rather than calling arbitrary kernel functions, a running eBPF program calls helper functions whenever it needs to interact with the system. These kernel-provided functions give eBPF programs a rich but controlled way to access memory and kernel facilities. For example, helpers can:

  • Search, update, and delete key-value pairs in maps.

  • Generate pseudo-random numbers.

  • Read and set tunnel metadata.

  • Chain eBPF programs together; this is known as a tail call.

  • Perform socket-related tasks, such as binding, obtaining cookies, and redirecting packets.

These helper functions are defined by the kernel; in other words, eBPF programs can only call functions from a kernel-defined allowlist. The list is long and still growing.
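The allowlist idea can be sketched as a load-time check: before a program is accepted, every helper it calls is compared against the set the kernel permits, and a single unknown call rejects the whole program. In the toy below the helper names are real kernel helpers matching the tasks listed above, but the instruction format, check_helper_calls and the fixed allowlist are hypothetical simplifications. The real set depends on the kernel version and the program type.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Illustrative allowlist: real helper names, but a real kernel decides the
# permitted set per program type and kernel version.
HELPER_ALLOWLIST: FrozenSet[str] = frozenset({
    "bpf_map_lookup_elem",    # search a map
    "bpf_map_update_elem",    # insert or update a map entry
    "bpf_map_delete_elem",    # delete a map entry
    "bpf_get_prandom_u32",    # pseudo-random number
    "bpf_skb_get_tunnel_key", # read tunnel metadata
    "bpf_tail_call",          # jump into another eBPF program
    "bpf_get_socket_cookie",  # socket cookie
})

Insn = Tuple[str, str]  # hypothetical toy instruction: (opcode, operand)


def check_helper_calls(program: Iterable[Insn],
                       allowlist: FrozenSet[str] = HELPER_ALLOWLIST) -> List[str]:
    """Return the helpers a toy program calls; reject it if any is not allowed.

    Like the kernel verifier, this runs once at load time, not on every event.
    """
    called = []
    for op, arg in program:
        if op == "call":
            if arg not in allowlist:
                raise PermissionError(f"helper not allowed: {arg}")
            called.append(arg)
    return called


prog = [("call", "bpf_get_prandom_u32"), ("call", "bpf_map_update_elem"), ("exit", "")]
print(check_helper_calls(prog))  # ['bpf_get_prandom_u32', 'bpf_map_update_elem']
```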

Maps

eBPF programs use maps to store state and to share data with other eBPF programs, the kernel, and user space. As the name suggests, a map is a key-value store. Maps support a variety of data structures; eBPF programs read and write map entries through helper functions, while user-space applications access them through the bpf() system call.
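A minimal sketch of the usual pattern, assuming a hash map keyed by process ID: the "kernel side" looks a key up and then inserts or increments it (the lookup and update helpers), while "user space" later reads all entries. ToyHashMap and on_event are hypothetical Python stand-ins, and the PIDs are made up. A real map lives in kernel memory, is declared with a fixed max_entries, and refuses new keys once full, which the toy imitates by raising an exception.

```python
from typing import Dict, Hashable, List, Optional, Tuple


class ToyHashMap:
    """Hypothetical bounded key-value store standing in for an eBPF hash map."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._data: Dict[Hashable, int] = {}

    def lookup(self, key: Hashable) -> Optional[int]:
        return self._data.get(key)

    def update(self, key: Hashable, value: int) -> None:
        if key not in self._data and len(self._data) >= self.max_entries:
            raise MemoryError("map is full")  # toy stand-in for the kernel's error code
        self._data[key] = value

    def delete(self, key: Hashable) -> bool:
        if key in self._data:
            del self._data[key]
            return True
        return False

    def items(self) -> List[Tuple[Hashable, int]]:
        """What a user-space reader would iterate over."""
        return sorted(self._data.items())


def on_event(counts: ToyHashMap, pid: int) -> None:
    """'Kernel side': count one event for this PID (lookup, then update)."""
    current = counts.lookup(pid)
    counts.update(pid, 1 if current is None else current + 1)


counts = ToyHashMap(max_entries=2)
for pid in (10, 20, 10):
    on_event(counts, pid)
print(counts.items())  # [(10, 2), (20, 1)]
```

A real program would also have to cope with concurrent updates from several CPUs, for example with atomic adds or a per-CPU map.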

The following is an incomplete list of supported map types to give an understanding of the diversity in data structures. For various map types, both a shared and a per-CPU variation are available.

  • Hash tables, Arrays
  • LRU (Least Recently Used)
  • Ring Buffer
  • Stack Trace
  • LPM (Longest Prefix Match)
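Of these, LPM is the least self-explanatory: a lookup returns the value of the most specific stored prefix that contains the key, which is how routing tables and CIDR-based policies are matched. The sketch below shows those lookup semantics for IPv4 using a linear scan over Python's ipaddress networks. The kernel's LPM map is implemented as a trie instead, ToyLpmMap is a hypothetical name, and the prefixes and labels are made up.

```python
import ipaddress
from typing import Dict, Optional


class ToyLpmMap:
    """Hypothetical longest-prefix-match map (linear scan for clarity, not a trie)."""

    def __init__(self) -> None:
        self._entries: Dict[ipaddress.IPv4Network, str] = {}

    def update(self, prefix: str, value: str) -> None:
        self._entries[ipaddress.IPv4Network(prefix)] = value

    def lookup(self, addr: str) -> Optional[str]:
        ip = ipaddress.IPv4Address(addr)
        best_len, best_value = -1, None
        for net, value in self._entries.items():
            # Keep the matching prefix with the most fixed bits.
            if ip in net and net.prefixlen > best_len:
                best_len, best_value = net.prefixlen, value
        return best_value


routes = ToyLpmMap()
routes.update("0.0.0.0/0", "default")
routes.update("10.0.0.0/8", "internal")
routes.update("10.1.0.0/16", "lab")
print(routes.lookup("10.1.2.3"))  # lab: /16 beats /8 and /0
print(routes.lookup("10.9.9.9"))  # internal
```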

Execute the eBPF program

Load and check


eBPF programs are loaded into the kernel as bytecode, so there needs to be a way to compile high-level languages into this bytecode. eBPF uses LLVM as the backend, so in principle any language with an LLVM frontend can target it; because eBPF programs are most commonly written in C, the usual frontend is Clang. The bytecode is handed to the kernel with the bpf() system call, but before it can be attached to a hook it must pass a series of checks: the kernel verifier rejects programs that could loop forever, that lack the required permissions, or that could crash, so that only safe code runs in the kernel's virtual-machine-like environment. If the program passes all checks, it can then be attached to its hook.
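For the compile step, a typical Clang invocation for a C source file is shown below, wrapped in Python. It is a sketch under common assumptions: the flags are widely used Clang options, the file names are hypothetical, and the resulting object file still has to be loaded (and therefore verified) through bpf(), usually by a loader library such as libbpf or by BCC.

```python
import shlex
import subprocess
from typing import List


def clang_bpf_cmd(src: str, out: str) -> List[str]:
    """Build a common Clang command line that emits an eBPF object file."""
    return [
        "clang",
        "-O2",             # eBPF programs are normally built with optimisation
        "-g",              # debug info, from which BTF type information is emitted
        "-target", "bpf",  # use LLVM's eBPF backend instead of the host CPU's
        "-c", src,
        "-o", out,
    ]


def compile_bpf(src: str, out: str) -> None:
    """Run Clang; loading and verification happen later, inside bpf()."""
    subprocess.run(clang_bpf_cmd(src, out), check=True)


print(shlex.join(clang_bpf_cmd("hello.bpf.c", "hello.bpf.o")))
# clang -O2 -g -target bpf -c hello.bpf.c -o hello.bpf.o
```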

Verification

The verification step ensures that the eBPF program is safe to run. It validates that the program meets several conditions, for example:

  • The process loading the eBPF program holds the required capabilities (privileges). Unless unprivileged eBPF is enabled, only privileged processes can load eBPF programs.
  • The program does not crash or otherwise harm the system.
  • The program always runs to completion (i.e. the program does not sit in a loop forever, holding up further processing).

JIT Compilation

The Just-in-Time (JIT) compilation step translates the generic bytecode of the program into the machine specific instruction set to optimize execution speed of the program. This makes eBPF programs run as efficiently as natively compiled kernel code or as code loaded as a kernel module.
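The difference between interpreting and JIT-compiling bytecode can be shown with a toy instruction set. The interpreter decodes every instruction on every run; the "JIT" translates the program once into host code (here, Python source compiled with Python's built-in compile()) and afterwards just calls the result. This is an analogy only: the kernel JIT emits native machine code, the three-opcode instruction format is hypothetical, and 64-bit wraparound is ignored.

```python
from typing import Callable, List, Tuple

Insn = Tuple  # hypothetical: ("mov", reg, imm) | ("add", reg, imm) | ("exit",)
NUM_REGS = 11  # eBPF has eleven registers, r0 to r10


def interpret(prog: List[Insn]) -> int:
    """Decode and dispatch every instruction each time the program runs."""
    regs = [0] * NUM_REGS
    for insn in prog:
        op = insn[0]
        if op == "mov":
            regs[insn[1]] = insn[2]
        elif op == "add":
            regs[insn[1]] += insn[2]
        elif op == "exit":
            return regs[0]  # r0 holds the return value
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("program ended without exit")


def jit(prog: List[Insn]) -> Callable[[], int]:
    """Translate the program once into host code, then return a callable."""
    lines = ["def compiled():", f"    regs = [0] * {NUM_REGS}"]
    for insn in prog:
        op = insn[0]
        if op == "mov":
            lines.append(f"    regs[{int(insn[1])}] = {int(insn[2])}")
        elif op == "add":
            lines.append(f"    regs[{int(insn[1])}] += {int(insn[2])}")
        elif op == "exit":
            lines.append("    return regs[0]")
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    else:
        raise ValueError("program ended without exit")
    namespace: dict = {}
    exec(compile("\n".join(lines), "<jit>", "exec"), namespace)
    return namespace["compiled"]


prog = [("mov", 0, 40), ("add", 0, 2), ("exit",)]
compiled = jit(prog)                 # translation cost is paid once...
print(interpret(prog), compiled())   # ...and both return 42
```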

Required Privileges

Unless unprivileged eBPF is enabled, all processes that intend to load eBPF programs into the Linux kernel must run with root privileges or hold the CAP_BPF capability. This means that untrusted programs cannot load eBPF programs.

If unprivileged eBPF is enabled, unprivileged processes can load certain eBPF programs subject to a reduced functionality set and with limited access to the kernel.
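On Linux, both conditions can be inspected from user space. The sketch below decodes the effective capability mask (the CapEff line of /proc/self/status) and interprets the kernel.unprivileged_bpf_disabled sysctl, whose documented values are 0 (unprivileged eBPF enabled), 1 (disabled, and it cannot be re-enabled until reboot) and 2 (disabled, but an administrator may re-enable it). It is a simplification: CAP_BPF exists since Linux 5.8 (older kernels rely on CAP_SYS_ADMIN), some program types additionally need CAP_PERFMON or CAP_NET_ADMIN, and the function names are hypothetical.

```python
from typing import Tuple

CAP_SYS_ADMIN = 21
CAP_BPF = 39  # introduced in Linux 5.8


def has_capability(capeff_hex: str, cap: int) -> bool:
    """Test one bit of a CapEff hex mask as printed in /proc/<pid>/status."""
    return bool((int(capeff_hex, 16) >> cap) & 1)


def can_load_privileged_bpf(capeff_hex: str) -> bool:
    # Simplified rule: CAP_BPF, or CAP_SYS_ADMIN on kernels that predate it.
    return has_capability(capeff_hex, CAP_BPF) or has_capability(capeff_hex, CAP_SYS_ADMIN)


def describe_unprivileged_setting(value: int) -> str:
    """Interpret /proc/sys/kernel/unprivileged_bpf_disabled."""
    return {
        0: "unprivileged eBPF enabled",
        1: "unprivileged eBPF disabled (cannot be re-enabled until reboot)",
        2: "unprivileged eBPF disabled (an administrator may re-enable it)",
    }.get(value, "unknown value")


def inspect_current_process() -> Tuple[bool, str]:
    """Read both settings for the calling process (Linux only)."""
    with open("/proc/self/status") as f:
        capeff = next(line.split()[1] for line in f if line.startswith("CapEff:"))
    with open("/proc/sys/kernel/unprivileged_bpf_disabled") as f:
        setting = int(f.read().strip())
    return can_load_privileged_bpf(capeff), describe_unprivileged_setting(setting)
```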

Verifier

If a process is allowed to load an eBPF program, the program still passes through the eBPF verifier, which ensures the safety of the program itself. This means, for example:

  • Programs are validated to ensure they always run to completion, e.g. an eBPF program may never block or sit in a loop forever. eBPF programs may contain so-called bounded loops, but the program is only accepted if the verifier can ensure that the loop contains an exit condition which is guaranteed to become true.
  • Programs may not use any uninitialized variables or access memory out of bounds.
  • Programs must fit within the size requirements of the system. It is not possible to load arbitrarily large eBPF programs.
  • Programs must have finite complexity. The verifier evaluates all possible execution paths and must be able to complete the analysis within the configured complexity limit.
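Several of these rules can be demonstrated with a toy verifier over a made-up four-opcode instruction set. It checks a size limit, rejects jumps or fall-through outside the program (so every path must end in exit), rejects any back-edge in the control-flow graph, tracks which registers have been written so that uninitialized reads are refused, and walks every path under a budget of processed instructions. This is a hypothetical sketch, not the kernel's algorithm: rejecting all back-edges corresponds to the stricter rule of kernels without bounded-loop support, the limits are arbitrary, and the real verifier prunes equivalent states instead of enumerating every path. Jump offsets are relative to the next instruction, as in eBPF.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical toy instructions:
# ("mov", reg, imm) | ("ja", off) | ("jeq", reg, imm, off) | ("exit",)
Insn = Tuple
OPCODES = {"mov", "ja", "jeq", "exit"}


class VerifierError(Exception):
    """Raised when the toy verifier rejects a program."""


def successors(prog: List[Insn], pc: int) -> List[int]:
    """Possible next instructions; jump targets are relative to pc + 1."""
    insn = prog[pc]
    if insn[0] == "exit":
        return []
    if insn[0] == "ja":
        return [pc + 1 + insn[1]]
    if insn[0] == "jeq":
        return [pc + 1, pc + 1 + insn[3]]  # fall through, or take the branch
    return [pc + 1]


def verify(prog: List[Insn], max_insns: int = 4096, max_processed: int = 100_000) -> int:
    """Return the number of instructions processed, or raise VerifierError."""
    # 1. Size and bounds: every jump and fall-through must stay inside the program.
    if not prog or len(prog) > max_insns:
        raise VerifierError("program is empty or too large")
    for pc, insn in enumerate(prog):
        if insn[0] not in OPCODES:
            raise VerifierError(f"insn {pc}: unknown opcode {insn[0]!r}")
        for nxt in successors(prog, pc):
            if not 0 <= nxt < len(prog):
                raise VerifierError(f"insn {pc}: jump or fall-through out of bounds")

    # 2. No loops: iterative depth-first search that rejects any back-edge.
    WHITE, GREY, BLACK = 0, 1, 2
    color = [WHITE] * len(prog)
    color[0] = GREY
    stack = [(0, iter(successors(prog, 0)))]
    while stack:
        pc, pending = stack[-1]
        nxt = next(pending, None)
        if nxt is None:
            color[pc] = BLACK
            stack.pop()
        elif color[nxt] == GREY:
            raise VerifierError(f"back-edge from insn {pc} to insn {nxt}")
        elif color[nxt] == WHITE:
            color[nxt] = GREY
            stack.append((nxt, iter(successors(prog, nxt))))

    # 3. Walk every path, tracking written registers, within a complexity budget.
    processed = 0
    # At entry r1 holds the context pointer and r10 the frame pointer.
    paths: List[Tuple[int, FrozenSet[int]]] = [(0, frozenset({1, 10}))]
    while paths:
        pc, written = paths.pop()
        processed += 1
        if processed > max_processed:
            raise VerifierError("program too complex")
        insn = prog[pc]
        if insn[0] == "mov":
            written = written | {insn[1]}
        elif insn[0] == "jeq" and insn[1] not in written:
            raise VerifierError(f"insn {pc}: r{insn[1]} read before it was written")
        elif insn[0] == "exit" and 0 not in written:
            raise VerifierError(f"insn {pc}: r0 (return value) not set before exit")
        paths.extend((nxt, written) for nxt in successors(prog, pc))
    return processed


ok = [("mov", 0, 0), ("jeq", 1, 0, 1), ("mov", 0, 1), ("exit",)]
print(verify(ok))  # 5: both paths through the branch are checked
```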

Summary

Putting the above concepts together: after passing the verifier's checks, an eBPF program is attached to a hook. When an event triggers the hook, the program executes and uses helper functions and maps to store and manipulate data. Next time we’ll look at how they work together.

Simple Usage Example

In this example, we will use a simple, ready-made eBPF tool that ships with the BPF Compiler Collection (BCC).

As mentioned above, eBPF is used indirectly in most scenarios, via projects such as BCC.

Here we use tcpconnect.py, which traces the kernel functions that perform active TCP connections (tcp_v4_connect() and tcp_v6_connect()).
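tcpconnect.py combines several of the concepts above: it attaches a kprobe to the entry of the connect function and a kretprobe to its return. At entry the socket is available but the outcome is not; at return it is the other way round, so the tool stores the socket in a hash map keyed by thread ID and retrieves it on return, reporting only connections whose setup succeeded. The Python model below mimics that pattern in user space only. The class and method names are hypothetical, and the addresses are documentation-range examples, not real output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sock:
    """Hypothetical stand-in for the socket fields the tool reads."""
    saddr: str
    daddr: str
    dport: int


class TcpConnectModel:
    """User-space model of the entry/return probe pattern; no kernel involved."""

    def __init__(self) -> None:
        self.currsock: Dict[int, Sock] = {}  # like a hash map keyed by thread ID
        self.events: List[Tuple[int, str, str, str, int]] = []  # like the buffer user space reads

    def on_entry(self, tid: int, sk: Sock) -> None:
        # Entry probe: the socket argument is visible, the result is not yet.
        self.currsock[tid] = sk

    def on_return(self, tid: int, pid: int, comm: str, ret: int) -> None:
        # Return probe: the result is visible, so recover the socket from the map.
        sk = self.currsock.pop(tid, None)  # lookup, then delete the entry
        if sk is None:
            return  # entry was missed; nothing to report
        if ret != 0:
            return  # connect failed before a SYN was sent; skip it
        self.events.append((pid, comm, sk.saddr, sk.daddr, sk.dport))


tracer = TcpConnectModel()
tracer.on_entry(4242, Sock("192.0.2.10", "198.51.100.7", 443))
tracer.on_return(4242, 4242, "curl", 0)
print(tracer.events)  # [(4242, 'curl', '192.0.2.10', '198.51.100.7', 443)]
```

To run the real tool, BCC typically installs it as /usr/share/bcc/tools/tcpconnect (some distributions, such as Ubuntu, ship it as tcpconnect-bpfcc), and it needs the privileges discussed above.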

© 2022 - Sofiane Hamlaooui - Making the world a better place 🌎