BIND 9 Security

(Part 3 - eBPF - extended Berkeley Packet Filter)

Carsten Strotmann and the ISC Team

Created: 2021-11-16 Tue 18:02

Welcome

Welcome to part three of our BIND 9 security webinar series

In this Webinar

  • The Berkeley Packet Filter
  • eBPF Architecture
  • Instrumenting the Linux Network Stack
  • Instrumenting BIND 9
  • Packet Filtering with eBPF
  • Hands-On lab

The Berkeley Packet Filter

What is BPF/eBPF?

The eBPF idea

  • eBPF allows the administrator to execute sandboxed programs inside the operating system kernel
    • eBPF is used to extend the capabilities of the kernel safely, securely and efficiently without modifying the kernel source code or loading kernel modules
    • eBPF can monitor and manipulate network packets as well as other data inside the Linux kernel
    • eBPF programs are not kernel modules; you don't need to be a kernel developer to work with eBPF
      • but some C programming knowledge is helpful

eBPF

ebpf.png

eBPF use cases

  • Use cases for eBPF
    • Network security (advanced firewall functions)
    • Host security
    • Forensics
    • Fault diagnosis
    • Performance measurements
  • eBPF is available on modern Linux systems (kernel 3.18+) and is currently being ported to Windows by Microsoft

Origins of BPF

  • The original BSD Packet Filter (BPF) was designed by Steven McCanne and Van Jacobson at Lawrence Berkeley Laboratory (https://www.tcpdump.org/papers/bpf-usenix93.pdf)
    • BPF has been ported to almost all Unix/Linux and some non-Unix operating systems
    • BPF is the base technology for some well known network sniffing tools such as tcpdump and Wireshark

BPF operation using tcpdump as an example

  • When using a BPF-enabled tool, the filter code is compiled into bytecode for the BPF in-kernel VM and loaded into the kernel
    • The operating system kernel will execute the filter program for every network packet that traverses the network stack
    • Only packets that match the filter expression will be forwarded to the userspace tool, tcpdump in this example
    • BPF helps limit the amount of data that needs to be copied from kernel to user space

BPF operation using tcpdump as an example

With the -d option, tcpdump prints the compiled BPF program for a filter expression in human-readable form instead of capturing packets

# tcpdump -d port 53 and host 1.1.1.1
Warning: assuming Ethernet
(000) ldh      [12]
(001) jeq      #0x86dd          jt 19   jf 2
(002) jeq      #0x800           jt 3    jf 19
(003) ldb      [23]
(004) jeq      #0x84            jt 7    jf 5
(005) jeq      #0x6             jt 7    jf 6
(006) jeq      #0x11            jt 7    jf 19
(007) ldh      [20]
(008) jset     #0x1fff          jt 19   jf 9
(009) ldxb     4*([14]&0xf)
(010) ldh      [x + 14]
(011) jeq      #0x35            jt 14   jf 12
(012) ldh      [x + 16]
(013) jeq      #0x35            jt 14   jf 19
(014) ld       [26]
(015) jeq      #0x1010101       jt 18   jf 16
(016) ld       [30]
(017) jeq      #0x1010101       jt 18   jf 19
(018) ret      #262144
(019) ret      #0
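To make the listing above easier to follow, here is a minimal Python sketch that transcribes the same 20 instructions and interprets them against a hand-built Ethernet/IPv4/UDP frame. It is a userspace model for illustration only: the tuple encoding and the names run_filter and udp_packet are assumptions of this sketch, not part of tcpdump or the kernel, and only the opcodes that appear in the listing are implemented.

```python
import struct

# (opcode, operand k, jump-if-true, jump-if-false), one tuple per listing line.
PROGRAM = [
    ("ldh", 12, None, None),        # 000 load EtherType
    ("jeq", 0x86DD, 19, 2),         # 001 IPv6 -> reject
    ("jeq", 0x0800, 3, 19),         # 002 IPv4, otherwise reject
    ("ldb", 23, None, None),        # 003 load IP protocol number
    ("jeq", 0x84, 7, 5),            # 004 SCTP
    ("jeq", 0x06, 7, 6),            # 005 TCP
    ("jeq", 0x11, 7, 19),           # 006 UDP, otherwise reject
    ("ldh", 20, None, None),        # 007 load flags + fragment offset
    ("jset", 0x1FFF, 19, 9),        # 008 later fragment -> reject
    ("ldxb_msh", 14, None, None),   # 009 X = 4*([14]&0xf), the IP header length
    ("ldh_x", 14, None, None),      # 010 load source port
    ("jeq", 0x35, 14, 12),          # 011 source port 53?
    ("ldh_x", 16, None, None),      # 012 load destination port
    ("jeq", 0x35, 14, 19),          # 013 destination port 53?
    ("ld", 26, None, None),         # 014 load source address
    ("jeq", 0x01010101, 18, 16),    # 015 source is 1.1.1.1?
    ("ld", 30, None, None),         # 016 load destination address
    ("jeq", 0x01010101, 18, 19),    # 017 destination is 1.1.1.1?
    ("ret", 262144, None, None),    # 018 accept (snapshot length)
    ("ret", 0, None, None),         # 019 reject
]

def run_filter(program, packet):
    """Interpret the program; returns the number of bytes to capture (0 = drop)."""
    a = x = pc = 0
    try:
        while True:
            op, k, jt, jf = program[pc]
            pc += 1
            if op == "ldb":
                a = packet[k]
            elif op == "ldh":
                a = struct.unpack_from("!H", packet, k)[0]
            elif op == "ld":
                a = struct.unpack_from("!I", packet, k)[0]
            elif op == "ldh_x":
                a = struct.unpack_from("!H", packet, x + k)[0]
            elif op == "ldxb_msh":
                x = 4 * (packet[k] & 0x0F)
            elif op == "jeq":
                pc = jt if a == k else jf
            elif op == "jset":
                pc = jt if a & k else jf
            elif op == "ret":
                return k
    except (struct.error, IndexError):
        return 0  # a load beyond the end of the packet rejects the packet

def udp_packet(src_ip, dst_ip, sport, dport):
    """Build a minimal Ethernet + IPv4 + UDP frame (checksums left at zero)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                     bytes(src_ip), bytes(dst_ip))
    udp = struct.pack("!HHHH", sport, dport, 8, 0)
    return eth + ip + udp

print(run_filter(PROGRAM, udp_packet((192, 0, 2, 1), (1, 1, 1, 1), 40000, 53)))   # 262144
print(run_filter(PROGRAM, udp_packet((192, 0, 2, 1), (8, 8, 8, 8), 40000, 53)))   # 0
```

The jump targets are absolute instruction numbers, exactly as tcpdump prints them, which is why the interpreter assigns them to the program counter directly.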

eBPF vs. BPF

  • While BPF (now called cBPF, classic BPF) only filters network packets inside the operating system kernel, eBPF programs can also attach to
    • Kernel system calls
    • Kernel tracepoints
    • Kernel functions
    • Userspace tracepoints
    • Userspace functions

eBPF and the Linux kernel

  • The basic eBPF was introduced into the Linux kernel in version 3.18

The eBPF Architecture

The eBPF VM

  • eBPF programs are compiled for a virtual CPU
  • The code is loaded and verified in the Linux kernel
  • On the major CPU architectures, the eBPF bytecode is then translated into native machine code by a just-in-time (JIT) compiler

XDP - express data path

  • The eXpress Data Path (XDP) inside the Linux kernel is an infrastructure to gain low-level control over network traffic
    • side-stepping the normal kernel network stack flow
    • eBPF programs can be loaded into the eXpress Data Path (XDP)

XDP / eBPF hardware offloading

  • XDP eBPF programs can be loaded at different levels of the Linux kernel network stack
    • Offload XDP: directly into the network hardware (ASIC/FPGA; requires support by the network hardware, for example Netronome NICs)
    • Native XDP: into the network driver (low-level Linux kernel code; requires support by the driver)
    • Generic XDP: into the Linux kernel network stack (lower performance, but universally available)

XDP / eBPF execution environments

xdp-ebpf-level.png

XDP functions

  • XDP programs can
    • read network packets and collect statistics
    • modify the content of network packets
    • drop selected traffic (firewall)
    • redirect packets to the same or other network interfaces (switching/routing)
    • pass the network packet to the Linux TCP/IP stack for normal processing

XDP vs DDoS attack

  • XDP can discard unwanted traffic very early in the network stack, defending against DDoS attacks

ebpf-xdf-ddos-twitter.png

eBPF/XDP support in DNS software

Using eBPF

eBPF tooling

  • eBPF programs can be written in many ways
    • Low level eBPF assembly code
    • High-level language compilers (using LLVM): C / Go / Rust / Lua / Python …
    • Special "scripting" languages: bpftrace

BCC

  • BCC is the BPF Compiler Collection

BCC tools

bcc_tracing_tools_2019.png

BCC Tool examples (1/2)

  • Count syscalls from the BIND 9 process with syscount
# syscount-bpfcc -p `pgrep named` -i 10
Tracing syscalls, printing top 10... Ctrl+C to quit.
[07:34:19]
SYSCALL                   COUNT
futex                       547
getpid                      121
sendto                      113
read                         56
write                        31
epoll_wait                   31
openat                       23
close                        20
epoll_ctl                    20
recvmsg                      20

BCC Tool examples (2/2)

  • Tracing Linux capability checks
# capable-bpfcc | grep named
07:36:17  0      29378  (named)          24   CAP_SYS_RESOURCE     1
07:36:17  0      29378  (named)          24   CAP_SYS_RESOURCE     1
07:36:17  0      29378  (named)          12   CAP_NET_ADMIN        1
07:36:17  0      29378  (named)          21   CAP_SYS_ADMIN        1
07:36:17  0      29378  named            6    CAP_SETGID           1
07:36:17  0      29378  named            6    CAP_SETGID           1
07:36:17  0      29378  named            7    CAP_SETUID           1
07:36:17  109    29378  named            24   CAP_SYS_RESOURCE     1

bpftrace

  • bpftrace is a little language similar to awk or DTrace
  • bpftrace programs subscribe to eBPF probes and execute a function whenever an event occurs (system call, function call)
  • bpftrace comes with many helper functions to handle eBPF data structures
  • bpftrace allows one to write eBPF programs in a more concise way compared to BCC

Instrumenting the Linux Network Stack

BCC and bpftrace tools

  • Literally hundreds of little eBPF programs exist for looking deep into the Linux network stack
    • The BCC example tools
    • The bpftrace examples
    • Examples from eBPF books

gethostlatency

  • The BCC tool gethostlatency measures the latency of client DNS name resolution through function calls such as getaddrinfo or gethostbyname
# gethostlatency-bpfcc
TIME      PID    COMM                  LATms HOST
10:21:58  19183  ping                 143.22 example.org
10:22:18  19184  ssh                    0.03 host.example.de
10:22:18  19184  ssh                   60.59 host.example.de
10:22:35  19185  ping                  23.44 isc.org
10:22:49  19186  ping                4459.72 yahoo.co.kr

netqtop

  • netqtop - Summarize PPS, BPS, average size of packets and packet counts ordered by packet sizes on each queue of a network interface.
# netqtop-bpfcc -n eth0 -i 10
Mon Nov 15 07:43:29 2021
TX
 QueueID    avg_size   [0, 64)    [64, 512)  [512, 2K)  [2K, 16K)  [16K, 64K)
 0          297.82     2          48         1          4          0
 Total      297.82     2          48         1          4          0

RX
 QueueID    avg_size   [0, 64)    [64, 512)  [512, 2K)  [2K, 16K)  [16K, 64K)
 0          70.95      43         34         0          0          0
 Total      70.95      43         34         0          0          0
-----------------------------------------------------------------------------

tcptracer

  • Tracing TCP connections showing source and destination addresses and ports and the TCP state (accept, connect, close)
# tcptracer-bpfcc -p $(pgrep named)
Tracing TCP established connections. Ctrl-C to end.
T  PID    COMM             IP SADDR            DADDR            SPORT  DPORT
C  29404  isc-net-0000     4  127.0.0.1        127.0.0.1        41555  953
A  29378  isc-socket-0     4  127.0.0.1        127.0.0.1        953    41555
X  29404  isc-socket-0     4  127.0.0.1        127.0.0.1        41555  953
X  29378  isc-socket-0     4  127.0.0.1        127.0.0.1        953    41555
C  29378  isc-net-0000     4  46.101.109.138   192.33.4.12      43555  53
C  29378  isc-net-0000     4  46.101.109.138   192.33.4.12      33751  53
X  29378  isc-socket-0     4  46.101.109.138   192.33.4.12      43555  53
X  29378  isc-socket-0     4  46.101.109.138   192.33.4.12      33751  53
C  29378  isc-net-0000     4  46.101.109.138   193.0.14.129     38145  53
C  29378  isc-net-0000     4  46.101.109.138   192.33.14.30     40905  53
X  29378  isc-socket-0     4  46.101.109.138   193.0.14.129     38145  53
X  29378  isc-socket-0     4  46.101.109.138   192.33.14.30     40905  53

tcpconnlat

  • Display the connection latency for outgoing TCP-based DNS queries from a BIND 9 resolver (in this example a query for microsoft.com TXT, whose answer is too large for a 1232-byte UDP response)
    • isc-net-0000 is the internal name of a BIND 9 thread
# tcpconnlat-bpfcc
PID    COMM         IP SADDR            DADDR            DPORT LAT(ms)
29378  isc-net-0000 4  46.101.109.138   193.0.14.129     53    37.50
29378  isc-net-0000 4  46.101.109.138   192.52.178.30    53    14.01
29378  isc-net-0000 4  46.101.109.138   199.9.14.201     53    8.48
29378  isc-net-0000 4  46.101.109.138   192.42.93.30     53    1.90
29378  isc-net-0000 4  46.101.109.138   40.90.4.205      53    14.27
29378  isc-net-0000 4  46.101.109.138   199.254.48.1     53    19.21
29378  isc-net-0000 4  46.101.109.138   192.48.79.30     53    7.66
29378  isc-net-0000 4  46.101.109.138   192.41.162.30    53    7.97
29396  isc-net-0000 4  127.0.0.1        127.0.0.1        53    0.06

udplife

  • A bpftrace script to trace UDP session lifespans (DNS round trip time) with connection detail (by Brendan Gregg, see link collection)
# udplife.bt
Attaching 8 probes...
PID   COMM       LADDR           LPORT RADDR           RPORT   TX_B   RX_B MS
29378 isc-net-00 46.101.109.138  0     199.19.57.1     16503     48    420 268
29378 isc-net-00 46.101.109.138  0     51.75.79.143    81        49     43 13
29378 isc-net-00 46.101.109.138  0     199.6.1.52      16452     48    408 24
29378 isc-net-00 46.101.109.138  0     199.249.120.1   81        44     10 9
29378 isc-net-00 46.101.109.138  0     199.254.31.1    32891     64     30 273
29378 isc-net-00 46.101.109.138  0     65.22.6.1       32891     64     46 266

Server agnostic DNS augmentation using eBPF

Instrumenting BIND 9

Use case -> Forward logging

  • A BIND 9 DNS resolver has forward zones configured:
zone "dnslab.org" {
        type forward;
        forwarders { 1.1.1.1; 8.8.8.8; };
};
  • The BIND 9 logging subsystem, while very powerful, does not support the logging of forwarding decisions
  • Goal: Create a bpftrace script that writes out BIND 9 forwarding decisions

Step 1 - Use the source

  • The BIND 9 source code is public, available on the ISC gitlab service https://gitlab.isc.org
  • A search through the source for forwarding finds the function dns_fwdtable_find in lib/dns/forward.c. This sounds promising:

dns_fwdtable_find.png

Step 2 - A proof of concept test

  • The function dns_fwdtable_find takes a domain name and returns 0 if the name must be resolved through forwarding, and a value greater than 0 if not
    • A quick bpftrace one-liner will validate that this indeed works:
bpftrace -e 'uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find { print(retval) }'

Step 2 - A proof of concept test

forwarding-bpftrace-poc.png

Step 3 - Planning the probe script

  • Now that we are certain that we have a function to work with, we can write a bpftrace script
  • The script will
    • Store the domain name requested from dns_fwdtable_find when the function is called
    • Check the return code (retval) of the function when it returns, and print the domain name when the return value is zero (0), do nothing otherwise
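The plan above is the usual "remember at entry, decide at return" pattern. The following Python sketch models it in userspace, as an illustration only: the dictionary pending stands in for a per-thread bpftrace map, and the handler names on_entry and on_return as well as the non-zero return value 1 are hypothetical.

```python
pending = {}   # thread ID -> domain name seen when the function was entered
log = []       # lines the probe script would print

def on_entry(tid, name):
    """Model of the entry probe: remember the argument for this thread."""
    pending[tid] = name

def on_return(tid, retval):
    """Model of the return probe: report only when the function returned 0."""
    name = pending.pop(tid, None)   # always clean up, whatever the result
    if retval == 0 and name is not None:
        log.append(f"Forwarded domain name: {name}")

# Two threads whose calls interleave; keying by thread ID keeps them apart.
on_entry(1, "dnslab.org")
on_entry(2, "ietf.org")
on_return(2, 1)    # non-zero: not forwarded, nothing is logged
on_return(1, 0)    # zero: forwarded

print(log)
```

Keying the stored value by thread ID matters because a multi-threaded server can have several lookups in flight at once; a single global variable would be overwritten between entry and return.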

Challenge - Wrangling with structs

  • The domain name to check for forwarding is given to the function as a struct of type dns_name_t
    • It's not a simple pointer to a string that we can print
  • A search through the ISC BIND 9 source code documentation reveals the structure of dns_name_t. The 2nd field is unsigned char * ndata, which looks like the domain name

Challenge - Wrangling with structs

  • The definition of dns_name_t can be found in lib/dns/include/dns/name.h

dns_name_t_definition.png

Challenge - Wrangling with structs

  • bpftrace uses a syntax similar to the C programming language, so we can import the struct from the BIND 9 source code into the script
    • we don't need the linked-list and isc_buffer_t fields for our script, and their types are not native C types, so we comment these lines out
#!/usr/bin/bpftrace

struct dns_name {
        unsigned int   magic;
        unsigned char *ndata;
        unsigned int   length;
        unsigned int   labels;
        unsigned int   attributes;
        unsigned char *offsets;
//      isc_buffer_t  *buffer;
//      ISC_LINK(dns_name_t) link;
//      ISC_LIST(dns_rdataset_t) list;
};
[...]
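Commenting out the trailing fields is safe because it does not move the fields in front of them. As a cross-check, this Python sketch mirrors the trimmed struct with ctypes and prints the field offsets the platform's C ABI would use. The class name DnsName is an assumption of this sketch; the field list is copied from the struct above.

```python
import ctypes

class DnsName(ctypes.Structure):
    # Same fields, same order as the trimmed struct dns_name in the script.
    _fields_ = [
        ("magic", ctypes.c_uint),
        ("ndata", ctypes.POINTER(ctypes.c_ubyte)),
        ("length", ctypes.c_uint),
        ("labels", ctypes.c_uint),
        ("attributes", ctypes.c_uint),
        ("offsets", ctypes.POINTER(ctypes.c_ubyte)),
    ]

# Print where each field lives inside the struct on this machine.
for name, _ctype in DnsName._fields_:
    field = getattr(DnsName, name)
    print(f"{name:<10} offset={field.offset:2d} size={field.size}")
```

On a 64-bit system the pointer ndata is aligned to 8 bytes, so it sits at offset 8 rather than directly after the 4-byte magic field. Reordering or dropping a field in front of ndata would therefore make the probe read the wrong memory.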

Printing a message at probe start

  • The BEGIN pseudo-probe fires at the start of the script and prints a message, informing the user that the script has been started
[...]
BEGIN
{
  print("Waiting for forward decision...\n");
}
[...]

Probing the function call

  • This probe fires when the function is called
    • it's a uprobe (User-Space probe)
    • the function to be probed is dns_fwdtable_find in the dynamic library /lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so
    • The second argument to the call (arg1, as bpftrace numbers arguments starting at arg0) is cast into a struct dns_name, and then the field ndata is referenced
    • This data is stored into the variable @dns_name[tid] indexed by the thread ID (tid) of the running thread
[...]
uprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
  @dns_name[tid] = ((struct dns_name *)arg1)->ndata;
}
[...]

Probing the function exit

  • The third probe fires at function exit (uretprobe, a user-space function return probe)
    • Same library and function as before
  • If the return value of the function is zero (0, meaning the domain name needs to be forwarded), the stored data in @dns_name[tid] is converted into a string and printed out
  • The variable @dns_name[tid] is then deleted, as it is no longer needed

uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
 if (retval == 0) {
    printf("Forwarded domain name: %s\n", str(@dns_name[tid]));
 }
 delete(@dns_name[tid]);
}
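One caveat when reading the output: as far as we can tell from the BIND 9 sources, ndata holds the name in DNS wire format (each label prefixed by a length byte) rather than as dotted text, so the string printed by str() contains length bytes where the dots would be. The Python sketch below shows how such a byte sequence maps to the familiar presentation form. The function name wire_to_text is hypothetical, and compression pointers are deliberately not handled.

```python
def wire_to_text(ndata: bytes) -> str:
    """Convert an uncompressed wire-format DNS name to dotted text."""
    labels = []
    pos = 0
    while pos < len(ndata):
        length = ndata[pos]
        if length == 0:              # the zero-length root label ends the name
            break
        if length > 63:              # labels are at most 63 bytes long
            raise ValueError("compression pointer or invalid label length")
        pos += 1
        label = ndata[pos:pos + length]
        if len(label) < length:
            raise ValueError("truncated name")
        labels.append(label.decode("ascii", errors="replace"))
        pos += length
    return ".".join(labels) + "."

print(wire_to_text(b"\x06dnslab\x03org\x00"))   # dnslab.org.
```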

The final script

#!/usr/bin/bpftrace

struct dns_name {
        unsigned int   magic;
        unsigned char *ndata;
        unsigned int   length;
        unsigned int   labels;
        unsigned int   attributes;
        unsigned char *offsets;
//      isc_buffer_t  *buffer;
//      ISC_LINK(dns_name_t) link;
//      ISC_LIST(dns_rdataset_t) list;
};

BEGIN
{
  print("Waiting for forward decision...\n");
}
uprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
  @dns_name[tid] = ((struct dns_name *)arg1)->ndata;
}

uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
 if (retval == 0) {
    printf("Forwarded domain name: %s\n", str(@dns_name[tid]));
 }
 delete(@dns_name[tid]);
}

The script in operation

  • The script fires whenever a domain name is to be forwarded
    • In this example, all queries for the domain dnslab.org are forwarded, but not ietf.org

bpftrace-script.png

Packet Filtering with eBPF

eBPF as a network firewall

  • eBPF can be a very efficient firewall
    • It can stop network packets before they enter the Linux TCP/IP stack or the userspace application
    • As eBPF runs full programs, the firewall can work on complex rules
      • DNS query names
      • DNSSEC data in answers
      • Source IP of nameserver
      • EDNS data (prioritize DNS messages with DNS cookies)
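As an illustration of such a rule, the following Python sketch checks whether a DNS message carries an EDNS COOKIE option (option code 10 inside the OPT pseudo-record, type 41). It is a userspace model of the parsing logic only: the function names are hypothetical, and a real eBPF program would have to express the same walk with bounded loops and explicit bounds checks to pass the kernel verifier.

```python
import struct

OPT_RR_TYPE = 41      # EDNS(0) OPT pseudo-record
COOKIE_OPTION = 10    # DNS COOKIE option code

def skip_name(msg, pos):
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:    # a compression pointer ends the name
            return pos + 2
        pos += 1 + length

def has_dns_cookie(msg: bytes) -> bool:
    """True if the DNS message contains an OPT record with a COOKIE option."""
    try:
        _id, _flags, qd, an, ns, ar = struct.unpack_from("!6H", msg, 0)
        pos = 12
        for _ in range(qd):
            pos = skip_name(msg, pos) + 4            # skip QTYPE and QCLASS
        for _ in range(an + ns + ar):
            pos = skip_name(msg, pos)
            rtype, _cls, _ttl, rdlen = struct.unpack_from("!HHIH", msg, pos)
            pos += 10
            if rtype == OPT_RR_TYPE:
                end = pos + rdlen
                while pos + 4 <= end:                # walk the option list
                    code, optlen = struct.unpack_from("!HH", msg, pos)
                    if code == COOKIE_OPTION:
                        return True
                    pos += 4 + optlen
                return False                         # OPT present, no cookie
            pos += rdlen
    except (struct.error, IndexError):
        pass                                         # malformed or truncated
    return False

def build_query(with_cookie: bool) -> bytes:
    """Build a query for dnslab.org A with an OPT record (test helper)."""
    header = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 1)
    question = b"\x06dnslab\x03org\x00" + struct.pack("!HH", 1, 1)
    rdata = struct.pack("!HH", COOKIE_OPTION, 8) + bytes(8) if with_cookie else b""
    opt = b"\x00" + struct.pack("!HHIH", OPT_RR_TYPE, 1232, 0, len(rdata)) + rdata
    return header + question + opt

print(has_dns_cookie(build_query(True)), has_dns_cookie(build_query(False)))
```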

Example: Block-Non-DNS

  • In the Hands-On part of this training, we show a simple eBPF network filter
    • Block all UDP traffic towards a network interface except DNS (Port 53)
    • Helps mitigate non-DNS DDoS attacks against an authoritative DNS server
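The decision such a filter makes per packet can be sketched in a few lines. The Python model below is not the hands-on filter itself; it only mirrors the verdict logic under some stated assumptions: IPv4 only, no VLAN tags, and later IP fragments are passed because they carry no UDP header. XDP_DROP and XDP_PASS use the numeric values of the kernel's enum xdp_action; the function names are hypothetical.

```python
import struct

XDP_DROP = 1   # values from enum xdp_action in the Linux UAPI headers
XDP_PASS = 2

ETH_HLEN = 14
IPPROTO_UDP = 17
DNS_PORT = 53

def xdp_block_non_dns(frame: bytes) -> int:
    """Return the XDP verdict for one Ethernet frame."""
    if len(frame) < ETH_HLEN + 20:
        return XDP_PASS                              # too short for IPv4
    if struct.unpack_from("!H", frame, 12)[0] != 0x0800:
        return XDP_PASS                              # not IPv4
    if frame[ETH_HLEN + 9] != IPPROTO_UDP:
        return XDP_PASS                              # not UDP
    if struct.unpack_from("!H", frame, ETH_HLEN + 6)[0] & 0x1FFF:
        return XDP_PASS                              # later fragment (assumption)
    udp = ETH_HLEN + (frame[ETH_HLEN] & 0x0F) * 4    # honour IP header length
    if len(frame) < udp + 8:
        return XDP_DROP                              # truncated UDP header
    dport = struct.unpack_from("!H", frame, udp + 2)[0]
    return XDP_PASS if dport == DNS_PORT else XDP_DROP

def ipv4_frame(proto: int, dport: int) -> bytes:
    """Build a minimal test frame; only the fields the filter reads are set."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, proto, 0,
                     bytes([192, 0, 2, 1]), bytes([192, 0, 2, 53]))
    l4 = struct.pack("!HHHH", 40000, dport, 8, 0)
    return eth + ip + l4

print(xdp_block_non_dns(ipv4_frame(IPPROTO_UDP, 53)))    # 2 (XDP_PASS)
print(xdp_block_non_dns(ipv4_frame(IPPROTO_UDP, 123)))   # 1 (XDP_DROP)
```

Because the verdict is reached from a handful of fixed-offset reads, the same logic is cheap to run for every packet before the kernel allocates any per-packet state.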

Example: XDP Firewall

interface = "eth0";
updatetime = 15;

filters = (
    {
        enabled = true,
        action = 0,
        udp_enabled = true,
        udp_dport = 53
    }
);

Literature and Links

Book: Linux Observability with BPF

By David Calavera, Lorenzo Fontana (November 2019)

book1.png

Book: Systems Performance (2nd ed.)

By Brendan Gregg (December 2020)

book2.jpg

Book: BPF Performance Tools

By Brendan Gregg (December 2019)

book3.jpg

Links

Next webinars

  • December 15 - DNS Fragmentation: Real-World measurements, impact and mitigation

Questions and Answers

Hands-On