
(Part 1 - Event based BIND 9 log- and system analysis)
Created: 2021-02-17 Wed 18:40


Welcome to part one of our BIND 9 webinar series


atop




categories define what should be written to the logschannel define where the log data should be writtencategories can be written into the same channelcategory can be written to multiple channels









xfer-in, xfer-out, notify or update and
update-securityprint-category yes; to
be able to filter for the categories in combined log files

print-severity yes; BIND 9 will print
the logging severity in the log file
2021-02-17T11:27:36.593 general: info: reloading configuration succeeded 2021-02-17T11:27:36.600 general: info: reloading zones succeeded 2021-02-17T11:27:37.006 general: notice: all zones loaded 2021-02-17T11:27:37.100 general: notice: running 2021-02-17T11:27:43.929 query-errors: info: client @0x5dbab4d4 ::1#63099 (fail05.dnssec.works): query failed (zone not loaded) for fail05.dnssec.works/IN/SOA at query.c:5448 2021-02-17T11:27:49.962 query-errors: info: client @0x5dbab4d4 ::1#59273 (fail05.dnssec.works): query failed (zone not loaded) for fail05.dnssec.works/IN/SOA at query.c:5448 2021-02-17T11:27:50.073 general: info: zone fail05.dnssec.works/IN: refresh: unexpected rcode (SERVFAIL) from master ::1#53 (source ::#0)


17-Feb-2021 11:56:55.311 client @0x7f4fb5f3cd40 2001:db8:ff:1a00:95e8:d30:c8a9:6ced#62678 (31b4bd31fg1x2.virus): rpz QNAME PASSTHRU rewrite 31b4bd31fg1x2.virus via 31b4bd31fg1x2.virus.rpz.zone 17-Feb-2021 11:57:24.663 client @0x7f4fcd2379b0 2001:db8:561:5:7935:d587:336b:436e#17126 (malware.com): rpz QNAME PASSTHRU rewrite malware.com via malware.com.malware.rpz.zone 17-Feb-2021 11:57:24.840 client @0x7f4fc5a66560 2001:db8:4212:0:11da:b4a5:b77e:31dc#45577 (trojan.biz): rpz QNAME PASSTHRU rewrite trojan.biz via trojan.biz.tojan.rpz.zone 17-Feb-2021 11:57:41.814 client @0x7f4f51668910 1002:db8:1121:5:7935:d587:336b:436e#4460 (badstuff.com): rpz QNAME PASSTHRU rewrite badstuff.com via badstuff.com.rpz.zone




2021-01-18T15:13:01.438 client @0x413ce4d4 192.0.2.4#13700 (blog.defaultroutes.de): \
query: blog.defaultroutes.de IN AAAA - (5.45.109.212)


options {
directory "/var/named";
querylog no;
[...]
};






/var/log or the place where your BIND 9
server writes the log files)

logrotate can compress logfiles to save space
online
compression which can speed up I/O performance of logfile
operations







stdout and
stderr channels of the BIND 9 process and store that information
into the Journal
stdout or stderr





grep, awk and perl









$ ./logeater-queries -h Usage of ./logeater-queries: -c list query network classes -d list domain names -i list query IP addresses -n no reverse IP lookup -s list statistics -t list query type


$ cat query.log | ./logeater-queries -c -t | column -t Query-Network-Classes 21379 : IN 6 : CH Query-Network-Types 8514 : A 4060 : AAAA 3079 : SOA 2372 : DNSKEY 927 : PTR 658 : MX 543 : NS 312 : DS 286 : TXT 186 : NSEC 129 : ANY 115 : CNAME […]


logeater-queries Example - printing the top ten query IP addresses
with reverse name resolution:$ cat query.log | ./logeater-queries -i | head | column -t Query-IP-Addresses 1571 : 212.114.206.217 [muc.example.de.] 821 : 2620:74:13:4400::41 [dnsviz-db.verisignlabs.com.] 794 : 72.13.58.112 [dnsviz-db.verisignlabs.com.] 704 : 54.234.42.241 [241.compute-1.amazonaws.com.] 682 : 2001:19f0:5001:df:76d7:5703:ba0a:e220 [] 565 : 185.92.221.212 [185.92.221.212.vultr.com.] 467 : 185.22.143.29 [b9168f1d.cgn.dg-w.de.] 314 : 91.51.184.46 [3b82e.dip0.t-ipconnect.de.]


logeater-dnssec analyses the a log file with messages from the
DNSSEC category and groups the error messages :$ cat dnssec.log | ./logeater-dnssec | head 8727 : 0C9F6LGOE6NADAS8KG1CLIK9UO9G7EIG.ad/NSEC3: no valid signature found 6953 : ad/SOA: no valid signature found 3976 : sat-tv.com/A: got insecure response; parent indicates it should be secure 1730 : mozilla.com/SOA: no valid signature found 1586 : stream.bestvideostreaming.is/A: no valid signature found 1577 : 8FC1DQ3C2Q3ERFD4UO40ENDBTSFME5JO5.ad/NSEC3: no valid signature found 1576 : sat-tv.com/SOA: got insecure response; parent indicates it should be secure 1576 : cdws.eu-west-1.amazonaws.com.Cisco/AAAA: bad cache hit (amazonaws.com.Cisco/DS) 1483 : 0c9f6lgoe6n13ad9iu1clik9uo9g7eig.ad/NSEC3: no valid signature found 968 : cbr.de/NSEC: no valid signature found


logeater-resolver analyses the a log file with messages from the
resolver category and groups the error messages :$ cat resolvers.log | ./logeater-resolvers | head 42908 : s-cnc1.qq.com/AAAA: Name qq.com (SOA) not subdomain of zone ns-cnc1.qq.com -- invalid response 42713 : s-tel1.qq.com/AAAA: Name qq.com (SOA) not subdomain of zone ns-tel1.qq.com -- invalid response 42484 : s-os1.qq.com/AAAA: Name qq.com (SOA) not subdomain of zone ns-os1.qq.com -- invalid response 42297 : s-cmn1.qq.com/AAAA: Name qq.com (SOA) not subdomain of zone ns-cmn1.qq.com -- invalid response 20346 : mails.sonymusicfans.com/DS: invalid response 10920 : tp1.glb.nist.gov/DS: invalid response 9693 : media.netd.com.tr/AAAA for client 192.0.2.165#3347: Name netd.com.tr (SOA) not subdomain of zone media.netd.com.tr -- invalid response 7932 : service.superc.net/AAAA for client 192.0.2.11#3073: Name superc.net (SOA) not subdomain of zone service.superc.net — invalid response 4597 : brickleonavon.com/NS for client 192.0.2.46#3073: Name . (SOA) not subdomain of zone brickleonavon.com -- invalid response 4474 : promo.mobile.de/AAAA for client 2001:db8:1800:88:78f9:ba4:45fe:d438#48296: Name mobile.de (SOA) not subdomain of zone promo.mobile.de -- invalid response






tcpdump pcap files),
I often generate histograms over the logfiles to find hot-spots or
outliers

cat dnssec.log | grep "2021-02-17" | awk '{print substr($0,0,16)}' \
| uniq -c | awk '{printf("\n%s ",$0) ; for (i = 0; i<$1/100 ; i++) \
{printf("*")};}' | less


178 27-Apr-2020 18:09 ** 159 27-Apr-2020 18:10 ** 160 27-Apr-2020 18:11 ** 169 27-Apr-2020 18:12 ** 175 27-Apr-2020 18:13 ** 166 27-Apr-2020 18:14 ** 183 27-Apr-2020 18:15 ** 224 27-Apr-2020 18:16 *** 4082 27-Apr-2020 18:17 ***************************************** 4439 27-Apr-2020 18:18 ********************************************* 3476 27-Apr-2020 18:19 *********************************** 3212 27-Apr-2020 18:20 ********************************* 2484 27-Apr-2020 18:21 ************************* 2201 27-Apr-2020 18:22 *********************** 2036 27-Apr-2020 18:23 ********************* 2032 27-Apr-2020 18:24 ********************* 2059 27-Apr-2020 18:25 ********************* 1679 27-Apr-2020 18:26 ***************** 943 27-Apr-2020 18:27 ********** 639 27-Apr-2020 18:28 ******* 500 27-Apr-2020 18:29 ***** 414 27-Apr-2020 18:30 *****


security.log based on
the number of recurring messages
cat security.log | grep "2021-02-14" | awk -F : '{print substr($4,0,60)}' | \
uniq -c | sort -nr | \
awk '{printf("\n%s ",$0) ; for (i = 0; i<$1/10 ; i++) {printf("*")};}'


100 query (cache) 'www.google.com/A/IN' denied ********** 10 query (cache) 'peacecorps.gov/ANY/IN' denied * 2 query (cache) 'com/ANY/IN' denied * 2 query (cache) '67b.org/AAAA/IN' denied * 1 query (cache) 'www.qq.com/A/IN' denied * 1 query (cache) 'www.qq.com/A/IN' denied * 1 query (cache) 'researchscan541.eecs.umich.edu/A/IN' denied * 1 query (cache) 'pizzaseo.com/RRSIG/IN' denied * 1 query (cache) 'invalid.parrotdns.com/A/IN' denied * 1 query (cache) 'dnsscan.shadowserver.org/A/IN' denied * 1 query (cache) 'cpsc.gov/ANY/IN' denied * 1 query (cache) 'c.afekv.com/A/IN' denied * 1 query (cache) 'amazon.com/A/IN' denied * 1 query (cache) '5-45-109-212-60287611.spiderprobe.com/A/IN' * 1 query (cache) '37-120-183-122-60287611.spiderprobe.com/A/IN * 1 query (cache) '212.109.45.5.in-addr.arpa/PTR/IN' denied * 1 query (cache) '122.183.120.37.in-addr.arpa/PTR/IN' denied *




atop is an real-time system monitor. It is an alternative to the
original Unix/Linux top tool


ATOP - dns-resolver-01 2021/02/15 13:39:41 -------------- 10s elapsed
PRC | sys 0.45s | user 61.38s | #proc 261 | #trun 5 | #tslpi 299 | #tslpu 0 | #zombie 0 | #exit 3 |
CPU | sys 180% | user 446% | irq 34% | idle 1342% | wait 0% | guest 0% | curf 1.78GHz | curscal 54% |
cpu | sys 16% | user 48% | irq 0% | idle 35% | cpu005 w 0% | guest 0% | curf 2.42GHz | curscal 73% |
cpu | sys 16% | user 48% | irq 0% | idle 36% | cpu018 w 0% | guest 0% | curf 2.82GHz | curscal 85% |
cpu | sys 15% | user 45% | irq 0% | idle 40% | cpu007 w 0% | guest 0% | curf 1.30GHz | curscal 39% |
cpu | sys 17% | user 44% | irq 0% | idle 40% | cpu015 w 0% | guest 0% | curf 2.26GHz | curscal 68% |
cpu | sys 15% | user 37% | irq 0% | idle 48% | cpu009 w 0% | guest 0% | curf 1.20GHz | curscal 36% |
cpu | sys 12% | user 25% | irq 0% | idle 62% | cpu003 w 0% | guest 0% | curf 1.20GHz | curscal 36% |
cpu | sys 10% | user 26% | irq 1% | idle 64% | cpu001 w 0% | guest 0% | curf 1.20GHz | curscal 36% |
CPL | avg1 6.60 | avg5 6.22 | avg15 6.21 | | csw 1988779 | intr 1440528 | | numcpu 20 |
MEM | tot 62.8G | free 24.4G | cache 29.3G | buff 1.0G | slab 1.4G | shmem 3.1M | vmbal 0.0M | hptot 0.0M |
SWP | tot 8.0G | free 8.0G | | | | | vmcom 2.4G | vmlim 39.4G |
DSK | sda | busy 0% | read 0 | write 201 | KiB/w 11 | MBr/s 0.0 | MBw/s 0.2 | avio 0.00 ms |
NET | transport | tcpi 1889 | tcpo 1538 | udpi 764422 | udpo 768448 | tcpao 66 | tcppo 250 | tcprs 15 |
NET | network | ipi 767687 | ipo 770138 | ipfrw 0 | deliv 767685 | | icmpi 1372 | icmpo 6 |
NET | eno49 1% | pcki 791200 | pcko 793659 | sp 10 Gbps | si 75 Mbps | so 123 Mbps | erri 0 | erro 0 |
NET | ens2f3 0% | pcki 18 | pcko 28 | sp 1000 Mbps | si 0 Kbps | so 7 Kbps | erri 0 | erro 0 |
NET | lo ---- | pcki 20 | pcko 20 | sp 0 Mbps | si 1 Kbps | so 1 Kbps | erri 0 | erro 0 |
PID SYSCPU USRCPU VGROW RGROW RUID EUID ST EXC THR S CPUNR CPU CMD 1/7
2992 0.00s 61.37s 0K 0K named named -- - 11 S 9 638% named
1140 0.19s 0.00s 0K 0K root root -- - 1 S 13 2% haveged
16622 0.03s 0.00s 9324K 1240K user staff -- - 1 R 0 0% atop
9 0.02s 0.00s 0K 0K root root -- - 1 I 15 0% rcu_sched


atop can be run as a daemon process in the background
atop collects the system information every 60
seconds into the (binary) file /var/log/atop/atop_20210215 (rotated daily)
# systemctl status atop
● atop.service - Atop advanced performance monitor
Loaded: loaded (/lib/systemd/system/atop.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-02-15 00:00:02 CET; 13h ago
Docs: man:atop(1)
Main PID: 10509 (atop)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/atop.service
└─10509 /usr/bin/atop -R -w /var/log/atop/atop_20210215 60
Feb 15 00:00:02 dns-resolver-12 systemd[1]: Started Atop advanced performance monitor.


atop can read the (binary) historical data
atop -r /var/log/atop/atop_20210215






tcpdump once a threshold is reached


/etc/systemd/system/tcpdump-starter.timer: this systemd unit
starts the tcpdump-starter service every 60 seconds after the
last termination of the same service[Unit] Description=TCPDUMP Starter Timer [Timer] OnUnitInactiveSec=60s [Install] WantedBy=timers.target


/etc/systemd/system/tcpdump-starter.service
starts the tcpdump-starter script on the timer event
-c 12: start collecting detail information when the
CPU load is > 12-u 32000: start collecting detail information if the
number of queries over UDP > 32.000-t 1000: start collecting detail information if the
number of queries over TCP > 1.000[Unit] Description=TCPDUMP Starter Service [Service] Type=oneshot ExecStart=/usr/local/sbin/tcpdump-starter -c 12 -u 32000 -t 1000 [Install] WantedBy=multi-user.target


#!/bin/sh
set -o nounset
cputhreshold=12
udpthreshold=32000
tcpthreshold=1000
diskthreshold=10000000
interface=eth0
while getopts ':t:u:c:' optname
do
case $optname in
c) cputhreshold=$OPTARG;;
t) tcpthreshold=$OPTARG;;
u) udpthreshold=$OPTARG;;
:)
echo "$(basename $0): Argument missing for option ${OPTARG}" >&2
exit 2;;
\?)
echo "$(basename $0): Unknown option ${OPTARG}" >&2
exit 2;;
*)
echo "$(basename $0): Internal error, unexpected option ${optname}" >&2
exit 3;;
esac
done
shift $((OPTIND-1))
[...]


[...]
# test for high CPU load
avg=$(cat /proc/loadavg | cut -d '.' -f 1)
if [ $cputhreshold -le $avg ]; then
echo "High CPU Load ${cputhreshold}"
logger -t tcpdump "High CPU Load ${avg} - starting TCPDUMP"
diskfree=$(df --output=avail / | tail -1)
if [ $diskthreshold -ge $diskfree ]; then
logger -t tcpdump "Not enough free disk space to start TCPDUMP."
else
tcpdump-run 5 $interface
fi
fi
[...]


[...]
# test for high DNS UDP request rate
udpqr=$(/usr/sbin/rndc status | grep "recursive clients" | cut -d ':' -f 2 | cut -d '/' -f 1)
if [ $udpthreshold -le $udpqr ]; then
echo "High UDP DNS Load ${udpqr}"
logger -t tcpdump "High UDP DNS Load ${udpqr} - starting TCPDUMP"
if [ $diskthreshold -ge $diskfree ]; then
logger -t tcpdump "Not enough free disk space to start TCPDUMP."
else
tcpdump-run 5 $interface
fi
fi
[...]


[...]
# test for high DNS TCP request rate
tcpqr=$(/opt/local/bind/sbin/rndc status | grep "tcp clients" | cut -d ':' -f 2 | cut -d '/' -f 1)
if [ $tcpthreshold -le $tcpqr ]; then
echo "High TCP DNS Load ${tcpqr}"
logger -t tcpdump "High TCP DNS Load ${tcpqr} - starting TCPDUMP"
if [ $diskthreshold -ge $diskfree ]; then
logger -t tcpdump "Not enough free disk space to start TCPDUMP."
else
tcpdump-run 5 $interface
fi
fi


tcpdump-run collects network traffic for some time (5
minutes in this case)#!/bin/sh tcpdump -q -W $1 -G 60 -w /pcap/packettrace-%F-%T.pcap -i $2








atop home page https://www.atoptool.nl/



