NAT64 with less NAT44
I wanted to explore NAT64 without relying on any NAT44, using NAT66 instead, for a pile of reasons:
- Not having to reserve a block of private IPv4 just for NAT64
- Not having to track state in the NAT64 adapter, which should make it more reliable.
- Maybe easier to secure, because less state is involved.
The NAT44-based implementations work like this:
- 2001:db8:d0be:f00d:aede:48ff:fe23:4567 wants to talk to 64:ff9b::203.0.113.1 on the internet.
- NAT64 rewrites as from some private pool IPv4 such as 192.168.208.190 and keeps the "to" 203.0.113.1
- NAT44 translates as from 198.51.100.1 and to 203.0.113.1
The state tracking happens because NAT64 has to deal with addresses outside of 64:ff9b::/96, such as the 2001:db8:: source above.
Instead, with a NAT66-based implementation, the story goes like this:
- 2001:db8:d0be:f00d:aede:48ff:fe23:4567 wants to talk to 64:ff9b::203.0.113.1 on the internet.
- NAT66 translates as from 64:ff9b::198.51.100.1 and to 64:ff9b::203.0.113.1
- NAT64 rewrites, statelessly, as from 198.51.100.1 and to 203.0.113.1
Whilst the NAT66 is itself not stateless, it is core kernel code and its state is visible in conntrack.
It might then even be possible to migrate a host's filtering to IPv6 and handle transit as NAT46, then NAT66, and finally NAT64.
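The stateless NAT64 step in the flow above relies on both addresses already sitting inside 64:ff9b::/96, where the IPv4 address is carried in the low 32 bits (the RFC 6052 well-known prefix). As a rough sketch of that embedding, and not code taken from nat64.c, the helper below (`v4_to_nat64` is a name made up for this illustration) turns a dotted-quad address into the form that ip(8) prints. It does not validate its input or canonicalise zero groups.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: embed a dotted-quad IPv4
# address in the 64:ff9b::/96 well-known prefix.
v4_to_nat64() {
    # Split a.b.c.d into the four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Pack the octets into two 16-bit groups, printed as hex without
    # leading zeros.
    printf '64:ff9b::%x:%x\n' $(( $1 * 256 + $2 )) $(( $3 * 256 + $4 ))
}

v4_to_nat64 198.51.100.1   # prints 64:ff9b::c633:6401
v4_to_nat64 203.0.113.1    # prints 64:ff9b::cb00:7101
```

These are the hexadecimal spellings of 64:ff9b::198.51.100.1 and 64:ff9b::203.0.113.1 from the list above.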
Static NAT66 mappings can be more pleasant too, for example for multicast NAT:
- 2001:db8:d0be:f00d:aede:48ff:fe23:4567 wants to emit multicast to ff3e:40:2001:db8:d0be:f00d:d09e:f00d
- NAT66 translates as from 64:ff9b::198.51.100.1 and to 64:ff9b::234.203.0.113
- NAT64 rewrites as from 198.51.100.1 and to 234.203.0.113
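The group addresses in this example appear to follow the unicast-prefix-based multicast schemes: ff3e:40:2001:db8:d0be:f00d:... carries a 64-bit (0x40) unicast prefix in the style of RFC 3306, and 234.203.0.113 looks like the RFC 6034 form, which is 234 followed by the first 24 bits of a unicast prefix. Assuming that reading is right, a minimal sketch of deriving the IPv4 group from a /24 follows. `v4_prefix_to_mcast` is a hypothetical name, and the static NAT66 rule that ties the two groups together is not shown.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: build the RFC 6034 style
# 234/8 multicast address from the first three octets of a /24 prefix.
v4_prefix_to_mcast() {
    # Split a.b.c.d into the four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # The fourth octet of the prefix is ignored; a /24 leaves no
    # group-ID bits to fill in.
    printf '234.%d.%d.%d\n' "$1" "$2" "$3"
}

v4_prefix_to_mcast 203.0.113.0   # prints 234.203.0.113
```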
If I was going to rewrite NAT64, I also wanted to get it inside Linux without a taint, for performance, rather than outside it as before.
Adding NAT64 to Linux without a taint meant using BPF, so I looked up how to do that. Unfortunately I found no complete working module, so I had a go at reconstructing one.
I have got most things working except the checksums. Fortunately the kernel can already do those, although a small performance improvement could be had by computing them in nat64 itself. There is another NAT44-based implementation.
And there is something on this topic in Android.
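One detail may make the checksum work smaller. RFC 6052 describes the 64:ff9b::/96 prefix as checksum neutral: its 16-bit words 0064 and ff9b add up to ffff, which is zero in one's-complement arithmetic. The sketch below checks that arithmetic with a hypothetical helper, `ones_sum`, which is not part of nat64.c. If the reasoning holds, the address part of a TCP or UDP pseudo-header sums to the same value before and after translation. The IPv4 header checksum and the ICMP checksum still have to be recomputed.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: one's-complement sum of
# 16-bit hex words, folded back into 16 bits.
ones_sum() {
    sum=0
    for word in "$@"; do
        sum=$(( sum + 0x$word ))
    done
    # End-around carry: add anything above bit 15 back into the low bits.
    while [ "$sum" -gt 65535 ]; do
        sum=$(( (sum & 0xffff) + (sum >> 16) ))
    done
    printf '%04x\n' "$sum"
}

ones_sum 0064 ff9b                                  # prints ffff
ones_sum c633 6401                                  # 198.51.100.1, prints 2a35
ones_sum 0064 ff9b 0000 0000 0000 0000 c633 6401    # 64:ff9b::c633:6401, prints 2a35
```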
NAT64 Research steps
The BPF program lives in nat64.c.
Compile it as described in the tc-bpf man page:
- clang -O2 -emit-llvm -c nat64.c -o - | llc -march=bpf -filetype=obj -o nat64.o
Set up a test environment:
- ip link add za type veth peer name zb
- ip link set dev zb addr 02:00:00:00:00:0b
- ip neigh add 192.168.2.1 lladdr 02:00:00:00:00:0a dev zb
- ip neigh add 192.168.2.2 lladdr 02:00:00:00:00:0b dev za
- ip neigh add 64:ff9b::c0a8:202 lladdr 02:00:00:00:00:0b dev za
- ip neigh add 64:ff9b::c0a8:201 lladdr 02:00:00:00:00:0a dev zb
- tc qdisc replace dev za root prio
- tc qdisc replace dev zb root prio
- tc filter del dev za
- tc filter del dev za ingress
Early in experimentation I used mirred, but I gather nat64 can work without it:
- #tc filter add dev za matchall action mirred egress mirror dev zb action bpf obj /usr/src/bpf/nat64.o
(Re)load the BPF code onto za, and chain the supplementary csum action to correct the various checksums:
- tc qdisc replace dev za ingress handle ffff:
- tc qdisc replace dev zb ingress handle ffff:
- tc filter add dev za parent ffff: matchall action bpf obj /usr/src/bpf/nat64.o sec 4to6 csum ip ip4h tcp udp icmp || exit
- tc filter add dev za matchall action bpf obj /usr/src/bpf/nat64.o sec 6to4 csum ip tcp udp icmp || exit
The kernel's BPF verifier is fussier than the compiler and may refuse to load a module that compiled cleanly, so reloading the module after every compile is a useful test.
I did not think this would work well with ARP involved, so the initial tests rely on static MAC addresses:
- ip link set dev za up
- ip link set dev zb up
- ip link set dev za arp off
- ip link set dev zb arp off
- ip address add dev za 64:ff9b::c0a8:201/96
- ip address add dev zb 192.168.2.2/24
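The addresses in this setup pair up through the same 64:ff9b::/96 mapping: za's 64:ff9b::c0a8:201 stands for 192.168.2.1, and zb's 192.168.2.2 is reached as 64:ff9b::c0a8:202. When reading tcpdump output it can help to convert in the 6-to-4 direction. The helper below is a sketch only, with a made-up name, `nat64_to_v4`. It assumes the address is written without a prefix length and with both trailing groups present, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: recover the IPv4 address
# from a 64:ff9b::hhhh:hhhh address.
nat64_to_v4() {
    # Drop the well-known prefix, leaving "hhhh:hhhh".
    rest=${1#64:ff9b::}
    hi=${rest%%:*}
    lo=${rest#*:}
    # Each 16-bit group holds two octets.
    printf '%d.%d.%d.%d\n' \
        $(( 0x$hi >> 8 )) $(( 0x$hi & 255 )) \
        $(( 0x$lo >> 8 )) $(( 0x$lo & 255 ))
}

nat64_to_v4 64:ff9b::c0a8:201   # prints 192.168.2.1
nat64_to_v4 64:ff9b::c0a8:202   # prints 192.168.2.2
```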
Testing steps, alongside your choice of tcpdump or Wireshark and sending some packets.
The BPF trace pipe shows the output of the various bpf_printk statements:
- tc -s filter show dev za
- tc -s filter show dev za ingress
- echo tracing
- cat /sys/kernel/debug/tracing/trace_pipe