NAT64 with less NAT44

I wanted to explore NAT64 that relies on NAT66 rather than on any NAT44, for a pile of reasons:

Private or special addresses are not supposed to use 64:ff9b::/96; instead these can go in 64:ff9b:1:fffe::/96.

The NAT44 based implementations work like this:

  1. 2001:db8:d0be:f00d:aede:48ff:fe23:4567 wants to talk to 64:ff9b:1:fffe::203.0.113.1 on the internet.
  2. NAT64 rewrites as from some private pool IPv4 address, such as 192.168.208.190, and keeps the "to" 203.0.113.1
  3. NAT44 translates as from 198.51.100.1 and to 203.0.113.1

State tracking is needed whenever NAT64 has to deal with addresses outside of 64:ff9b:1:fffe::/96, such as the 2001:db8:: source in step 1.
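To make the "inside or outside the prefix" distinction concrete, here is a minimal user-space C sketch of the /96 embedding used in these notes, where the IPv4 address sits in the low 32 bits (the RFC 6052 layout for a /96). The function and variable names are made up for illustration and are not taken from nat64.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64:ff9b:1:fffe::/96, the prefix used throughout these notes. */
static const uint8_t nat64_prefix[12] = {
    0x00, 0x64, 0xff, 0x9b, 0x00, 0x01, 0xff, 0xfe,
    0x00, 0x00, 0x00, 0x00
};

/* Embed an IPv4 address (4 bytes, network order) under the /96 prefix. */
static void v4_to_v6(const uint8_t v4[4], uint8_t v6[16])
{
    assert(v4 != NULL && v6 != NULL);
    memcpy(v6, nat64_prefix, 12);
    memcpy(v6 + 12, v4, 4);
}

/*
 * Recover the IPv4 address from an IPv6 address.
 * Returns 0 on success, or -1 when the address is outside the prefix,
 * which is the case that cannot be handled statelessly.
 */
static int v6_to_v4(const uint8_t v6[16], uint8_t v4[4])
{
    assert(v4 != NULL && v6 != NULL);
    if (memcmp(v6, nat64_prefix, 12) != 0)
        return -1;
    memcpy(v4, v6 + 12, 4);
    return 0;
}
```

With this layout 192.168.2.1 becomes 64:ff9b:1:fffe::c0a8:201, while an address such as 2001:db8:d0be:f00d:aede:48ff:fe23:4567 makes v6_to_v4() return -1.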

Instead, with a NAT66 based implementation, the story goes like this:

  1. 2001:db8:d0be:f00d:aede:48ff:fe23:4567 wants to talk to 64:ff9b:1:fffe::203.0.113.1 on the internet.
  2. NAT66 translates as from 64:ff9b:1:fffe::198.51.100.1 and to 64:ff9b:1:fffe::203.0.113.1
  3. NAT64 rewrites, statelessly, as from 198.51.100.1 and to 203.0.113.1
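Step 3 needs no state because every IPv4 field can be derived from the IPv6 header alone. The following user-space C sketch shows one way that mapping might look. It is not the code in nat64.c: the function name is hypothetical, the field mapping loosely follows RFC 7915, and it assumes both addresses are already inside the /96 prefix, that there are no extension headers or fragments, and that the hop limit is copied unchanged. The IPv4 checksum is left at zero for a later step to fill in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a 20-byte IPv4 header from a 40-byte IPv6 header.
 * Returns 0 on success, -1 if the input is not IPv6 or is too long.
 */
static int ip6_to_ip4_header(const uint8_t ip6[40], uint8_t ip4[20])
{
    uint16_t payload_len, total_len;

    assert(ip6 != NULL && ip4 != NULL);
    if ((ip6[0] >> 4) != 6)
        return -1;

    payload_len = (uint16_t)((ip6[4] << 8) | ip6[5]);
    if (payload_len > 0xffff - 20)
        return -1;
    total_len = (uint16_t)(payload_len + 20);

    memset(ip4, 0, 20);
    ip4[0] = 0x45;                              /* version 4, IHL 5 */
    ip4[1] = (uint8_t)(((ip6[0] & 0x0f) << 4) | (ip6[1] >> 4)); /* traffic class -> TOS */
    ip4[2] = (uint8_t)(total_len >> 8);         /* total length, high byte */
    ip4[3] = (uint8_t)(total_len & 0xff);       /* total length, low byte */
    /* bytes 4..7 (id, flags, fragment offset) stay zero in this sketch */
    ip4[8] = ip6[7];                            /* hop limit -> TTL */
    ip4[9] = (ip6[6] == 58) ? 1 : ip6[6];       /* ICMPv6 -> ICMP, else unchanged */
    /* bytes 10..11 (header checksum) stay zero, to be fixed up afterwards */
    memcpy(ip4 + 12, ip6 + 8 + 12, 4);          /* low 32 bits of source */
    memcpy(ip4 + 16, ip6 + 24 + 12, 4);         /* low 32 bits of destination */
    return 0;
}
```

Fed a header from 64:ff9b:1:fffe::198.51.100.1 to 64:ff9b:1:fffe::203.0.113.1, this produces an IPv4 header from 198.51.100.1 to 203.0.113.1, matching step 3 above.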

Whilst the NAT66 is itself not stateless, it is core kernel code and its state is visible in conntrack.

It could then even be possible to migrate filtering in a host to IPv6 and deal with transit as NAT46, then NAT66 and finally NAT64.

Static NAT66 mappings can be more pleasant, for example for multicast NAT:

  1. 2001:db8:d0be:f00d:aede:48ff:fe23:4567 wants to emit multicast to ff3e:40:2001:db8:d0be:f00d:d09e:f00d
  2. NAT66 translates as from 64:ff9b:1:fffe::198.51.100.1 and to 64:ff9b:1:fffe::234.203.0.113
  3. NAT64 rewrites as from 198.51.100.1 and to 234.203.0.113

If I was going to rewrite NAT64, I also wanted to get it inside Linux without a taint, for performance, rather than outside the kernel as before.

Adding NAT64 to Linux without a taint meant using BPF, so I looked up how to do that. Unfortunately I found no complete working module, so I had a go at reconstructing one.

I have got most things working except the checksums. Fortunately the kernel can do those already, although a small performance improvement could be had by putting that in the nat64 code. There is another NAT44 based implementation which appears much more complete, so I am using the kernel code for guidance.

And there is something on this topic in Android.
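On the checksum point above: IPv6 has no header checksum, so the IPv4 header produced by a 6to4 translation needs one computed from scratch. The sketch below is a plain user-space version of the 16-bit one's complement Internet checksum (RFC 1071), shown only to illustrate what has to be corrected. It is not the kernel's implementation, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement sum of 16-bit big-endian words, folded and inverted.
 * For an IPv4 header, call it with the checksum field set to zero.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                   /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Running the same function over a header whose checksum field is already filled in correctly returns 0, which makes a handy sanity check when looking at translated packets.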

NAT64 Research steps

The BPF code lives in nat64.c

Compile it as described in the tc-bpf man page:

  1. clang -O2 -emit-llvm -c nat64.c -o - | llc -march=bpf -filetype=obj -o nat64.o

Setup a test environment:

BPF seems not to like unshare -Umrun, so testing has to be done as actual root. Hopefully the steps are harmless, but users who do not trust this may use a virtual machine instead.

Early in experimentation I used mirred, but I gather nat64 can work without that:

#tc filter add dev za matchall action mirred egress mirror dev zb action bpf obj /usr/src/bpf/nat64.o

#(re)load the BPF code onto za, and push the supplementary csum module to correct the various checksums.

#Linux does fussier validation at load time and may refuse to load some modules that passed "compilation", so reloading a module after each compile is a useful test.

# I did not think this would work well with ARP involved, so initial tests rely on static MAC addresses.

  1. ip link add za addr 02:00:00:00:00:0a arp off type veth peer name zb addr 02:00:00:00:00:0b arp off;
  2. ip neigh add 192.168.2.1 lladdr 02:00:00:00:00:0a dev zb;
  3. ip neigh add 192.168.2.2 lladdr 02:00:00:00:00:0b dev za;
  4. ip neigh add 64:ff9b:1:fffe::c0a8:202 lladdr 02:00:00:00:00:0b dev za;
  5. ip neigh add 64:ff9b:1:fffe::c0a8:201 lladdr 02:00:00:00:00:0a dev zb;
  6. tc qdisc replace dev za root prio;
  7. tc qdisc replace dev zb root prio;
  8. tc filter del dev za;
  9. tc filter del dev za ingress;
  10. tc qdisc replace dev za ingress handle ffff:;
  11. tc qdisc replace dev zb ingress handle ffff:;
  12. tc filter add dev za parent ffff: matchall action bpf obj /usr/src/bpf/nat64.o sec 4to6 csum ip ip4h tcp udp icmp;
  13. tc filter add dev za matchall action bpf obj /usr/src/bpf/nat64.o sec 6to4 csum ip tcp udp icmp;
  14. ip link set dev za up;
  15. ip link set dev zb up;
  16. ip link set dev za arp off;
  17. ip link set dev zb arp off;
  18. ip address add dev za 64:ff9b:1:fffe::c0a8:201/96;
  19. ip address add dev zb 192.168.2.2/24;
  20. tc -s filter show dev za;
  21. tc -s filter show dev za ingress;
  22. echo tracing
  23. cat /sys/kernel/debug/tracing/trace_pipe