Prio QoS

When experimenting with multicast senders, I found that they can flood the Ethernet interface and starve priority applications such as PPPoE, SSH and NFS. I needed to control this effect.

Prio Concepts

Some labels are carried inside frames and are therefore visible to other hosts. They can affect the behaviour of those hosts, sometimes negatively, particularly when you do not control the other device: it may run proprietary software or belong to someone else, as with remote Internet services.

Markers inside frames

These include the 802.1Q priority marker and the IPv4 and IPv6 DiffServ labels. There is also the IPv6 flow label.

Markers outside frames

These markers are not normally conveyed to other hosts. They do follow frames through the local network stack, though, and can influence the defaults of the in-frame markers.

The so-called SKB priority can be set for testing on sockets opened in GNU bash. Such a socket, here on file descriptor 3, can then be used for throughput tests: pv </dev/urandom >&3

In a classification (written major:minor), the major part names a qdisc and the minor part names the class within it.
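
As a small illustration of the major:minor notation, the sketch below splits a class identifier with shell parameter expansion. The helper name split_classid is made up for this example, and d0be:c is just a sample value in the style used further down. Both halves are hexadecimal.

```shell
# Hypothetical helper: split a tc classid "major:minor" into its parts.
# Both parts are hexadecimal and carry no 0x prefix.
split_classid () {
    major=${1%%:*}   # text before the colon: the qdisc handle
    minor=${1#*:}    # text after the colon: the class within that qdisc
    # An empty minor (as in "d0be:") refers to the qdisc itself, shown as 0.
    printf 'qdisc %s: class %s (decimal %d)\n' "$major" "${minor:-0}" "0x${minor:-0}"
}

split_classid d0be:c
```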

To add to all this, some Ethernet drivers have multiple ring buffers, for example four. This allows a CPU to fill some rings whilst others are transmitting or receiving. A side effect is that the frame transmission order is non-deterministic, so some experimentation may be needed.

Setting up a prio qdisc

With $INT naming the base interface, check whether it has one transmit ring or several. Set a prio qdisc if there is one, and a multiq qdisc if there are several.
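
One way to do the check is to count the tx-* entries the kernel exposes under sysfs. This is only a sketch: the function name count_tx_queues and its optional second argument (an alternative sysfs root, there purely so the function can be tried against a scratch directory) are my own additions.

```shell
# Count the transmit queues the kernel exposes for an interface.
# $1 is the interface name; $2 optionally replaces /sys (for dry runs only).
count_tx_queues () {
    dev=$1
    sysfs=${2:-/sys}
    n=0
    for q in "$sysfs/class/net/$dev/queues"/tx-*; do
        # An unmatched glob stays literal, so confirm the entry really exists.
        if [ -e "$q" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# "lo" is used here only because it is present on practically every Linux host.
count_tx_queues lo
```

A result above 1 suggests a multi-queue driver. A result of 0 means the interface was not found.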

For a prio qdisc, set the maximum number of bands, which is 16.

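priomap sets the default band for each of the sixteen packet priorities. The priority is derived from the packet's Type of Service field unless it has been set explicitly. Here I am setting every entry to the centre of the range.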
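
To see what such a map does, the sketch below looks up the class that a given priority lands in by default. The function name default_class is hypothetical. It assumes the sixteen priomap values used in this article and the d0be: handle, and relies on two facts about prio: bands count from 0 while class minors count from 1, and minors are written in hexadecimal.

```shell
# The sixteen priomap values used in this article, indexed by priority 0-15.
PRIOMAP="8 8 8 8 7 7 7 7 8 8 8 8 7 7 7 7"

# Hypothetical helper: print the default class for a given priority.
default_class () {
    prio=$1
    set -- $PRIOMAP          # load the sixteen band numbers as $1..$16
    shift $((prio & 15))     # only the low four bits of the priority are used
    # Band N corresponds to class minor N+1, printed in hexadecimal.
    printf 'd0be:%x\n' $(($1 + 1))
}

default_class 0    # priority 0 maps to band 8
default_class 6    # priority 6 maps to band 7
```

With this map, unclassified traffic only ever reaches the two middle classes, leaving the bands on either side free for explicit classification.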

# Pass the tx queue directories: more than one means several rings.
root_qdisc () {
    if test ${#} -gt 1; then
        tc qdisc add dev $INT root handle d0be: multiq
    else
        tc qdisc add dev $INT root handle d0be: prio bands 16 \
            priomap 8 8 8 8 7 7 7 7 8 8 8 8 7 7 7 7
    fi
}
root_qdisc /sys/class/net/$INT/queues/tx-*

Within each class, I added a ten-thousand-packet fifo. The reason is that enqueued frames are sorted into these fifos, and the more important fifos are emptied in preference to the lesser ones.

for N in $(seq 1 16); do H=$(printf %x $N); tc qdisc add dev $INT handle $H: parent d0be:$H pfifo limit 10000; done

Correspondingly, it was useful to keep the main Ethernet "txqueuelen" at a relatively low value, since once frames enter that queue they may well not be sorted again before being handed to the driver.

On the other hand, if the Ethernet driver offers ring buffer sizing, it can still be useful to maximise this, to keep the driver busy whilst there is queued work.
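
The two settings can be sketched as follows. The function only prints the commands so that they can be reviewed first. The name queue_sizing_cmds and the figures 100 and 4096 are placeholders of mine, not measured values; the ring size that a driver accepts is reported by ethtool -g.

```shell
# Print (do not run) commands for a short txqueuelen and a large TX ring.
# $1 interface, $2 txqueuelen in packets, $3 TX ring size in descriptors.
queue_sizing_cmds () {
    dev=$1 qlen=$2 ring=$3
    echo "ip link set dev $dev txqueuelen $qlen"
    echo "ethtool -g $dev"              # shows current and maximum ring sizes
    echo "ethtool -G $dev tx $ring"     # requests a larger transmit ring
}

# enp1s0 is only a fallback name; 100 and 4096 are assumed example values.
queue_sizing_cmds "${INT:-enp1s0}" 100 4096
```

Once the output looks right, piping it to sh as root applies it. Drivers without ring sizing support will reject the ethtool -G line.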

It has been observed that PCI drivers such as rtl8169, which use DMA for frame reception and transmission, may keep their ring buffer in system memory, so there is a further chance to prioritise frames. There may be a further on-chip fifo before frames reach the cable, though.

Classifying Ethernet Frames

To make this useful, frames need an explicit classification rather than being left to the default.

Within this prio qdisc, lower-numbered bands take absolute priority over higher-numbered ones. This has no fairness, though it is nicely predictable.

It is usually necessary to classify in both tc and iptables, since iptables and ip6tables only handle frames that carry IPv4 and IPv6 directly.

Firstly, I want to prioritise PPPoE. I carry this within VLAN 50.

tc filter add dev $INT parent d0be: protocol all prio 1 basic match "(cmp(u16 at 12 layer 0 eq 0x8863) or cmp(u16 at 12 layer 0 eq 0x8864)) and meta(vlan eq 50)" action skbedit queue_mapping 1 flowid d0be:5

There are some important applications, like DNS and NFS. Let us make them more important than the middle, though PPPoE has the most preference.
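
A possible way to express this with the CLASSIFY target of iptables and ip6tables is sketched below. The function prints the rules rather than installing them. The helper name classify_port is hypothetical, and the class d0be:6 is my assumption: it sits above the default middle bands but below the d0be:5 used for PPPoE. Port 53 for DNS and 2049 for NFS are the standard ports.

```shell
# Print mangle-table rules that place one destination port into a class,
# once for IPv4 and once for IPv6.
# $1 interface, $2 protocol (tcp or udp), $3 port, $4 class as major:minor.
classify_port () {
    dev=$1 proto=$2 port=$3 class=$4
    for ipt in iptables ip6tables; do
        echo "$ipt -t mangle -A POSTROUTING -o $dev -p $proto --dport $port -j CLASSIFY --set-class $class"
    done
}

# d0be:6 is an assumed class, chosen to rank just below PPPoE.
classify_port "${INT:-enp1s0}" udp 53   d0be:6   # DNS
classify_port "${INT:-enp1s0}" tcp 53   d0be:6   # DNS over TCP
classify_port "${INT:-enp1s0}" tcp 2049 d0be:6   # NFS
```

CLASSIFY writes the class into the SKB priority, and a prio qdisc whose handle matches the major part then uses the minor part directly as the band.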

Here I am forcing udp to ports 5004 and 1234 to a low classification, so that where a multicast sender floods the interface it does not cut off the more important applications.
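
A sketch of such a rule follows, again printing the commands for review rather than running them. The name demote_udp_ports is hypothetical, and d0be:c (the twelfth of the sixteen bands) is an assumed choice that ranks below the default middle bands. The multiport match is used so that both ports fit in one rule.

```shell
# Print rules that push UDP traffic for the given ports into a low class.
# $1 interface, $2 comma-separated destination ports, $3 class as major:minor.
demote_udp_ports () {
    dev=$1 ports=$2 class=$3
    for ipt in iptables ip6tables; do
        echo "$ipt -t mangle -A POSTROUTING -o $dev -p udp -m multiport --dports $ports -j CLASSIFY --set-class $class"
    done
}

# d0be:c is an assumption: any class below the default bands would do.
demote_udp_ports "${INT:-enp1s0}" 5004,1234 d0be:c
```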


The command tc -d -p -s qdisc show dev enp1s0 shows the running statistics.

The name enp1s0 is an example of systemd's path-based naming. I like this scheme, as these names are used by iptables rules.

The main figures of interest are backlog and dropped. They show how many packets are waiting in the fifo for transmission and whether any were dropped because they exceeded the fifo capacity.
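
With sixteen fifos the full output gets long, so a small filter can reduce it to one line per qdisc. This is a sketch that assumes the usual three-line layout of tc -s output (a "qdisc" line, a "Sent" line containing the dropped count, and a "backlog" line). The function name qdisc_summary is my own.

```shell
# Condense "tc -s qdisc show" output read from stdin to one line per qdisc,
# giving the dropped count and the backlog in packets.
qdisc_summary () {
    awk '
        $1 == "qdisc"   { name = $2 " " $3 }                       # kind and handle
        $1 == "Sent"    { dropped = $7; sub(/,$/, "", dropped) }   # field is like "0,"
        $1 == "backlog" { pkts = $3; sub(/p$/, "", pkts)           # field is like "0p"
                          print name, "dropped=" dropped, "backlog=" pkts }
    '
}

# Usage on a live system:
#   tc -s qdisc show dev enp1s0 | qdisc_summary
```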

In some situations the application blocks instead of its frames being dropped, which is often preferable as it conserves work.