Linux Kernel 5.12 looks very promising: I/O performance boost and idmap mounts!
Linux Kernel 5.12 got off to a bumpy start, as chief maintainer Linus Torvalds had to contend with power outages in his area. But kernel development has since resumed.
Kernel 5.12 is currently available as Release Candidate 1. And gone are the romantic days of "Valentine's Day"; Torvalds nicknamed the 5.12 Kernel "Frozen Wasteland".
Besides the usual improvements (such as new or improved hardware drivers), one big change deserves a closer look: moving NAPI polling from softirq to kthread.
Under the current implementation, NAPI polling (the kernel's mechanism for processing incoming network packets in batches) runs in softirq context. As a result, the scheduler has poor visibility into the CPU cycles spent in softirq context and cannot make optimal scheduling decisions, which costs performance.
"With napi poll moved to kthread, scheduler is in charge of scheduling both the kthreads handling network load, and the user threads, and is able to make better decisions." (Wei Wang)
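As a minimal sketch of how this could be switched on per network device: kernel 5.12 documents a boolean sysfs attribute, /sys/class/net/<iface>/threaded, for controlling threaded NAPI mode. The helper names below (threaded_path, is_threaded, set_threaded) and the root parameter are hypothetical and only there to make the example easy to try against a scratch directory. On a real system, writing the attribute requires root.

```python
from pathlib import Path

# Default sysfs location of network devices; 'root' can be overridden for testing.
SYSFS_NET = Path("/sys/class/net")


def threaded_path(iface, root=SYSFS_NET):
    """Path of the 'threaded' attribute for a given interface name."""
    return Path(root) / iface / "threaded"


def is_threaded(iface, root=SYSFS_NET):
    """Return True if the interface reports threaded NAPI as enabled ("1")."""
    return threaded_path(iface, root).read_text().strip() == "1"


def set_threaded(iface, enabled, root=SYSFS_NET):
    """Write 1 (enable) or 0 (disable) to the 'threaded' attribute."""
    threaded_path(iface, root).write_text("1" if enabled else "0")
```

For example, set_threaded("eth0", True) would ask the kernel to poll eth0's NAPI instances from kthreads, where "eth0" is just a placeholder interface name.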
In a benchmark using netperf tcp_rr, the three implementations (softirq, kthread and workq) were compared against each other. Throughput and median latency are roughly on par, but tail latency (99th and 99.9th percentile) improves markedly when NAPI runs as a kthread:
          req/resp   QPS     50%tile   90%tile   99%tile   99.9%tile
softirq   1B/1B      2.75M   337us     376us     1.04ms    3.69ms
kthread   1B/1B      2.67M   371us     408us     455us     550us
workq     1B/1B      2.56M   384us     435us     673us     822us
softirq   5KB/5KB    1.46M   678us     750us     969us     2.78ms
kthread   5KB/5KB    1.44M   695us     789us     891us     1.06ms
workq     5KB/5KB    1.34M   720us     905us     1.06ms    1.57ms
softirq   1MB/1MB    11.0K   79ms      166ms     306ms     630ms
kthread   1MB/1MB    11.0K   75ms      177ms     303ms     596ms
workq     1MB/1MB    11.0K   79ms      180ms     303ms     587ms
Only at the largest request/response size (1MB) do the three implementations perform about the same.
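To put numbers on the difference, the following sketch divides the softirq latency by the kthread latency for each request size. The values are the benchmark figures quoted above, converted to microseconds. The dictionary layout and the function name softirq_over_kthread are choices made for this example only.

```python
# Latencies in microseconds, taken from the netperf tcp_rr figures above.
LATENCY_US = {
    "1B/1B": {
        "softirq": {"p50": 337, "p90": 376, "p99": 1040, "p99.9": 3690},
        "kthread": {"p50": 371, "p90": 408, "p99": 455, "p99.9": 550},
    },
    "5KB/5KB": {
        "softirq": {"p50": 678, "p90": 750, "p99": 969, "p99.9": 2780},
        "kthread": {"p50": 695, "p90": 789, "p99": 891, "p99.9": 1060},
    },
    "1MB/1MB": {
        "softirq": {"p50": 79000, "p90": 166000, "p99": 306000, "p99.9": 630000},
        "kthread": {"p50": 75000, "p90": 177000, "p99": 303000, "p99.9": 596000},
    },
}


def softirq_over_kthread(size, percentile):
    """Ratio > 1 means kthread responded faster than softirq at that percentile."""
    row = LATENCY_US[size]
    return row["softirq"][percentile] / row["kthread"][percentile]


for size in LATENCY_US:
    p50 = softirq_over_kthread(size, "p50")
    p999 = softirq_over_kthread(size, "p99.9")
    print(f"{size:8s} median x{p50:.2f}  99.9%tile x{p999:.2f}")
```

With these figures, the 99.9th percentile for 1B/1B requests comes out roughly 6.7 times lower with kthread, while the median is slightly worse (ratio of about 0.91). At 1MB the ratio stays close to 1 at both percentiles.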
5.12 introduces idmap mounts
For everyone using containers, this is big news. Starting with Kernel 5.12, it will be possible to create idmapped mounts, which makes it much easier to share files between different users:
It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2).
It is possible to share files between containers with non-overlapping idmappings.
They allow users to efficiently change ownership on a per-mount basis without having to (recursively) chown(2) all files.
Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. (Christian Brauner)
This is particularly useful for unprivileged containers, which are common in the LXC and LXD projects. These containers typically run with a "fake" ownership, mapped to an unprivileged local Unix user.
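To illustrate what such a mapping looks like, here is a small sketch that translates IDs through a mapping table written in the same three-column format as /proc/<pid>/uid_map (first ID inside the namespace, first ID outside, length of the range). It only models the arithmetic of an ID mapping. It is not how the kernel implements idmapped mounts, and the names IdMapEntry, parse_idmap, to_host and to_container are made up for this example.

```python
from typing import List, NamedTuple, Optional


class IdMapEntry(NamedTuple):
    inside: int   # first ID as seen inside the container
    outside: int  # first corresponding ID on the host
    count: int    # number of consecutive IDs covered by this entry


def parse_idmap(text: str) -> List[IdMapEntry]:
    """Parse lines of 'inside outside count', as found in /proc/<pid>/uid_map."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            entries.append(IdMapEntry(inside, outside, count))
    return entries


def to_host(entries: List[IdMapEntry], inside_id: int) -> Optional[int]:
    """Translate a container-side ID to the host ID, or None if unmapped."""
    for e in entries:
        if e.inside <= inside_id < e.inside + e.count:
            return e.outside + (inside_id - e.inside)
    return None


def to_container(entries: List[IdMapEntry], host_id: int) -> Optional[int]:
    """Translate a host ID to the container-side ID, or None if unmapped."""
    for e in entries:
        if e.outside <= host_id < e.outside + e.count:
            return e.inside + (host_id - e.outside)
    return None


# Hypothetical mapping: container IDs 0..65535 map to host IDs 100000..165535.
mapping = parse_idmap("0 100000 65536")
print(to_host(mapping, 0))          # 100000: container "root" is unprivileged on the host
print(to_container(mapping, 1000))  # None: host user 1000 has no ID inside the container
```

The second result shows the problem idmapped mounts address: files owned by host user 1000 have no valid owner inside this container unless they are chowned or the mount itself carries a mapping.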
But application containers (Docker/Kubernetes) can also benefit from this feature. The container runtime containerd has added experimental support for idmapped mounts.
This could be the basis for running containers without any privileged rights on the host, which would be a great security improvement in general!