
When you hit the end of the kill road: The uninterruptible sleep state

We all know this from Windows: a process is hogging all resources and freezing the machine. While gingerly moving the mouse toward the bottom left and following the laggy cursor as it hops toward the "Start" button, one thinks "oh no, not again".

Trying to shut down the computer – but it won't let you.

When the cursor finally reaches its target (after several failed attempts, because the cursor suddenly jumped way off again), a gentle and well-aimed click on the Start button, hopefully followed by "Shut down" a bit above, should release the machine from its zombie-like state. But nope! The cursor jumped way up at just that moment! Frustrated, the user hits the reset button on the workstation (or holds the power button for 10 seconds).

While this is a situation (almost) every Windows user knows, especially if you've been using Windows 95 or 98, this problem (almost) never happens to Linux users. Why? Because Linux users have the license to kill.

A process using up all CPU resources? kill!
Another process eating up all memory? Ha! kill!
And even if the process doesn't want to die, just kill -9 (SIGKILL) it and problem solved!
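
The usual escalation can be sketched as a small shell function. This is only an illustration: the name terminate_politely and the default grace period of 5 seconds are my own assumptions, not a standard tool.

```shell
# terminate_politely PID [GRACE_SECONDS]
# Hypothetical helper: ask nicely with SIGTERM, fall back to SIGKILL.
terminate_politely() {
    pid=$1
    grace=${2:-5}

    # SIGTERM is the default signal of kill; it can be caught or ignored.
    # Give up if the PID does not exist or we are not allowed to signal it.
    kill -TERM "$pid" 2>/dev/null || return 1

    # Poll once per second. kill -0 sends no signal, it only checks
    # whether the PID still exists (a not yet reaped zombie counts too).
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Still there: send the signal that cannot be caught or ignored.
    kill -KILL "$pid" 2>/dev/null || true
    return 0
}
```

Called as `terminate_politely 12345 10`, it would wait up to ten seconds before resorting to kill -9.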

But what some Linux users may not know is that this kill road has one dead end, and it has a name: the uninterruptible sleep state.

The uninterruptible sleep state

When you use the kill command on a process, the process is not killed right away. kill asks the kernel to send a signal, and the kernel marks that signal as pending on the relevant process(es). This takes only microseconds, but at that point nothing has happened to the process yet. The signal is acted upon only when the process is ready to receive it. This is the case in any process state with one notable exception: the uninterruptible sleep state.
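
On Linux, pending signals can be inspected in /proc/<pid>/status: the SigPnd and ShdPnd fields hold hexadecimal bit masks of signals that are queued but not yet delivered. The following is a minimal sketch, assuming that file format. The function name pending_sigkill is made up for this example.

```shell
# pending_sigkill STATUS_FILE
# Hypothetical helper: succeeds if SIGKILL (signal 9) is marked as pending
# in the given /proc/<pid>/status file.
pending_sigkill() {
    awk '
        # SigPnd: pending for the thread, ShdPnd: pending for the whole process.
        /^(SigPnd|ShdPnd):/ {
            mask = $2
            # Signal 9 is bit 8 of the mask, which is the lowest bit of
            # the third hex digit counted from the right.
            if (length(mask) >= 3) {
                digit = substr(mask, length(mask) - 2, 1)
                if (index("13579bdfBDF", digit) > 0) found = 1
            }
        }
        END { exit !found }
    ' "$1"
}
```

After a kill -9 on a process stuck in state D, something like `pending_sigkill /proc/32730/status && echo "SIGKILL is queued"` should report the signal as still waiting.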

With ps, such a process can be identified by a capital "D" in the process state column (STAT). A practical example:

ck@mintp ~ $ ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32
 PPID   PID USER     STAT %CPU COMMAND         WCHAN
    0     1 root     Ss    0.0 systemd         -
[...]
 4573 30085 lp       S     0.0 dbus            -
 4573 30086 lp       S     0.0 dbus            -
    1 30202 ck       Sl    0.0 scp-dbus-servic poll_schedule_timeout
 2048 31417 ck       Sl    0.2 vmplayer        poll_schedule_timeout
    2 31744 root     D     0.0 jbd2/sdc1-8     -
    2 31745 root     I<    0.0 ext4-rsv-conver -
    1 32730 ck       Dsl  56.4 vmware-vmx 

Two processes (PIDs 31744 and 32730) can be identified by this capital D. As PID 32730 depends on 31744, it continues to show high CPU usage yet will never finish its current task, because both it and the process it depends on are in an uninterruptible sleep state. The reason for this was a dying SSD (sdc, as you might have guessed from the output). For more details read this article: How a dying SSD of a Windows virtual machine killed the physical Linux host.
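
On a busy machine the full ps listing is long, so it helps to show only the blocked processes. A small sketch of how that could look. The function names filter_blocked and list_blocked are my own, and the column order is an assumption that the filter relies on (STAT must be the third column).

```shell
# filter_blocked: keep the header line and every line whose third
# column (STAT) starts with a capital D.
filter_blocked() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# list_blocked: run ps with STAT as the third column and filter the result.
list_blocked() {
    ps -eo pid,ppid,stat,comm,wchan:32 | filter_blocked
}
```

Running `list_blocked` on a healthy system usually prints just the header, or a few processes that are briefly waiting for I/O. Processes that stay in the list across several runs are the ones to worry about.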

A couple of other explanations of the uninterruptible sleep state:

An uninterruptible sleep state is a sleep state that won't handle a signal right away. It will wake only as a result of a waited-upon resource becoming available or after a time-out occurs during that wait (if specified when put to sleep). It is mostly used by device drivers waiting for disk or network IO (input/output). When the process is sleeping uninterruptibly, signals accumulated during the sleep will be noticed when the process returns from the system call or trap.

Wikipedia (Sleep_(system_call))

However, when a process is frozen (e.g., because it got to the FROZEN state via the freezer cgroup, or a hibernation triggered by a device), the task switches to the UNINTERRUPTIBLE state, where it’ll not get the opportunity to be ever scheduled to run until it gets switched to another state, meaning that it’ll not be killable until so.

Ciro S. Costa on https://ops.tips/notes/2019-10-01/

More details about such processes

To find out more about the processes blocked in an uninterruptible sleep state, the "Magic System Request" (SysRq) can be used. You (most likely) even have a key on your keyboard to trigger such a system request:

Magic System Request key on keyboard

Triggering a system request with the "w" command dumps the processes that are in an uninterruptible sleep state:

w: Dumps tasks that are in uninterruptable (blocked) state.

The dumped information is logged to the kernel facility, where it should be picked up by a syslog daemon and written, depending on the Linux distribution and syslog configuration, to /var/log/syslog, /var/log/messages or /var/log/kern.log.

To trigger this, use the following command (as root):

mintp ~ # echo w > /proc/sysrq-trigger
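
Wrapped into a function with a basic permission check, this could look as follows. It is only a sketch: the name dump_blocked_tasks and the optional path argument (handy for trying it out against an ordinary file) are my own additions.

```shell
# dump_blocked_tasks [TRIGGER_FILE]
# Hypothetical helper: write the SysRq "w" command to the trigger file.
# TRIGGER_FILE defaults to /proc/sysrq-trigger.
dump_blocked_tasks() {
    trigger=${1:-/proc/sysrq-trigger}

    # Only root may write to the real trigger file.
    if [ ! -w "$trigger" ]; then
        echo "cannot write to $trigger (are you root?)" >&2
        return 1
    fi

    echo w > "$trigger"
}
```

A typical call would be `dump_blocked_tasks && dmesg | tail -n 50`, which reads the dump straight from the kernel ring buffer in case no syslog daemon picked it up. Writing to the trigger file as root is not restricted by the value in /proc/sys/kernel/sysrq; that setting only affects the keyboard combination.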

Where to go from here?

Now to the bad news: if you've read (and understood) the above description of the uninterruptible sleep state, you now know that these processes cannot be killed while they remain in that state! Unless the resource they are waiting for comes back, the only way to get rid of such hanging processes is to reboot or even reset the machine.

A very unsatisfying feeling for a Linux user, I know. But that's the only way to get rid of the blocked processes. As Wikipedia puts it:

the only non-sophisticated way to get rid of them is to reboot the system.

And this is where you've hit the dead end.

Claudio Kuenzler
Claudio has written well over 1000 articles on his own blog since 2008. He is fascinated by technology, especially Open Source Software. As a Senior Systems Engineer he has seen and solved a lot of problems, and writes about them.
