November 10, 2017 / ftth

Software raid check killing Ubuntu 16.04 servers

While investigating why our automated on-site QA tests failed once a month (HTTP timeouts, etc…), which pointed to heavy load, I eventually discovered that the failures happened while the monthly RAID array check was running, and found out that Ubuntu 16.04 ships

  • with the deadline I/O scheduler by default (check with cat /sys/block/sda/queue/scheduler)
  • with a monthly software RAID check (launched by /etc/cron.d/mdadm), which runs /usr/share/mdadm/checkarray with the --idle argument (which uses ionice)
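
To see which scheduler is active on every disk rather than only on sda, something like the following sketch may help. It assumes the usual sysfs format, where the active scheduler is the bracketed entry (for example "noop [deadline] cfq"); the active_scheduler function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) entry from a
# scheduler line such as "noop [deadline] cfq".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Walk every block device that exposes a scheduler file.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}             # strip the leading path
    dev=${dev%%/*}                   # keep only the device name
    echo "$dev: $(active_scheduler "$(cat "$f")")"
done
```

If a device's scheduler file has no bracketed entry, the sketch prints an empty name for it instead of guessing.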

From /usr/share/mdadm/checkarray:

# queue request for the array. The kernel will make sure that these requests
# are properly queued so as to not kill one of the array.
echo $action > $MDBASE/sync_action
[ $quiet -lt 1 ] && echo "$PROGNAME: I: check queued for array $array." >&2

case "$ionice" in
 idle) ioarg='-c3'; renice=15;;
 low) ioarg='-c2 -n7'; renice=5;;
 high) ioarg='-c2 -n0'; renice=0;;
 realtime) ioarg='-c1 -n4'; renice=-5;;
 *) break;;
esac

resync_pid= wait=5
while [ $wait -gt 0 ]; do
  wait=$((wait - 1))
  resync_pid=$(ps -ef | awk -v dev=$array 'BEGIN { pattern = "^\\[" dev "_resync]$" } $8 ~ pattern { print $2 }')
  if [ -n "$resync_pid" ]; then
    [ $quiet -lt 1 ] && echo "$PROGNAME: I: selecting $ionice I/O scheduling class and $renice niceness for resync of $array." >&2
    ionice -p "$resync_pid" $ioarg || :
    renice -n $renice -p "$resync_pid" 1>/dev/null || :
    break
  fi
  sleep 1
done

However, the deadline I/O scheduler ignores ionice. So even though the --idle argument is passed, the RAID check (which takes a very long time) does not actually run with a low I/O priority…
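
As a rough way to check whether the ionice call made by checkarray can have any effect on a given disk, here is a minimal sketch. It assumes that, on kernels of the Ubuntu 16.04 era, CFQ is the only scheduler that implements I/O priority classes (the ionice man page names CFQ); honors_ionice and report_device are hypothetical names for this example, and sda is just a sample device.

```shell
#!/bin/sh
# Return 0 if the named scheduler is assumed to honor ionice classes.
# Assumption (16.04-era kernels): only cfq does; anything else is
# treated as "not known to honor ionice".
honors_ionice() {
    case "$1" in
        cfq) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hypothetical helper: report on one block device, e.g. "sda".
report_device() {
    file="/sys/block/$1/queue/scheduler"
    [ -r "$file" ] || { echo "$1: no scheduler file"; return 0; }
    # The active scheduler is the bracketed entry in the file.
    active=$(sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file")
    if honors_ionice "$active"; then
        echo "$1: $active, ionice should take effect"
    else
        echo "$1: ${active:-unknown}, ionice is likely ignored"
    fi
}

report_device sda
```

On a stock 16.04 install this would be expected to report deadline for sda. Whether switching that disk to cfq (by writing cfq to the same sysfs file as root) is an acceptable fix depends on the workload, so treat that as something to test rather than a recommendation.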

The most incredible part is how undocumented this all is…
