Visible number of CPUs on OpenVZ and LXC
Franck Pachot
Posted on August 5, 2021
You may have found this blog post because you hit the same issue. Note that, for the moment, I have no solution. I hope to remove this first paragraph and add a solution soon ;)
I was trying to install YugabyteDB on Jelastic, which uses Virtuozzo/OpenVZ virtualization, but starting the yb-master failed with:
*** Check failure stack trace: ***
F0805 13:33:58.388178 28 locks.h:201] Check failed: cpu < n_cpus_ (21 vs. 6)
@ 0x7f2760d809a1 yb::(anonymous namespace)::DumpStackTraceAndExit()
@ 0x7f276016200d google::LogMessage::Fail()
@ 0x7f2760164536 google::LogMessage::SendToLog()
@ 0x7f2760161a6a google::LogMessage::Flush()
@ 0x7f2760165159 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f27695992c8 yb::percpu_rwlock::get_lock()
@ 0x7f276958ebcd yb::log::Log::Reserve()
@ 0x7f27695905bb yb::log::Log::AsyncAppendReplicates()
@ 0x7f2769849e67 yb::consensus::LogCache::AppendOperations()
@ 0x7f276982b0d2 yb::consensus::PeerMessageQueue::AppendOperations()
@ 0x7f276985b8e5 yb::consensus::RaftConsensus::AppendNewRoundsToQueueUnlocked()
@ 0x7f276985a2b3 yb::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
@ 0x7f276985a7df yb::consensus::RaftConsensus::BecomeLeaderUnlocked()
@ 0x7f276986fcc4 yb::consensus::RaftConsensus::DoElectionCallback()
@ 0x7f2760e1ec54 yb::ThreadPool::DispatchThread()
@ 0x7f2760e1b40f yb::Thread::SuperviseThread()
@ 0x7f275c6de694 start_thread
@ 0x7f275be1b41d __clone
In short, this means that in this percpu_rwlock() assert the CPU number is higher than the number of CPUs available... how is this possible?
Apparently, on OpenVZ, the number of CPUs (n_cpus_ here) is the number of virtual CPUs made visible by K8s, but the logical CPU number (cpu here) comes from the hypervisor and can range over all of the host's processors.
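To make the failure mode concrete, here is a minimal sketch of the pattern that breaks (my own illustration, not YugabyteDB's actual code): a per-CPU array sized from the visible CPU count but indexed by sched_getcpu().
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void) {
  /* size the per-CPU array from the visible CPU count, like n_cpus_ */
  long n_cpus = sysconf(_SC_NPROCESSORS_CONF);
  int *per_cpu = calloc(n_cpus, sizeof *per_cpu);
  /* index it with the logical CPU number, like the cpu in the assert */
  int cpu = sched_getcpu();
  printf("cpu = %d, n_cpus = %ld\n", cpu, n_cpus);
  assert(cpu < n_cpus);  /* this is what fires on OpenVZ */
  per_cpu[cpu]++;        /* without the assert, this would be out-of-bounds */
  free(per_cpu);
  return 0;
}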
In order to check that, I've written this little C program to get both values:
- the logical CPU number from sched_getcpu()
- the visible number of CPUs (the same you can see with lscpu) from sysconf(_SC_NPROCESSORS_CONF)
cat > sched_getcpu.c <<CAT
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
int main(void) {
  printf("sched_getcpu = %3d _SC_NPROCESSORS_CONF = %3ld\n",
    sched_getcpu(), sysconf(_SC_NPROCESSORS_CONF));
  return 0;
}
CAT
type gcc || yum install -y gcc
gcc sched_getcpu.c && for i in {1..10} ; do ./a.out ; done
I'm running this on a container in Jelastic, with OpenVZ virtualization:
[root@yb-tserver-0 ~]# yum install -y virt-what
Package virt-what-1.18-4.el7.x86_64 already installed and latest version
Nothing to do
[root@yb-tserver-0 ~]# virt-what
openvz
lxc
And the result is:
[root@yb-tserver-0 ~]# gcc sched_getcpu.c && for i in {1..10} ; do ./a.out ; done
sched_getcpu = 31 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 15 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 17 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 17 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 19 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 21 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 19 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 19 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 21 _SC_NPROCESSORS_CONF = 6
sched_getcpu = 19 _SC_NPROCESSORS_CONF = 6
[root@yb-tserver-0 ~]#
Bad luck: the processor number I'm running on is always larger than the number of CPUs (which excludes the CPUs that K8s set offline). One value comes from the host, the other from the Kubernetes limit. And programs like YugabyteDB that want to manage processor affinity need to find a workaround (OpenJDK and Puppet have hit the same problem).
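One possible in-application workaround (a generic sketch under my own assumptions, not what any of these projects actually ships) is to fold the logical CPU id back into the visible range before using it as an index:
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
/* Fold sched_getcpu() into [0, n_cpus) so it can safely index per-CPU
 * arrays even when the host reports a larger id than the container sees.
 * The modulo may map distinct host CPUs to the same slot, which costs
 * some contention but never reads out of bounds. */
static int safe_cpu_index(void) {
  long n_cpus = sysconf(_SC_NPROCESSORS_CONF);
  int cpu = sched_getcpu();
  if (cpu < 0 || n_cpus <= 0) return 0;  /* fall back to slot 0 on error */
  return (int)(cpu % n_cpus);
}
int main(void) {
  printf("safe cpu index = %d\n", safe_cpu_index());
  return 0;
}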
So, should the container show only the online vCPUs? Probably, but at least it needs to be consistent with the processor number. Here is an example from an Oracle Cloud free tier VM, running 1/8 OCPU, so we see 2 CPUs from the OS:
[opc@a ~]$ lscpu | grep -E "^Model|^CPU|^Hyper"
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 2
CPU family: 23
Model: 1
Model name: AMD EPYC 7551 32-Core Processor
CPU MHz: 1996.250
Hypervisor vendor: KVM
This is 2 vCPUs on a 64-thread processor.
The virtualization is KVM:
[root@yb-tserver-0 cores]# virt-what
lxc
kvm
And my little program shows consistent sched_getcpu() numbers:
[root@yb-tserver-0 cores]# for i in {1..10} ; do ./a.out ; done
sched_getcpu = 6 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 1 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 4 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 3 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 5 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 7 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 1 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 4 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 6 _SC_NPROCESSORS_CONF = 8
sched_getcpu = 2 _SC_NPROCESSORS_CONF = 8
And showing only the number of visible CPUs is a feature (it was considered a bug when LXC displayed all the host's CPUs in /sys).
Let's have a look at where the numbers come from in the YugabyteDB master server:
- the cpu comes from sched_getcpu() in yb::percpu_rwlock::get_lock()
- the n_cpus_ comes from n_cpus_ = base::MaxCPUIndex() + 1; in sysinfo.cc, which parses /sys/devices/system/cpu/present
and in my OpenVZ VM this is:
root@node88695-yb-demo ~ $ cat /sys/devices/system/cpu/present
0-5
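For reference, here is a minimal sketch of deriving the maximum CPU index from that file (my reconstruction, not the actual sysinfo.cc code, and assuming the file holds a single range like 0-5 rather than a comma-separated list):
#include <stdio.h>
/* Read /sys/devices/system/cpu/present and print the highest CPU index. */
int main(void) {
  FILE *f = fopen("/sys/devices/system/cpu/present", "r");
  int low = 0, high = 0;
  if (!f) { perror("fopen"); return 1; }
  int n = fscanf(f, "%d-%d", &low, &high);
  fclose(f);
  if (n < 1) return 1;     /* unexpected format */
  if (n == 1) high = low;  /* the file may contain a single cpu id */
  printf("max cpu index = %d, so n_cpus_ = %d\n", high, high + 1);
  return 0;
}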
This looks like the correct behavior, but it is not consistent with the numbers coming from sched_getcpu() under some hypervisors. Probably we need to remove this assert (as is already done for OSX).
My dirty hack is adding the following before the command (which is exec /home/yugabyte/bin/yb-master in my case) to hijack sched_getcpu() with LD_PRELOAD so that it always returns the last processor number derived from _SC_NPROCESSORS_CONF:
yum install -y gcc && echo -e "#define _GNU_SOURCE\n#include <unistd.h>\nint sched_getcpu (void) { return sysconf(_SC_NPROCESSORS_CONF)-1 ; };\n" > sched_getcpu.c && gcc -shared -o /tmp/sched_getcpu.so -fPIC sched_getcpu.c ; export LD_PRELOAD=/tmp/sched_getcpu.so &&
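Unrolled from that one-liner, the preloaded shim is just this function, which shadows glibc's sched_getcpu() for the whole process:
#define _GNU_SOURCE
#include <unistd.h>
/* Compiled with: gcc -shared -o /tmp/sched_getcpu.so -fPIC sched_getcpu.c
 * Loaded with:   export LD_PRELOAD=/tmp/sched_getcpu.so
 * Always report the last visible processor, so that cpu < n_cpus_ holds. */
int sched_getcpu(void) {
  return sysconf(_SC_NPROCESSORS_CONF) - 1;
}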
There's also the alternative of exposing a larger range of CPUs by making libgutil.so read a custom /tmp_devices_system_cpu_present file instead of /sys/devices/system/cpu/present. Note that both paths have exactly the same length, which is required when rewriting a string constant inside the binary:
echo 0-64 > /tmp_devices_system_cpu_present ; sed -e 's@/sys/devices/system/cpu/present@/tmp_devices_system_cpu_present@g' -i /home/yugabyte/lib/yb/libgutil.so
I patch the existing yb-master and yb-tserver StatefulSets in this way to force sched_getcpu() to return the highest visible processor number:
for i in yb-master yb-tserver ; do kubectl get statefulsets $i -n yb-demo -o yaml | awk '/^ *exec [/]home[/]yugabyte[/]bin[/]yb-master/{sub(/exec/,patch" exec")}{print}' patch='yum install -y gcc ; echo -e "#define _GNU_SOURCE\\n#include <unistd.h>\\nint sched_getcpu (void) { return sysconf(_SC_NPROCESSORS_CONF)-1 ; };\\n" > sched_getcpu.c ; gcc -shared -o /tmp/sched_getcpu.so -fPIC sched_getcpu.c ; export LD_PRELOAD=/tmp/sched_getcpu.so ; ' | kubectl apply -f /dev/stdin -n yb-demo ; done
or hardcode a 0-64 range corresponding to the host CPUs:
for i in yb-master yb-tserver ; do kubectl get statefulsets $i -n yb-demo -o yaml | tee $i.b.yaml | awk '/^ *exec [/]home[/]yugabyte[/]bin[/]yb-/{sub(/exec/,patch" exec")}{print}' patch=' echo 0-64 > /tmp_devices_system_cpu_present ; sed -e 's@/sys/devices/system/cpu/present@/tmp_devices_system_cpu_present@g' -i /home/yugabyte/lib/yb/libgutil.so ; ' | tee $i.e.yaml | kubectl apply -f /dev/stdin -n yb-demo ; done
But be aware that this is completely unsupported. Check https://github.com/yugabyte/yugabyte-db/issues/9619 for the proper solution.