SRE DevOps Interview Questions — Linux Troubleshooting Extended

deadlock

Satyajiijt Roy

Posted on July 24, 2022

This is an extension of my previous blog, SRE/DevOps Interview Questions — Linux Troubleshooting. Sometimes the initial question is vague and the follow-up questions clear the path; that is what I am going to focus on in this blog. I will try to cover more questions and possible explanations here.

Question: There is a service running on a system and you have been told that the service is not running properly. How would you troubleshoot?

Answer: After verifying DNS and other basics, I would SSH to the system and see what is going on.

Follow-up Question: You get an error message like the one below:

ssh: connect to host example.com port 22: Resource temporarily unavailable

However, you find that you have IPMI access to the machine, so you use that to log in and get a prompt like this:

root@example.com#

Follow-up Answer: Now I will run some basic commands such as top and ps to see what is going on.

Follow-up Question: You get a similar error for any command you run:

fork: retry: Resource temporarily unavailable

Note: By now you might already know what the issue is, or at least have some idea. A resource is exhausted (maybe files, processes, etc.), and you have figured out that you can't run any command, specifically any external command, since those need a fork.

Actual Question: How would you troubleshoot a Linux box, given that none of the external commands execute?

Facts: What should we do? Use commands that are internal, or built in, to the shell.

So how do you find which commands are built in and useful for troubleshooting? And if external commands are not available (top, ps, lsof; even cat is not available), how do you gather information about the system? 🤔

The answer is the /proc filesystem and a couple of shell built-ins and keywords such as read and for:

Type help in the shell and it will list all the commands that are built in to the shell
Read about the /proc filesystem in my previous blog SRE/DevOps Interview Questions — Linux Troubleshooting and also here
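Since a failing fork usually means a process or file limit has been hit, it can help to look at those limits first. The following is a minimal sketch, assuming a standard Linux /proc layout; show_limits is a name made up for this example. It uses only read, case and echo with redirection, so nothing has to fork.

```shell
# show_limits: hypothetical helper that prints process and file limits
# using only shell builtins and redirection (no external commands).
show_limits() {
    # Per-process limits of the current shell.
    while IFS= read -r line; do
        case $line in
            "Max processes"*|"Max open files"*) echo "$line" ;;
        esac
    done < /proc/self/limits

    # System-wide file handles: allocated, unused, maximum.
    read -r allocated unused maximum < /proc/sys/fs/file-nr
    echo "file handles: $allocated allocated, limit $maximum"

    # The fourth field of loadavg is running/total scheduling entities.
    read -r load1 load5 load15 entities last_pid < /proc/loadavg
    echo "running/total tasks: $entities"

    # Largest PID the kernel will hand out.
    read -r pid_max < /proc/sys/kernel/pid_max
    echo "pid_max: $pid_max"
}

show_limits
```

If the task count is close to the "Max processes" limit, or the allocated file handles are close to the maximum, that points at which resource is exhausted.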

The information reported by commands like ps, lsof and vmstat can be found in the /proc filesystem if you look at the right file.

For example, every directory under /proc whose name is a number corresponds to a PID running on the system. Inside it, the cmdline file holds the command line the process was started with, and the fd directory contains one entry per open file descriptor.

So, assuming the system has too many files open and resources are exhausted, how do you see which PID is using how many files without external commands?
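As an illustration of that layout, here is a rough stand-in for ps built from builtins only. It is a sketch, assuming Linux exposes the short process name in /proc/PID/comm; list_procs is a hypothetical name.

```shell
# list_procs: hypothetical builtin-only replacement for a basic "ps".
list_procs() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # A process can exit between the glob and the read, so skip
        # entries whose comm file is no longer readable.
        [ -r "$dir/comm" ] || continue
        if read -r name 2>/dev/null < "$dir/comm"; then
            echo "$pid $name"
        fi
    done
}

list_procs
```

Each output line is a PID followed by the process name, which is often enough to spot a process that has been spawned thousands of times.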

for fd in /proc/[0-9]*/fd/*; do echo $fd ; done
/proc/1/fd/0
/proc/1/fd/1
/proc/1/fd/2
/proc/1/fd/255
/proc/1/fd/3

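This way you can see the file descriptors each PID is using. If you want to see the command line of a process without cat, read its cmdline file with the read builtin. The arguments in that file are separated by NUL bytes, so in bash use an empty delimiter, and use plain redirection rather than command substitution, which may itself need a fork:

while IFS= read -r -d '' arg; do echo "$arg"; done < /proc/${PID}/cmdline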

So with read, for and other internal commands you can find out what is going on in the system.
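To answer the "which PID is using how many files" question directly, the listing can be turned into a per-process count. This is a minimal sketch using only builtins; fd_counts is a hypothetical name, and as a non-root user it only sees processes whose fd directory is readable.

```shell
# fd_counts: hypothetical helper that counts open descriptors per PID
# and remembers the largest consumer, without any external command.
fd_counts() {
    max_pid=""
    max_count=0
    for dir in /proc/[0-9]*; do
        count=0
        for fd in "$dir"/fd/*; do
            # An unmatched glob stays literal; real entries are symlinks.
            [ -L "$fd" ] || continue
            count=$((count + 1))
        done
        [ "$count" -gt 0 ] || continue
        echo "${dir#/proc/} $count"
        if [ "$count" -gt "$max_count" ]; then
            max_count=$count
            max_pid=${dir#/proc/}
        fi
    done
    echo "most descriptors: PID $max_pid ($max_count)"
}

fd_counts
```

Tracking the maximum inside the loop stands in for sort, which would be an external command.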

Trick Question: How do you delete a file named -f or --file?

Answer: Well explained here. The interviewer also expects you to walk them through the command execution process on a Linux system.
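The two usual approaches can be tried safely in a scratch directory. This is a small sketch; the demo_dir variable is only for the demonstration.

```shell
# Work in a scratch directory so the demo is harmless.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create the awkward names; "--" ends option parsing for touch as well.
touch -- -f --file

# Option 1: "--" tells rm that everything after it is a file name.
rm -- -f

# Option 2: a path prefix means the argument no longer starts with "-".
rm ./--file
```

Both work because the shell passes the name through unchanged and it is rm itself that decides whether an argument is an option.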

Question: Explain the output below. What command produces it?

Question: Name the fields and what they represent.

Question: Looking at the output below, please explain what is going on in the machine. Please be as detailed and descriptive as possible.

Question: Compare the two vmstat outputs above and explain the similarities and differences.

Question: What do you understand from the vmstat output below, and what is the relationship between the in and cs columns?

Trick Question: Looking at vmstat, can you tell how many CPUs and cores the system has?

Question: How do you find the config and other related files if only the process is known?

Answer: Check lsof for the files the process has open.
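lsof -p PID is the usual tool, and the same information is also visible under /proc. The following is a sketch, assuming readlink is available; proc_files is a hypothetical name, and it only shows files that are open right now, so a config file read once at startup may not appear.

```shell
# proc_files: hypothetical helper that shows the binary, working
# directory and open regular files of one process.
proc_files() {
    pid=$1
    echo "exe: $(readlink "/proc/$pid/exe")"
    echo "cwd: $(readlink "/proc/$pid/cwd")"
    for fd in /proc/"$pid"/fd/*; do
        [ -L "$fd" ] || continue
        target=$(readlink "$fd")
        # Keep real paths; skip sockets, pipes and anonymous inodes.
        case $target in
            /*) echo "fd ${fd##*/}: $target" ;;
        esac
    done
}

# Example: inspect the current shell.
proc_files "$$"
```

The exe and cwd links are often the quickest hint for where a service keeps its configuration.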

Question: Given a TCP connection between two machines, how do the packets move? Are they sent as IP datagrams? If yes, what would be an advantage? Is it faster? What about latency?

Explanation: Good content to read about TCP and UDP

Question: What happens when you run curl example.com?

Explanation: I won't spell out what happens when you run curl example.com. My suggestion is to treat it as a combination of several questions:

  1. How does curl execute on the system? Basically, go over the whole process of command execution from user space to kernel space.
  2. Explain the fork() and exec() calls.
  3. Explain the well-known system calls that might be involved.
  4. Explain DNS resolution from example.com to an IP address.
  5. Explain how the request travels from your terminal to the destination machine and how the response comes back to your terminal.
  6. If possible, go into detail about packet transmission and network layer 2–3 concepts.
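For points 1 and 2, a tiny experiment makes the fork-then-exec pattern visible from the shell itself. This is only an illustrative sketch; child_report is a variable made up for the example.

```shell
echo "shell PID: $$"

# Command substitution forks a copy of the shell; exec then replaces that
# copy with a new program, so the sh started here reports the PID of the
# forked child rather than the PID of the original shell.
child_report=$( exec sh -c 'echo "child PID: $$"' )
echo "$child_report"
```

The two PIDs differ, which is the same thing that happens when the shell runs curl: fork a child, then exec the curl binary inside it.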

Follow-up Question: Suppose the output of curl example.com is the following. Please explain it.

>> curl example.com
<HTML>
<HEAD>
<TITLE>Document Has Moved</TITLE>
</HEAD>

<BODY BGCOLOR="white" FGCOLOR="black">
<H1>Document Has Moved</H1>
<HR>

<FONT FACE="Helvetica,Arial"><B>
Description: The document you requested has moved to a new location.  The new location is "https://www.example.com".
</B></FONT>
<HR>
</BODY>

Explanation: Talk about the response code you received (not visible in the output) and why. What can you do to get a 200 response code?
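A "Document Has Moved" body normally comes with a 3xx redirect status and a Location header. The helpers below are a sketch of how to confirm that and how to follow the redirect; the function names are made up, and both assume curl is installed and the host is reachable.

```shell
# Hypothetical helpers; both assume curl is installed and the host is reachable.

# Print only the HTTP status code of the first response.
status_code() {
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Follow redirects (-L) and print the final status code and final URL.
final_status() {
    curl -s -L -o /dev/null -w '%{http_code} %{url_effective}\n' "$1"
}
```

For example, status_code example.com shows the code of the first response, curl -I example.com shows the headers including Location, and final_status example.com shows where the redirect chain ends.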

Explain how HTTPS works and go into detail on the TCP 3-way handshake and the TLS handshake that follows it. Explain asymmetric and symmetric encryption in terms of SSL/TLS. Here is a nice writeup about it

Follow-up Question: How do you tell your system not to use /etc/hosts for DNS resolution? Does the file override DNS?

Answer: nsswitch.conf; read about it here
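On glibc-based systems the hosts: line in /etc/nsswitch.conf lists the lookup sources in order, and files stands for /etc/hosts. The sketch below prints that line's sources using only builtins; hosts_sources is a hypothetical name, and the optional argument exists so it can be pointed at a test file.

```shell
# hosts_sources: hypothetical helper that prints the lookup order for
# host names from an nsswitch.conf-style file.
hosts_sources() {
    conf=${1:-/etc/nsswitch.conf}
    while read -r db sources; do
        if [ "$db" = "hosts:" ]; then
            echo "$sources"
        fi
    done < "$conf"
}

# Show the live setting when the file exists on this system.
if [ -r /etc/nsswitch.conf ]; then
    hosts_sources
fi
```

If the output starts with files, /etc/hosts is consulted before DNS and so overrides it for programs that resolve names through glibc. Removing files from that line makes those programs skip /etc/hosts.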

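Question: What is the nscd service and why is it used?

Answer: Read about nscd here

Follow-up Question: How does the Linux system know which protocol (TCP or UDP) your application, or any application, will use to communicate?

Answer: /etc/services; nice write-up here on the same. Note that this file only maps service names to port numbers and protocols. The application itself picks TCP or UDP when it creates its socket.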

Question: How do you simulate 50% packet drop for testing purposes?

Explanation: Here
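One common approach, which may or may not be the one the linked article uses, is the netem queueing discipline from iproute2. The helpers below are a sketch with made-up names; they need root, and the interface name you pass (for example eth0) is an assumption about your system.

```shell
# Hypothetical helpers; both need root and the iproute2 tc tool.

# Drop roughly the given percentage of outgoing packets on an interface.
add_loss() {
    tc qdisc add dev "$1" root netem loss "$2%"
}

# Remove the root qdisc again to restore normal behavior.
clear_loss() {
    tc qdisc del dev "$1" root
}
```

For example, add_loss eth0 50 starts dropping about half of the packets leaving eth0, and clear_loss eth0 undoes it. netem applies to egress traffic, so incoming packets are not affected by this setup.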

Question: Talk about gRPC. Why would one use it? What are the pros and cons?

Explanation: Good Read

Question: What are the ways to count the total TCP connections on the system?

Explanation: A couple of commands can be used, like netstat and ss, and don't forget the /proc filesystem (specifically the /proc/net/sockstat file). Good examples here
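Besides netstat -ant and ss -tan, the /proc route works even when nothing can be forked. This is a sketch with hypothetical function names; it assumes the standard /proc/net/tcp layout, where the fourth column is the connection state in hex and 01 means ESTABLISHED.

```shell
# count_established: hypothetical helper that counts established TCP
# connections (IPv4 and IPv6) using only builtins.
count_established() {
    total=0
    for table in /proc/net/tcp /proc/net/tcp6; do
        [ -r "$table" ] || continue
        while read -r slot laddr raddr state rest; do
            # The header line has "st" here, so it never matches.
            if [ "$state" = "01" ]; then
                total=$((total + 1))
            fi
        done < "$table"
    done
    echo "$total"
}

# tcp_sockstat: print the kernel's own TCP summary line.
tcp_sockstat() {
    [ -r /proc/net/sockstat ] || return 0
    while read -r proto rest; do
        if [ "$proto" = "TCP:" ]; then
            echo "$rest"
        fi
    done < /proc/net/sockstat
}

count_established
tcp_sockstat
```

The sockstat line gives totals such as sockets in use and in TIME_WAIT, while the loop over /proc/net/tcp lets you count by any state you care about.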

Question: How do you troubleshoot a short-lived process, or processes, causing CPU spikes?

Explanation: Good Readings here and here

Hope this helps in your journey and provides more food for thought.

Happy Troubleshooting and Best of luck!!
