How-To: Continuously Monitor System Load And Track Down Resource Hogs

I’ve been using Linux for 12+ years. In all that time I’d never used, or even heard of, a program called tload. tload is similar to the top program, in that it sits running in your terminal, showing you the current system load.

tload gives you a basic text “graph” of the current system load average. It uses data from /proc/loadavg to operate. If you’re really into how tload operates, check out this post by Baron Schwartz.
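If you just want the raw numbers behind the graph, you can read /proc/loadavg yourself. Here’s a minimal sketch. show_load is a name I made up for illustration, and it assumes the usual Linux layout of that file: three load averages, then running/total processes, then the last PID.

```shell
#!/bin/sh
# show_load: hypothetical helper, not part of tload or procps.
# Prints the 1-, 5- and 15-minute load averages from a loadavg-style file.
show_load() {
    # Field layout assumed: 1min 5min 15min running/total last_pid
    awk '{ printf "1m=%s 5m=%s 15m=%s procs=%s\n", $1, $2, $3, $4 }' "${1:-/proc/loadavg}"
}

# Only call it against the live file if that file actually exists.
if [ -r /proc/loadavg ]; then
    show_load
fi
```

Pass a different path as the first argument if you’ve saved a copy of the file somewhere and want to look at that instead.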

I’ll usually open up a connection to a server via SSH and just let tload run for a while. If the load gets too high, I’ll use top to see which processes are sucking up all my resources, typically processor resources. The -H flag makes top list individual threads instead of whole processes. I usually just run top like so:
top -H

After that, there are a couple of commands I use to find how much is running under a specific user or a specific service. Keep in mind that ps aux prints one line per process, not per thread, so these are really process counts. To find the users (5 in this case) with the largest number of processes running under them, run the following:
ps aux | awk '{print $1}' | sort | uniq -c | sort -nk1 | tail -n5
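One caveat: ps aux prints one line per process, so the pipeline above counts processes per user. If you want actual threads, here’s a rough sketch. It assumes the procps version of ps found on most Linux distributions, where -L prints one line per thread and user= prints the owner column with no header. top_counts is just a helper name I picked.

```shell
#!/bin/sh
# top_counts: hypothetical helper. Reads one value per line on stdin and
# prints the N most frequent values (default 5), most frequent first.
top_counts() {
    sort | uniq -c | sort -nr | head -n "${1:-5}"
}

# One line per thread (-L), owner column only, no header (user=).
# Assumes procps ps; other ps implementations may not accept these options.
ps -eLo user= | top_counts 5
```

Unlike the tail version, this puts the biggest count on the first line.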

To get the commands with the most processes running under them, run the code below. It counts how many times each command name (column 11 of ps aux) appears and shows the top 5. Change the tail -n5 part to tail -n10 to see the top 10 instead.
ps aux | awk '{print $11}' | sort | uniq -c | sort -nk1 | tail -n5
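The command above counts how many processes share each command name. If you’d rather see which individual processes have the most threads, here’s a sketch along the same lines. It assumes procps ps, whose nlwp output field is the number of threads in a process. top_by_threads is a made-up helper name.

```shell
#!/bin/sh
# top_by_threads: hypothetical helper. Expects "thread_count pid command"
# lines on stdin and prints the N lines with the highest thread count.
top_by_threads() {
    sort -k1,1nr | head -n "${1:-5}"
}

# nlwp = number of threads, pid = process ID, comm = command name.
# The trailing "=" on each field suppresses the header line (procps ps assumed).
ps -eo nlwp=,pid=,comm= | top_by_threads 5
```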

That’s really all I use when I encounter extremely high loads, whatever the reason may be. These commands should work nicely for you too. tload comes bundled with pretty much every Linux distribution in existence. And from the info you gather with top and the awk commands above, you can usually narrow down what the issue is. Figuring out why that issue is happening in the first place is an entirely different story though.

There’s one other really good method involving top that I’ve seen discussed elsewhere. The code example they give is top -b -i -n 20 >> ./top_procs. It basically runs top for 20 iterations and saves the output into a file, top_procs. They describe it better than I do.

What that does is tell top to run in “batch” mode (not look for any user input), skip idle processes, loop 20 times, and append the output to the file ./top_procs. Run that command when you are experiencing a high server load. Then you can view the contents of that file to see which processes were busy during that window.
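Once you have that file, you don’t have to read it by eye. Here’s a rough sketch that counts how often each command shows up across all the saved iterations. summarize_top is a name I invented, and it assumes top’s default column layout, where process rows start with a numeric PID and the command name is field 12. If you’ve customized top’s columns, adjust the field number.

```shell
#!/bin/sh
# summarize_top: hypothetical helper. Counts how many times each command
# appears in saved "top -b" output and prints the N most frequent (default 5).
summarize_top() {
    # Process rows start with a numeric PID; summary and header lines do not.
    # Field 12 is COMMAND in top's default layout (an assumption, see above).
    awk '$1 ~ /^[0-9]+$/ { print $12 }' "$1" | sort | uniq -c | sort -nr | head -n "${2:-5}"
}

# Summarize the file from the top command above, if it exists.
if [ -f ./top_procs ]; then
    summarize_top ./top_procs
fi
```

A command that appears in nearly all 20 iterations was busy the whole time, which makes it a good first suspect.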

So that’s that. It should be relatively easy for you to track down what’s causing high load on your VPS, server, desktop, or whatever other Linux system you use.