2009-08-13

Cratered your Linux Box? Here are some helpful tips.



A friend of mine encountered a problem where all the storage vanished from the system while he still had a root shell. He needed to reboot, but could not do it in any traditional way, because "reboot", "init", and most other commands were all on the no-longer-mounted filesystem.

I have a bunch of little shell gems written down in my notebook. Some of them are "things you can do with only shell built-in commands". Built-in commands also come in handy when your system is completely cratered, for example, by a fork bomb. Ever see this? It's usually the harbinger of bad news.

axon@silence:~$ ps
-bash: fork: Resource temporarily unavailable

Some of the following may work in other operating systems with modification.
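
If you are not sure which commands your shell can run without touching the disk or forking, bash can tell you. The sketch below assumes bash; the function name "kinds" is made up for illustration. It leans on "type -t", which is itself a builtin and prints builtin, file, alias, function or keyword.

```shell
# kinds CMD...: report how the shell would resolve each command name.
# "kinds" is a hypothetical helper name, not a standard command.
kinds() {
    local cmd
    for cmd in "$@"; do
        printf '%s: ' "$cmd"
        # type -t prints nothing and fails when the name is unknown.
        type -t "$cmd" || echo "not found"
    done
}
```

For example, "kinds cd echo kill ls" shows at a glance which of those still work when nothing on disk can be executed.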

The basics

ls:
One of the first things you'll probably want to be able to do is list files. Since "cd" is a built-in as well, you can navigate. You can use the old standby "echo *" - it will barf out a list of files in the current directory. You can even alias it.
alias ls="echo *"
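
The alias only ever lists the current directory, all on one line. A slightly longer sketch, assuming bash, takes a directory argument, prints one name per line, includes dotfiles, and marks directories with a trailing slash. The name "lsb" is hypothetical; everything it uses ("[", "printf", parameter expansion) is built into bash.

```shell
# lsb [DIR]: list a directory using only shell builtins.
# "lsb" is a made-up name for illustration.
lsb() {
    local dir="${1:-.}" f
    for f in "$dir"/* "$dir"/.[!.]*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$f" ] || [ -L "$f" ] || continue
        if [ -d "$f" ]; then
            printf '%s/\n' "${f##*/}"
        else
            printf '%s\n' "${f##*/}"
        fi
    done
}
```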

cat:
You'll need to create a shell function, but if you use built-ins wisely, you can create a "cat" clone that dumps a file by reading and printing one line at a time until end-of-file. I have this written down, but I apparently found it on some web discussion board years ago. You can name the function "cat" so that the shell doesn't even try to use /bin/cat anymore. This version handles one file at a time; "read -r" and the quotes keep backslashes and odd filenames intact.
cat() { while IFS= read -r l; do printf '%s\n' "$l"; done < "$1"; }

Use /proc!
Once we have a "cat" and "ls" replacement, we can poke around in /proc to our heart's content to get information about what is wrong. Don't ignore informative treasures such as /proc/meminfo and files in /proc/sys for example. Everything you find in /proc has the potential to help you investigate what went south. Here are some of my favorite tricks for using /proc:
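
Without grep, picking one value out of a file like /proc/meminfo means scanning it by hand. Here is a rough builtin-only sketch, assuming bash and the usual "Key: value unit" layout of /proc/meminfo. The function name "meminfo_kb" and its optional second argument (an alternate file, handy for trying it out) are inventions for this example.

```shell
# meminfo_kb KEY [FILE]: print the number that follows "KEY:" in FILE
# (default /proc/meminfo). Hypothetical helper, builtins only.
meminfo_kb() {
    local key value unit
    while read -r key value unit; do
        if [ "$key" = "$1:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "${2:-/proc/meminfo}"
    return 1    # key not found
}
```

Usage would look like "meminfo_kb MemFree" or "meminfo_kb SwapFree".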

uptime:
cat /proc/uptime (shows uptime in seconds)
cat /proc/loadavg (shows the load averages you'd expect from the "uptime" command)
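
Raw seconds are not very readable, and shell arithmetic is built in, so the first field of /proc/uptime can be converted on the spot. A minimal sketch, assuming bash; "uptime_pretty" is a made-up name and the optional file argument exists only so it can be tried on a sample file.

```shell
# uptime_pretty [FILE]: show uptime as days, hours and minutes.
# Reads the first field of FILE (default /proc/uptime). Hypothetical helper.
uptime_pretty() {
    local up idle secs
    read -r up idle < "${1:-/proc/uptime}"
    secs=${up%%.*}    # drop the fractional part
    printf '%dd %dh %dm\n' \
        $((secs / 86400)) $((secs % 86400 / 3600)) $((secs % 3600 / 60))
}
```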

ps:
This is REALLY UGLY and just plain ghetto. But it will tell you what pid belongs to what. Note, you'll have to have the "cat" function above defined. If you want to decode all the info, look at the /proc/[number]/stat section of the proc(5) man page.
ps() { for f in /proc/[0-9]*/stat; do cat "$f"; done; }
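
If the raw stat lines are too much to read, the first three fields (pid, command name in parentheses, state) can be pulled apart with parameter expansion alone. This is a sketch under the assumption that the stat format matches what proc(5) describes; "psb" is a hypothetical name, and the optional argument is an alternate proc root for experimenting.

```shell
# psb [PROCROOT]: print "pid<TAB>state<TAB>name" for every process.
# Builtins only; "psb" is a made-up name for illustration.
psb() {
    local root="${1:-/proc}" f line pid rest comm state
    for f in "$root"/[0-9]*/stat; do
        # A process may exit between the glob and the read; just skip it.
        { read -r line < "$f"; } 2>/dev/null || continue
        pid=${line%% *}
        rest=${line#*\(}      # drop everything through the first "("
        comm=${rest%\)*}      # keep up to the last ")", names may contain spaces
        rest=${rest##*\) }    # what follows the last ") "
        state=${rest%% *}
        printf '%s\t%s\t%s\n' "$pid" "$state" "$comm"
    done
}
```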

kill:
This is a builtin of most shells including bash... No special tricks needed. You can TRY to kill things now that you have a ghetto "ps" replacement. If you have a fork-bomb, you'll have to fight pretty hard to get anywhere.
kill -[signal] [pid]
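
Against a fork bomb, typing pids one at a time is a losing race, and "$(...)" itself needs a fork. One possible approach, sketched here assuming bash, is to collect matching pids into an array with builtins and hand the whole array to "kill". The names "find_pids" and "PIDS" are made up for this example; the kernel truncates the name in /proc/[pid]/stat to 15 characters, so match on that.

```shell
# find_pids NAME [PROCROOT]: fill the global array PIDS with the pids of
# processes whose command name equals NAME. Hypothetical helper, no forks.
find_pids() {
    local name="$1" root="${2:-/proc}" f line rest
    PIDS=()
    for f in "$root"/[0-9]*/stat; do
        { read -r line < "$f"; } 2>/dev/null || continue
        rest=${line#*\(}                  # text after the first "("
        [ "${rest%\)*}" = "$name" ] && PIDS+=("${line%% *}")
    done
    return 0
}
```

A typical sequence might be: find_pids bomb.sh; kill -STOP "${PIDS[@]}"; then repeat the scan and follow up with kill -KILL "${PIDS[@]}". Stopping first is a commonly suggested trick so the processes cannot respawn while you work, but new ones can still appear between the scan and the signal, so expect to run it more than once.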

Last ditch efforts
reboot:
One can force a reboot using the "echo" builtin and the kernel's magic SysRq interface, but SysRq support has to be compiled into the kernel (it is on Ubuntu). This was the eventual solution to my friend's almost-unusable system at the beginning of this post, since all other hope was lost.

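The following will enable SysRq, then send the "boot" command (equivalent to pressing Alt+SysRq+B on the console; SysRq usually shares the Print Screen key). Note that this reboots immediately, without syncing or unmounting disks.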
echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger
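
Since "b" reboots without flushing anything, it is usually kinder to the filesystems to send "s" (sync) and "u" (remount read-only) first, if any disks are still mounted. A small sketch, assuming bash and root privileges: "sysrq_send" is a hypothetical helper, and the SYSRQ_TRIGGER variable is an invention that exists only so the function can be pointed at a scratch file for a dry run.

```shell
# sysrq_send KEY...: write each SysRq key, in order, to the trigger file.
# Hypothetical helper; SYSRQ_TRIGGER defaults to the real /proc/sysrq-trigger.
sysrq_send() {
    local key
    for key in "$@"; do
        echo "$key" > "${SYSRQ_TRIGGER:-/proc/sysrq-trigger}" || return 1
    done
}

# Real use (this WILL reboot the machine):
#   sysrq_send s u b
```

The sync is only requested, not waited for, so pausing a couple of seconds between keys is worth it when you can. With no "sleep" binary available, "read -t 2" on an idle terminal is one builtin-only way to wait.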

exec:
"exec [command]" will almost always work, even when the shell can no longer fork, because it replaces the shell process with whatever you call instead of starting a new one. That also means you only get one shot at it: if you waste it on something that doesn't recover the system, you'll lose your shell at the same time. Best to use this on "reboot" or similar.
exec /sbin/reboot
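
Because there is only one shot, it can help to check that the target is really there before firing. A hedged sketch, assuming bash: "exec_first" is a made-up helper that execs the first candidate path that exists and is executable, and returns to the prompt if none is. The candidate paths in the comment are guesses, not a definitive list for any distribution.

```shell
# exec_first PATH...: exec the first PATH that is executable.
# Returns 1 (and keeps the shell alive) if none of them is.
# Hypothetical helper; uses only the "[" and "exec" builtins.
exec_first() {
    local cmd
    for cmd in "$@"; do
        [ -x "$cmd" ] && exec "$cmd"
    done
    echo "exec_first: nothing executable among: $*" >&2
    return 1
}

# Example (replaces the shell if one of these exists):
#   exec_first /sbin/reboot /usr/sbin/reboot
```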

Have any other must-know commands to weasel your way out of a nearly-deadlocked system? Let us know in the comments! I am always looking for nifty shell recipes like this to jot down for a rainy day.
