In a recent post I showed how to leverage some Linux CLI tools to make the VNX Control Station work a little harder and let you work a little easier, mainly by scripting multiple file replications at once. I noted in that post that I like to leave the SSH window up and just let the replications run so I can watch them complete. I also noted that if you didn’t want to do this, the -background flag should queue the tasks in the NAS scheduler and let you go about your merry business.
I have now used the -background flag and wanted to mention two important things about it.
- Less important than the next point, but still worth mentioning: even with the -background flag, the commands still take a while to run. I was expecting them to complete one after another in short order…not so much. Not nearly as bad as actually waiting for the normal replication tasks to finish, but still not optimal.
- Most importantly, after queueing up 33 file replication tasks in the NAS scheduler, I came back to find out that only three of them had succeeded. All the rest had failed.
The commands were valid and I’m not sure exactly what caused them to fail. Maybe there is a good reason for it. I have a gut feeling that the NAS scheduler has a timeout for tasks and these nas_replicate commands exceed it (because some of them take a really, really long time to finish). Unfortunately I didn’t have the time to investigate so I went back to the CLI without the -background flag. This worked just fine, but it takes a very long time to schedule these puppies because the tasks take so long to run. In a time crunch, or “it’s 5:15 and I want to launch this, logout, and hit the road” situation, it might not be ideal.
So, what if you want a reliable method of kicking off a bunch of replication tasks (or a bunch of tasks on any *nix box…again I’m mainly trying to demonstrate the value of Linux CLI tools) via SSH and then letting them run through? Once again let’s do some thinking and testing. Note: I am on a lab box and am not responsible for anything done to your system. All of these commands can be run on any Linux box to test/play with.
I need something that will take “a while” to run and produce some output. In order to do this I create a simple test script with the following line:
for i in {1..10}; do echo $i; sleep 5; done
In bash, this will loop 10 times, and each time it loops it will print the value of $i (which will go 1, 2, 3…all the way up to 10) and then sleep, or wait, for 5 seconds. The whole script will take about 50 seconds to run, which is perfect for me. Long enough for testing but not so long that I’ll be falling asleep at the keyboard waiting for the test script to complete.
Now, because my plan is to run this guy and then log out, I want to track the output to something other than my screen. I’ll redirect the output to a file using the > operator:
for i in {1..10}; do echo $i > outfile; sleep 5; done
Now if I run bash testscript.sh the screen basically just sits there for 50 seconds, and then it returns the prompt to me. If I check the contents of outfile then I should see my output, 1 2 3 4…etc., right?
$ bash testscript.sh
$ cat outfile
10
$
Or not! I only have 10 in the outfile because my script is doing exactly what I told it to do! You’ll find this is a common “problem” with your scripts. 🙂 My echo $i > outfile overwrites outfile every time that line runs. Instead I want to create a new outfile every time the script runs, and then append (>> operator) the numeric output while my script is running. No problem:
$ cat testscript.sh
echo " " > outfile
for i in {1..10}; do echo $i >> outfile; sleep 5; done
Testing is really, really important because while you can always rely on the computer to do exactly what you tell it to do, you cannot always rely on yourself to tell it what you are expecting it to do. Now when I run my script I get this in outfile:
$ cat outfile
1
2
3
4
5
6
7
8
9
10
OK this is what I had envisioned. Another problem – when I run this script it locks my terminal so I am actually unable to exit the SSH session (which is what I’m hoping to accomplish in the end!). In order to make this work I’m going to need to run the script as a background task using the & operator.
$ bash testscript.sh &
[1] 14007
$ ps 14007
PID TTY STAT TIME COMMAND
14007 pts/21 S 0:00 bash testscript.sh
When I do this, it immediately returns my prompt and gives me the process ID (or PID) that I can use with ps to answer the question “is it still running?” Of course I can also just cat outfile and see where that is too.
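If you would rather not copy the PID off the screen, the shell keeps the PID of the most recent background job in $!. Here is a minimal sketch of that idea, using a three-iteration stand-in for testscript.sh so it finishes quickly. The file name outfile_demo and the variable names are my own, made up for illustration. kill -0 sends no signal at all; it only reports whether the process exists.

```shell
workdir=$(mktemp -d)

# Short stand-in for testscript.sh (3 loops, 1 second each), run in the background.
( for i in 1 2 3; do echo "$i"; sleep 1; done ) > "$workdir/outfile_demo" &
pid=$!    # $! holds the PID of the most recent background job

# kill -0 delivers no signal; its exit status says whether the PID exists.
if kill -0 "$pid" 2>/dev/null; then state_during="running"; else state_during="finished"; fi

# Block until the job ends. This only works from the shell that started it.
wait "$pid"

if kill -0 "$pid" 2>/dev/null; then state_after="running"; else state_after="finished"; fi
echo "during: $state_during, after: $state_after"
cat "$workdir/outfile_demo"
```

After you log out and back in, wait is no longer an option because the new shell did not start the job, so ps or kill -0 with the saved PID is the way to check on it.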
Now I’ve got everything I need. I bash testscript.sh & and then exit and wait 50 seconds. Then reconnect via SSH. What do I see in outfile? Why, I see my entire expected output, 1 through 10! Awesome…except honestly I was expecting to not see it here. If you read my About Me page, I state I really enjoy the learning experience (and readily admit I don’t know everything!). This is a good example of learning something while trying to teach others.
You see, if you have done something like this before, you may have seen that running a background task with & and then exiting doesn’t work: as soon as you exit, the process is killed and your background task stops. There is a workaround for this (called nohup, which I’m about to go into), but this left me scratching my head as to why it was working without the workaround. I thought killing the task on exit was the default behavior. To the Googlenator!
http://stackoverflow.com/questions/15595374/whats-the-difference-between-nohup-and-ampersand
In this very helpful post, user nemo articulates why I’m not seeing what I normally see:
In case you’re using bash, you can use the command shopt | grep hupon to find out whether your shell sends SIGHUP to its child processes or not. If it is off, processes won’t be terminated, as it seems to be the case for you.
Heading back to the CS, I run this command to find out it is indeed off:
$ shopt | grep hupon
huponexit off
Sweet. This means that on a CS you may not even need the workaround. However, you may not be working on a VNX Control Station at this same version, and perhaps this value differs among versions. Heck, you may not be working on a VNX Control Station at all. So if it were on, what would change? Well, let’s turn it on and see. Again, this is a lab environment!
$ shopt -s huponexit
$ shopt | grep huponexit
huponexit on
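As an aside, a script can ask the same question without grep: shopt -q prints nothing and just sets an exit status. The sketch below is one way to use that, assuming bash; the function name launch_prefix is hypothetical and not part of any tool. Keep in mind that, per the bash manual, huponexit only matters when an interactive login shell exits.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "nohup" if this shell would SIGHUP its jobs on exit.
launch_prefix() {
  if shopt -q huponexit; then
    echo "nohup"
  else
    echo ""
  fi
}

# Flip the option both ways to see the helper react (lab box only!).
shopt -u huponexit
echo "huponexit off -> prefix: '$(launch_prefix)'"
shopt -s huponexit
echo "huponexit on  -> prefix: '$(launch_prefix)'"
shopt -u huponexit    # put it back to off when done
```

In practice it is probably simpler to always use nohup, since it does no harm when huponexit is off.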
Now once again I run the script as a background task, then exit, wait 50 seconds and then reconnect.
$ cat outfile
1
2
$
OK this is what I was expecting to see. Even though I have run the command with an ampersand, as soon as I dropped my SSH connection it was killed. In order to work around this, we need to use the nohup command along with it.
$ nohup bash testscript.sh &
Once again I exit, wait, and reconnect. Now when I cat outfile I see all the numbers because my script continued running in the background despite the huponexit setting.
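One thing worth knowing about nohup: if you don't redirect output yourself, it appends the command's output to a file called nohup.out. I prefer to pick the log file myself and save the PID while I'm at it. This is only a sketch under my own assumptions; short_test.sh, run.log and run.pid are made-up names, and the three-iteration script stands in for testscript.sh so the example finishes in a few seconds.

```shell
workdir=$(mktemp -d)

# Stand-in for testscript.sh: three iterations, one second apart.
cat > "$workdir/short_test.sh" <<'EOF'
for i in 1 2 3; do echo "$i"; sleep 1; done
EOF

# Detach all three streams so nothing is tied to the terminal,
# then remember the PID for checking on the job later.
nohup bash "$workdir/short_test.sh" > "$workdir/run.log" 2>&1 < /dev/null &
echo $! > "$workdir/run.pid"

# For the demo only: wait here instead of logging out and reconnecting.
wait "$(cat "$workdir/run.pid")"
cat "$workdir/run.log"
```

After a real logout you would skip the wait and, once reconnected, check on the job with ps and the PID stored in run.pid.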
Finally, briefly, I wanted to mention that it is possible to run background tasks via ssh in one line when connecting from another host, but you will notice that they don’t actually return your shell to you. E.g.:
$ ssh user@host "nohup bash testscript.sh &"
You won’t get a prompt returned here until the remote task finishes because SSH won’t drop the connection when I/O streams are open. Instead try redirecting the I/O streams per the suggestion here:
http://superuser.com/questions/449193/nohup-over-ssh-wont-return
$ ssh -n user@host "nohup bash testscript.sh > /dev/null 2>&1 &"
This will immediately return your prompt.
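If you don't have a second host handy, you can watch the same "open stream keeps the caller waiting" effect locally. This is an analogy I'm assuming carries over, not SSH itself: command substitution, like ssh, keeps reading until everything holding the captured stdout has closed it. The variable names are mine, and a 2 second sleep stands in for the long-running script.

```shell
# Case 1: the background sleep inherits the captured stdout, so the
# command substitution cannot finish until sleep exits.
start=$(date +%s)
out=$(sleep 2 &)
blocked=$(( $(date +%s) - start ))

# Case 2: stdout and stderr point at /dev/null, so nothing holds the
# captured stream open and control comes back right away.
start=$(date +%s)
out=$(sleep 2 > /dev/null 2>&1 &)
detached=$(( $(date +%s) - start ))

echo "inherited stdout: ${blocked}s, redirected: ${detached}s"
```

The first case should take about as long as the sleep, and the second should come back almost instantly, which is the same difference you see between the two ssh one-liners.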
SO…..
So, what have we learned?
- It is possible to run scripts on a Linux box that will continue to run after SSH is dropped using the & operator to background them
- If the huponexit flag is set, you will need the nohup command to keep the script running after exit
- If you are running a one-liner via SSH you will need to redirect your I/O streams in order to effectively return your prompt after the command kicks off
- On a VNX Control Station the -background flag apparently is not so great at actually completing your requested commands
Now all I need to do is use this knowledge with the script generation from the first post (obviously you would need to update with your pool IDs, VDM name, etc.) with a minor modification shown here in underlined italics:
nas_fs -info -all | grep name | grep -v root | awk '{print $3;}' > /home/nasadmin/fsout.txt
for fsname in `cat /home/nasadmin/fsout.txt`; do echo nas_replicate -create $fsname -source -fs $fsname -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001 >> replicatescript.sh; done
Then if you cat replicatescript.sh you should see all of your file systems and the replication commands you generated for them. Finally, you should be able to run bash replicatescript.sh & (or nohup bash replicatescript.sh & if huponexit is on), log out, and let it process through.
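One gotcha, and it is the same lesson as the outfile example: because the loop appends with >>, running the generator twice leaves every command in replicatescript.sh twice. Here is a sketch of a rerun-safe variant you can try on any Linux box. The file system names fs_demo01 and fs_demo02 are made up and stand in for the nas_fs output, and the nas_replicate options are simply copied from the commands above, not verified against any particular VNX release.

```shell
workdir=$(mktemp -d)

# Stand-in for the nas_fs | grep | awk pipeline: one made-up name per line.
printf '%s\n' fs_demo01 fs_demo02 > "$workdir/fsout.txt"

# Truncate first so a rerun does not duplicate every command.
: > "$workdir/replicatescript.sh"

# Read the names line by line and write one replication command per name.
while IFS= read -r fsname; do
  echo "nas_replicate -create $fsname -source -fs $fsname -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001" >> "$workdir/replicatescript.sh"
done < "$workdir/fsout.txt"

# Review what was generated before launching anything.
cat "$workdir/replicatescript.sh"
```

Nothing is executed here; the script is only generated and printed. On the Control Station the launch step would then be something like nohup bash replicatescript.sh > replicate.log 2>&1 &, where replicate.log is again just a name I picked.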
Takeaway
This post was again not really about “how to run file system replication tasks on a VNX Control Station via a script that you can leave running while you get your coffee,” though if this nails that for you I’m super happy. This post was about demonstrating the crazy abilities that are available to you when you leverage the Linux CLI. It is, no exaggeration, very hard to overstate how powerful this is for system, network, and storage administrators, especially considering a lot of hardware and appliance CLIs are at least Linux-like.
I would also suggest tinkering with keyed SSH. I may cover this in a future post, but briefly, it lets you establish a trust of sorts between users on different systems so that you can connect remotely with encryption but without entering a password. Several things support SSH (and keyed SSH) but don’t have the full bash CLI suite behind them – off the top of my head I know this is true for NetApp filers and Cisco switches. Keyed SSH will allow you to run commands or scripts from a trusted Linux host, and use one-liner SSH calls to execute commands on remote hardware without having to enter passwords (or keep passwords in plaintext scripts, as can happen with expect scripts). If you can learn and leverage scripting, this is the gateway into Poor Man’s Automation.