vSphere Client Console Black Screen

Or, more accurately than the title suggests: a black area along the border of the console window.

I got a new laptop running Windows 10 and when I brought up the console in my home lab, this is what I’d get.

screwyconsole

It actually ended up being even worse than it looks, as the mouse is pretty much unusable on the screen and a huge chunk of the screen (the bottom and right side) is cut off.  It is virtually impossible to do anything with VMs, including operating system installations.

I tried several fixes like compatibility mode, updating drivers, etc. without success.  Then I stumbled upon this scaling setting, which is mentioned on several sites dealing with mouse usage in the VDI space.

Browse to C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Launcher\, right-click VpxClient.exe, and select Properties.

clientsetting

On this screen, select the Disable Display Scaling box indicated.  Then restart your client.

As an annoying side effect, some of the text is no longer scaled correctly, so a few things in the client look off.  But the client is still usable, and so is the console.

Weirdly, the vCenter console from the web client has no such issue that I noticed.  But if you are being pestered by this “feature” try this setting and see if it helps.

VNX File + Linux CLI PART DEUX!

In a previous post in the not so distant past I showed how to leverage some Linux CLI tools to make the VNX control station work a little harder and let you work a little easier.  Mainly this was around scripting multiple file replications at the same time.  I noted in the post that I like to leave the SSH window up and just let them run so I can see them complete.  And I also noted that if you didn’t want to do this, the -background flag should queue the tasks in the NAS scheduler and let you go about your merry business.

I have now used the -background flag and wanted to mention two important things about it.

  1. Less important than the next point, but still worth mentioning: using the -background flag still takes a while to run the commands.  I was expecting them to complete one after another in short order…not so much.  Not nearly as bad as actually waiting for the normal replication tasks to finish, but still not optimal.
  2. Most importantly, after queueing up 33 file replication tasks in the NAS scheduler, I came back to find out that only three of them had succeeded.  All the rest had failed.

The commands were valid and I’m not sure exactly what caused them to fail.  Maybe there is a good reason for it.  I have a gut feeling that the NAS scheduler has a timeout for tasks and these nas_replicate commands exceed it (because some of them take a really, really long time to finish).  Unfortunately I didn’t have the time to investigate so I went back to the CLI without the -background flag.  This worked just fine, but it takes a very long time to schedule these puppies because the tasks take so long to run.  In a time crunch, or “it’s 5:15 and I want to launch this, logout, and hit the road” situation, it might not be ideal.

So, what if you want a reliable method of kicking off a bunch of replication tasks (or a bunch of tasks on any *nix box…again I’m mainly trying to demonstrate the value of Linux CLI tools) via SSH and then letting them run through?  Once again let’s do some thinking and testing.  Note: I am on a lab box and am not responsible for anything done to your system.  All of these commands can be run on any Linux box to test/play with.

I need something that will take “a while” to run and produce some output.  In order to do this I create a simple test script with the following line:

for i in {1..10}; do echo $i; sleep 5; done

In bash, this will loop 10 times, and each time it loops it will print the value of $i (which will go 1, 2, 3…all the way up to 10) and then sleep, or wait, for 5 seconds.  The whole script will take about 50 seconds to run, which is perfect for me.  Long enough for testing but not so long that I’ll be falling asleep at the keyboard waiting for the test script to complete.

Now, because my plan is to run this guy and then log out, I want to track the output to something other than my screen.  I’ll redirect the output to a file using the > operator:

for i in {1..10}; do echo $i > outfile; sleep 5; done

Now if I run bash testscript.sh the screen basically just sits there for 50 seconds, and then it returns the prompt to me.  If I check the contents of outfile I should see my output: 1 2 3 4…all the way to 10, right?

$ bash testscript.sh
$ cat outfile
10
$

Or not!  I only have 10 in the outfile.  I only have 10 in the outfile because my script is doing exactly what I told it to do!  You’ll find this is a common “problem” with your scripts. 🙂  My echo $i > outfile overwrites outfile every time that line runs.  Instead I want to create a new outfile every time the script runs, and then append (>> operator) the numeric output while my script is running.  No problem:

$ cat testscript.sh
echo " " > outfile
for i in {1..10}; do echo $i >> outfile; sleep 5; done
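The difference between the two operators is easy to demonstrate on any Linux box (demo.txt below is just a scratch file I made up for the illustration):

```shell
echo one > demo.txt     # '>' truncates the file, then writes
echo two > demo.txt     # truncates again: demo.txt now contains only "two"
echo three >> demo.txt  # '>>' appends: "three" lands after "two"
cat demo.txt            # prints "two" then "three"
rm demo.txt
```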

Testing is really, really important because while you can always rely on the computer to do exactly what you tell it to do, you cannot always rely on yourself to tell it what you are expecting it to do.  Now when I run my script I get this in outfile:

$ cat outfile

1
2
3
4
5
6
7
8
9
10

OK this is what I had envisioned.  Another problem – when I run this script it locks my terminal so I am actually unable to exit the SSH session (which is what I’m hoping to accomplish in the end!).  In order to make this work I’m going to need to run the script as a background task using the & operator.

$ bash testscript.sh &
[1] 14007
$ ps 14007
  PID TTY      STAT   TIME COMMAND
14007 pts/21   S      0:00 bash testscript.sh

When I do this, it immediately returns my prompt and gives me the process ID (or PID) that I can use with ps to answer the question “is it still running?”  Of course I can also just cat outfile and see where that is too.
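A related trick that isn't strictly required here, but which I find handy (my addition, not part of the original workflow): bash also stores the PID of the most recent background task in $!, and wait will block until that task finishes:

```shell
sleep 2 &              # kick off a background task
pid=$!                 # $! holds the PID of the last background job
ps -p "$pid"           # is it still running?
wait "$pid"            # block until it completes
echo "task $pid done"
```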

Now I’ve got everything I need.  I bash testscript.sh & and then exit and wait 50 seconds.  Then reconnect via SSH.  What do I see in outfile?  Why, I see my entire expected output, 1 through 10!  Awesome…except honestly I was expecting to not see it here.  If you read my About Me page, I state I really enjoy the learning experience (and readily admit I don’t know everything!).  This is a good example of learning something while trying to teach others.

You see, if you have done something like this before you may have seen that running the background task with & and then exiting doesn’t work.  As soon as you exit, the process is killed and your background task stops.  There is a workaround for this (called nohup, which I’m about to go into), but this left me scratching my head as to why this was actually working without the workaround.  I thought the process being killed was the default behavior.  To the Googlenator!

http://stackoverflow.com/questions/15595374/whats-the-difference-between-nohup-and-ampersand

In this very helpful post, user nemo articulates why I’m not seeing what I normally see:

In case you’re using bash, you can use the command shopt | grep hupon to find out whether your shell sends SIGHUP to its child processes or not. If it is off, processes won’t be terminated, as it seems to be the case for you.

Heading back to the CS, I run this command and find that it is indeed off:

$ shopt | grep hupon
huponexit       off

Sweet.  This means that on a CS you may not even need the workaround.  However, you may not be working on a VNX Control Station at this same version, and perhaps this value differs between versions.  Heck, you may not be working on a VNX Control Station at all.  So if it were on, what would change?  Well, let’s turn it on and see.  Again, this is a lab environment!

$ shopt -s huponexit
$ shopt | grep huponexit
huponexit       on

Now once again I run the script as a background task, then exit, wait 50 seconds and then reconnect.

$ cat outfile

1
2
$

OK this is what I was expecting to see.  Even though I have run the command with an ampersand, as soon as I dropped my SSH connection it was killed.  In order to work around this, we need to use the nohup command along with it.

$ nohup bash testscript.sh &

Once again I exit, wait, and reconnect.  Now when I cat outfile I see all the numbers because my script continued running in the background despite the huponexit setting.
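One nohup detail worth knowing (this is standard nohup behavior, not anything VNX-specific): if you don't redirect output yourself, nohup appends it to a file called nohup.out in the current directory.  I prefer redirecting to a log file I've named, so nothing piles up in nohup.out:

```shell
# testscript.sh is the 1..10 loop from earlier in the post.
# Send stdout and stderr to a named log so the output survives logout.
nohup bash testscript.sh > testscript.log 2>&1 &
echo "started PID $!"
```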

Finally, briefly, I wanted to mention that it is possible to run background tasks via ssh in one line when connecting from another host, but you will notice that they don’t actually return your shell to you.  E.g.:

$ ssh user@host "nohup bash testscript.sh &"

You won’t get a prompt returned here until the remote task finishes because SSH won’t drop the connection when I/O streams are open.  Instead try redirecting the I/O streams per the suggestion here:

http://superuser.com/questions/449193/nohup-over-ssh-wont-return:

$ ssh -n user@host "nohup bash testscript.sh > /dev/null 2>&1 &"

This will immediately return your prompt.

SO…..

So, what have we learned?

  1. It is possible to run scripts on a Linux box that will continue to run after the SSH connection is dropped, using the & operator to background them
  2. If the huponexit flag is set, you will need the nohup command to keep the script running after exit
  3. If you are running a one-liner via SSH, you will need to redirect your I/O streams in order to get your prompt back after the command kicks off
  4. On a VNX Control Station the -background flag apparently is not so great at actually completing your requested commands

Now all I need to do is use this knowledge with the script generation from the first post (obviously you would need to update it with your pool IDs, VDM name, etc.) with one minor modification: the generated commands are appended to replicatescript.sh instead of printed to the screen:

nas_fs -info -all | grep name | grep -v root | awk '{print $3;}' > /home/nasadmin/fsout.txt
for fsname in `cat /home/nasadmin/fsout.txt`; do echo nas_replicate -create $fsname -source -fs $fsname -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001 >> replicatescript.sh; done

Then if you cat replicatescript.sh you should see all of your file systems and the replication commands you generated for them.  And then finally you should be able to bash replicatescript.sh & then log out and it should process through.

Takeaway

This post was again not really about “how to run file system replication tasks on a VNX Control Station via a script that you can leave running while you get your coffee,” though if this nails that for you I’m super happy.  This post was about demonstrating the crazy abilities that are available to you when you leverage the Linux CLI.  It is hard to overstate how powerful this is for system, network, and storage administrators, especially considering a lot of hardware and appliance CLIs are at least Linux-like.

I would also suggest tinkering with keyed SSH as well.  I may cover this in a future post, but briefly this will allow you to establish a trust of sorts between some users on systems that allows you to remotely connect with encryption, but without requiring you to enter a password.  Several things support SSH (and keyed SSH) but don’t have the full bash CLI suite behind them – off the top of my head I know this is true for NetApp filers and Cisco switches.  Keyed SSH will allow you to run commands or scripts from a trusted Linux host, and use one-liner SSH calls to execute commands on remote hardware without having to enter passwords (or keep passwords in plaintext scripts like may happen with expect scripts).  If you can learn and leverage scripting, this is the gateway into Poor Man’s Automation.
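As a teaser, the local half of keyed SSH looks something like this (the key name, user, and host below are all made up; the remote steps are shown as comments because they depend entirely on your environment):

```shell
# Generate a key pair with no passphrase (for unattended scripts)
mkdir -p "$HOME/.ssh"
ssh-keygen -q -t rsa -b 4096 -f "$HOME/.ssh/id_lab" -N ""

# On a real system you would then install the public key and test it
# (hypothetical user/host):
#   ssh-copy-id -i ~/.ssh/id_lab.pub nasadmin@control-station
#   ssh -i ~/.ssh/id_lab nasadmin@control-station "nas_fs -list"
```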

VNX File + Linux CLI

If you can learn Linux/UNIX command line and leverage it in your job, I firmly believe it will make you a better, faster, more efficient storage/network/sysadmin/engineer.  egrep, sed, awk, and bash are extremely powerful tools.  The real trick is knowing how to “stack” up the tools to make them do what you want…and not bring down the house in the process.  Note: I bear no responsibility for you bringing your house down!

Today I was able to leverage this via the VNX Control Station CLI.  I had a bunch of standard file system replications to set up and Unisphere was dreadfully slow.  If you find yourself in this situation, give the following a whirl.  I’m going to document my thought process as well, because I think that is just as important as knowing how to specifically do these things.

First what is the “create file replication” command?  A quick browse through the man pages, online, or the Replicator manual gives us something like this:

nas_replicate -create REPLICATIONNAME -source -fs FILESYSTEMNAME -destination -pool id=DESTINATIONPOOLID -vdm DESTINATIONVDMNAME -interconnect id=INTERCONNECTID

Looking at the variable data in CAPITAL LETTERS, the only things I really care about changing are the replication name and the file system name.  In fact I usually use the file system name for the replication name…I feel like this does what I need unless you are looking at a complex Replicator setup.  So if I identify the destination pool ID (nas_pool -list), the destination VDM name (nas_server -list -vdm), and the interconnect ID (nas_cel -interconnect -list), then all I’m left with needing is the file system name.

So the command would look like (in my case, with some made up values):

nas_replicate -create REPLICATIONNAME -source -fs FILESYSTEMNAME -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001

Pretty cool – at this point I can just replace the name itself if I wanted and still get through it much faster than through Unisphere.  But let’s go a little further.

I want to automate the process for a bunch of different things in a list.  And in order to do that, I’ll need a for loop.  A for loop in bash goes something like this:

for i in {0..5}; do echo i is $i; done

This reads in English, “for every number in 0 through 5, assign the value to the variable $i, and run the command echo i is $i.”  If you run that line on a Linux box, you’ll see:

i is 0
i is 1
i is 2
i is 3
i is 4
i is 5

Now we’ve got our loop so we can process through a list.  What does that list need to be?  In our case that list needs to be a list of file system names.  How do we get those?

We can definitely use the nas_fs command but how is a bit tricky.  nas_fs -l will give us all the file system names, but it will truncate them if they get too long.  If you are lucky enough to have short file system names, you might be able to get them out of here.  If not, the full name would come from nas_fs -info -all.  Unfortunately that command also gives us a bunch of info we don’t care about like worm status and tiering policy.

Tools to the rescue!  What we want to do is find all lines that have “name” in them and the tool for that is grep.  nas_fs -info -all | grep name will get all of those lines we want.  Success!  We’ve got all the file system names.

name      = root_fs_1
name      = root_fs_common
name      = root_fs_ufslog
name      = root_panic_reserve
name      = root_fs_d3
name      = root_fs_d4
name      = root_fs_d5
name      = root_fs_d6
name      = root_fs_2
name      = root_fs_3
name      = root_fs_vdm_cifs-vdm
name      = root_rep_ckpt_68_445427_1
name      = root_rep_ckpt_68_445427_2
name      = cifs
name      = root_rep_ckpt_77_445449_1
name      = root_rep_ckpt_77_445449_2
name      = TEST
name      = TestNFS

Alas they are not as we want them, though.  First of all we have a lot of “root” file systems we don’t like at all.  Those are easy to get rid of.  We want all lines that don’t have root in them, and once again grep to the rescue with the -v or inverse flag.

nas_fs -info -all | grep name | grep -v root

name      = cifs
name      = TEST
name      = TestNFS

Closer and closer.  Now the problem is the “name   =” part.  Now what we want is only the 3rd column of text.  In order to obtain this, we use a different tool – awk.  Awk has its own language and is super powerful, but we want a simple “show me the 3rd column” and that is going to just be tacked right on the end of the previous command.

nas_fs -info -all | grep name | grep -v root | awk '{print $3;}'

cifs
TEST
TestNFS

Cool, now we’ve got our file system names.  We can actually run our loop on this output, but I find it easier to send it to a file and work with it.  Just run the command and point the output to a file like so:

nas_fs -info -all | grep name | grep -v root | awk '{print $3;}' > /home/nasadmin/fsout.txt

This way you can directly edit the fsout.txt file if you want to make changes.  Learning how these tools work is very important because your environment is going to be different and the output that gets produced may not be exactly what you want it to be.  If you know how grep, awk, and sed work, you can almost always coerce output however you want.
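Since not everyone has a Control Station handy, here is the same filter chain against a faked-up file (the contents below just mimic a slice of the nas_fs output above), which you can run on any Linux box:

```shell
# Fake a few lines of "nas_fs -info -all" output
cat <<'EOF' > fslist.txt
name      = root_fs_1
name      = cifs
name      = TEST
name      = TestNFS
EOF

# Same pipeline: keep "name" lines, drop "root" lines, print column 3
grep name fslist.txt | grep -v root | awk '{print $3;}'
# prints: cifs, TEST, TestNFS (one per line)
rm fslist.txt
```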

Now let’s combine this output with ye olde for loop to finish out strong.  Note the ` below are backticks, not single quotes:

for fsname in `cat /home/nasadmin/fsout.txt`; do echo nas_replicate -create $fsname -source -fs $fsname -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001; done

My output in this case is a series of commands printed to the screen because I left in the “echo” command:

nas_replicate -create cifs -source -fs cifs -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001
nas_replicate -create TEST -source -fs TEST -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001
nas_replicate -create TestNFS -source -fs TestNFS -destination -pool id=40 -vdm MYDESTVDM01 -interconnect id=20001

Exactly what I wanted.  Now if I want to actually run it rather than just printing them to the screen, I can simply remove the “echo” from the previous for loop.  This is a good way to validate your statement before you unleash it on the world.

If you are going to attempt this, look into the -background flag as well, which can shunt these all to the NAS task scheduler.  I actually like running them without the flag in this case so I can glance at PuTTY and see progress.

If you haven’t played in the Linux CLI space before, some of this might be Greek.  Understandable!  Google it and learn.  There are a million tutorials on all of these concepts out there.  And if you are a serious Linux sysadmin you have probably identified a million flaws in the way I did things. 🙂  Such is life.

Sometimes there is a fine line with doing things like this, where you may spend more time on the slick solution than you would have just hammering it out.  In this made up case I just had 3…earlier I had over 30.  But solutions like this are nice because they are reusable, and they scale.  It doesn’t really matter whether I’m doing 1 replication or 10 or 40.  I can use this (or some variation of it) every time.

The real point behind this post wasn’t to show you how to use these tools to do replications via CLI, though if it helps you do that then great.  It was really to demonstrate how you can use these tools in the real world to get real work done.  Fast and consistent.

VNX2 Hot Spare Policy bug in Flare 33 .051

The best practice for VNX2 hot spares is one spare for every 30 drives in your array.  However, if you have a VNX2 on Flare 33 .051 release, you’ll notice that the “Recommended” default policy is 1 per 60.

This is a bug.  There has been no change in the recommendations from EMC.  If you want the policy to return to the recommended 1 per 30, you have to manually set it.

I noticed today when trying to do this via Unisphere that you can only set a 1 per 30 policy if you actually have 30 or more disks of a given type.  If you have 6 EFD disks, your options through Unisphere are 1 hot spare per 2, 3, 4, 5, 6, or 60 disks.  In order to set a 1 per 30 policy in this situation you must use navicli or naviseccli.

Get a list of the hotspare policy IDs:

navicli -h SPA_IP_ADDRESS hotsparepolicy -list

Set a policy ID to 1 per 30:

navicli -h SPA_IP_ADDRESS hotsparepolicy -set POLICY_ID_NUMBER -keep1unusedper 30 -o

Note that you only need to do this on SPA or SPB for each policy, not both.

I also wanted to quickly mention that there isn’t a great danger in leaving this at 1 per 60, because the hot spare policy is really only a reporting mechanism.  E.g. if you leave the policy at 1 per 60, and you have 60 drives, two hot spares, and 58 data disks in use, AND two drives fail…both spares will kick in.  The hot spare policy does not control hot sparing behavior; it just reports compliance.  (Actually it will also prevent you from creating a storage pool that would violate the hot spare policy, but only if you don’t manually select disks…)

But I still like having the hot spare policy reflect the recommended best practice, and that is still one hot spare for every 30 disks.

Information taken from: https://community.emc.com/thread/192199