Category Archives: work

Using Powershell to get FibreChannel Information

Query Windows Systems from the CLI without relying on OEM tool-chains.

I recently posted about using systool and the /sys/class special filesystem to get FC HBA information. That’s great if you use Linux, but less helpful if you run Windows. This post will take you through the ways of getting the same information from your HBA, but on a recent version of Windows (where recent is defined as having WMI classes, and possibly PowerShell, installed). It has to be said that I learned a lot of this from Ben Wilkinson’s post on TechNet’s Script Center.

First, some background. It doesn’t appear that Windows makes the HBA information available unless you’ve run the fcinfo tool. If you can’t run this on every system (once is enough), perhaps at install time via your automated build system (you have one of these, right? ;-) ), then you’re stuck installing the proprietary toolchain (although, if you can do that, you should probably install fcinfo at the same time).

Once you’ve done that, Windows has two Windows Management Instrumentation (hereafter WMI) classes defined and queryable via any WMI tool (in this case we’re going to use PowerShell because it’s got WMI baked in, but you can get to this with Perl and many other languages and methods). WMI is definitely the way to get information out of Windows systems, the more so as they move to CIM, SMI-S, WBEM etc. with Windows 8/Server 2012 (but that’s another whole series of blog posts, and covered better by others);

  • MSFC_FCAdapterHBAAttributes
  • MSFC_FibrePortHbaAttributes

We’re going to concentrate on the former here since the latter has just the WWPN’s of interest (IMHO, anyway – take a look yourself to decide – I’m not the boss of you!).

The reason I’m using PowerShell here is that it really is the dog’s dangly bits when it comes to working with Windows objects. Take this one-liner, run from a PowerShell console;

Get-WmiObject -class MSFC_FCAdapterHBAAttributes -computername MyTestServer -namespace "root\WMI" | Get-Member

This produces;


TypeName: System.Management.ManagementObject#root\WMI\MSFC_FCAdapterHBAAttributes

Name MemberType Definition
---- ---------- ----------
Active Property System.Boolean Active {get;set;}
DriverName Property System.String DriverName {get;set;}
DriverVersion Property System.String DriverVersion {get;set;}
FirmwareVersion Property System.String FirmwareVersion {get;set;}
HardwareVersion Property System.String HardwareVersion {get;set;}
HBAStatus Property System.UInt32 HBAStatus {get;set;}
InstanceName Property System.String InstanceName {get;set;}
Manufacturer Property System.String Manufacturer {get;set;}
MfgDomain Property System.String MfgDomain {get;set;}
Model Property System.String Model {get;set;}
ModelDescription Property System.String ModelDescription {get;set;}
NodeSymbolicName Property System.String NodeSymbolicName {get;set;}
NodeWWN Property System.Byte[] NodeWWN {get;set;}
NumberOfPorts Property System.UInt32 NumberOfPorts {get;set;}
OptionROMVersion Property System.String OptionROMVersion {get;set;}
SerialNumber Property System.String SerialNumber {get;set;}
UniqueAdapterId Property System.UInt64 UniqueAdapterId {get;set;}
VendorSpecificID Property System.UInt32 VendorSpecificID {get;set;}
__CLASS Property System.String __CLASS {get;set;}
__DERIVATION Property System.String[] __DERIVATION {get;set;}
__DYNASTY Property System.String __DYNASTY {get;set;}
__GENUS Property System.Int32 __GENUS {get;set;}
__NAMESPACE Property System.String __NAMESPACE {get;set;}
__PATH Property System.String __PATH {get;set;}
__PROPERTY_COUNT Property System.Int32 __PROPERTY_COUNT {get;set;}
__RELPATH Property System.String __RELPATH {get;set;}
__SERVER Property System.String __SERVER {get;set;}
__SUPERCLASS Property System.String __SUPERCLASS {get;set;}
ConvertFromDateTime ScriptMethod System.Object ConvertFromDateTime();
ConvertToDateTime ScriptMethod System.Object ConvertToDateTime();

This uses the Get-Member cmdlet to query the object produced by Get-WmiObject -class MSFC_FCAdapterHBAAttributes -computername MyTestServer -namespace "root\WMI" to see what properties and methods we can get out of that particular WMI class. This is about as deep as I’m going to go into WMI and PowerShell, by the way; if you want more there are many, many good books on the subject, sometimes free from Microsoft (or coming bundled with your TechNet subscription).

You’re probably thinking that this is nice and all, but you actually want to know useful information about the computer, like what its WWNNs are, who manufactured the card, what the firmware is and so forth. Well, if you look through the data structure above, or the linked documentation for the WMI class, you’ll see that’s indeed what we can get.

Time for another one-liner;

Get-WmiObject -class MSFC_FCAdapterHBAAttributes -computername MyTestServer -namespace "root\WMI" | ForEach-Object { $_ }

This will provide;


__GENUS : 2
__CLASS : MSFC_FCAdapterHBAAttributes
__SUPERCLASS :
__DYNASTY : MSFC_FCAdapterHBAAttributes
__RELPATH : MSFC_FCAdapterHBAAttributes.InstanceName="PCI\\VEN_10DF&DEV_FD00&SUBSYS_FD0010DF&REV_01\\3&172e68dd&0&10_0"
__PROPERTY_COUNT : 18
__DERIVATION : {}
__SERVER : PCMEDSRV2
__NAMESPACE : root\WMI
__PATH : \\PCMEDSRV2\root\WMI:MSFC_FCAdapterHBAAttributes.InstanceName="PCI\\VEN_10DF&DEV_FD00&SUBSYS_FD0010DF&REV_01\\3&172e68dd&0&10_0"
Active : True
DriverName : elxstor
DriverVersion : 5-2.00A12
FirmwareVersion : 2.10A7
HardwareVersion : 1036406D
HBAStatus : 0
InstanceName : PCI\VEN_10DF&DEV_FD00&SUBSYS_FD0010DF&REV_01\3&172e68dd&0&10_0
Manufacturer : Emulex Corporation
MfgDomain : com.emulex
Model : FC2243
ModelDescription : HP FC2243 4Gb PCI-X 2.0 DC HBA
NodeSymbolicName : Emulex FC2243 FV2.10A7 DV5-2.00A12 PCMEDSRV2
NodeWWN : {32, 0, 0, 0…}
NumberOfPorts : 1
OptionROMVersion : 5.01A8
SerialNumber : MY10634B9K
UniqueAdapterId : 4480335924126810144
VendorSpecificID : 4244639967

..and more for each port on each Fiber Channel Host Bus Adapter present on the system

Which is nice, but not very readable. So let’s select just the bits we want, and format them a little more nicely;

Get-WmiObject -class MSFC_FCAdapterHBAAttributes -computername MyTestServer -namespace "root\WMI" | Select-Object DriverVersion, FirmwareVersion, Manufacturer, Model, SerialNumber | Format-Table -AutoSize

which gives (on my system);


DriverVersion FirmwareVersion Manufacturer       Model   SerialNumber
------------- --------------- ------------       -----   ------------
5-2.00A12     2.10A7          Emulex Corporation FC2243  MY10734B9L
5-2.00A12     2.10A7          Emulex Corporation FC2243  MY10734B9L
9.1.7.18      3.03.25         QLogic Corporation QLA2342 L87589
9.1.7.18      3.03.25         QLogic Corporation QLA2342 L87589

Notice that I’m querying a local computer, but, firewall and user permissions permitting, I could query any system I have access to by providing a different name to the -computername parameter.

At this point, you should be in a position to understand the Get-HBA-Info function that Ben Wilkinson provided in the initial link above. Personally, I’ve got this in my PowerShell profile, so I can run it from my PowerShell prompt without having to remember the above object selections and WMI classes;


PS U:\> Get-HBA-Info MyTestServer

NodeWWN              DriverVersion ModelDescription               FirmwareVersion Active ComputerName SerialNumber
-------              ------------- ----------------               --------------- ------ ------------ ------------
20:0:0:0:d9:69:3d:3f 5-2.00A12     HP FC2243 4Gb PCI-X 2.0 DC HBA 2.10A7          True   MYTESTSERVER MY10734B10K

…and on for every fiber port in every adaptor.
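
As an aside, NodeWWN comes back from WMI as an array of byte values (the {32, 0, 0, 0…} earlier), and the colon-separated form is just each byte printed in hex. If you end up with the raw bytes in a report and want the conventional form, a minimal shell sketch of the conversion might look like this. format_wwn is a made-up helper name, the sample bytes are read back from the output above, and unlike that output this version zero-pads each byte;

```shell
#!/bin/sh
# format_wwn (hypothetical helper): print decimal byte values as a
# colon-separated, zero-padded hex WWN.
format_wwn() {
    wwn=""
    for byte in "$@"; do
        # ${wwn:+:} adds a ':' separator only once wwn is non-empty
        wwn="${wwn}${wwn:+:}$(printf '%02x' "$byte")"
    done
    printf '%s\n' "$wwn"
}

# Sample bytes taken from the NodeWWN shown above (0x20 = 32, 0xd9 = 217, ...)
format_wwn 32 0 0 0 217 105 61 63
```

For the sample bytes this prints 20:00:00:00:d9:69:3d:3f.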

Also be aware that if the computer you query doesn’t have an FC HBA, or hasn’t had fcinfo run at least once, you’re likely to get an error like this;

PS U:\> Get-HBA-Info MyTestNonSANServer
Get-WmiObject : Not supported
At line:9 char:22
+ Get-WmiObject <<<< -class MSFC_FCAdapterHBAAttributes -computername $Computer -namespace $namespace |
+ CategoryInfo : InvalidOperation: (:) [Get-WmiObject], ManagementException
+ FullyQualifiedErrorId : GetWMIManagementException,Microsoft.PowerShell.Commands.GetWmiObjectCommand

See Also

  • http://en.wikipedia.org/wiki/Windows_Management_Instrumentation
  • http://gallery.technet.microsoft.com/scriptcenter/Find-HBA-and-WWPN-53121140
  • “Using the Get-WMiObject Cmdlet”: http://technet.microsoft.com/en-us/library/ee176860.aspx
  • MSFC_FCAdapterHBAAttributes structure: http://msdn.microsoft.com/en-gb/library/windows/hardware/ff562495%28v=vs.85%29.aspx
  • MSFC_FibrePortHbaAttributes structure: http://msdn.microsoft.com/en-gb/library/windows/hardware/ff562510%28v=vs.85%29.aspx

Using systool to easily get HBA info and more

Query your system from the filesystem without relying on OEM tool-chains.

I often find I want to get information about the Fibre Channel network a server is attached to, but don’t have the vendor’s tools installed, or need it before they’re available (such as with automated builds, or on some configuration management systems before those tools have been installed). This hack will tell you more about what information you can get from a moderately recent (2.6.11 onwards, IIRC) Linux kernel, without any reliance on external tools.

First, some history. The Fibre Channel Host Bus Adapter drivers used to store information in the /proc/scsi pseudo filesystem, but this was reorganised in 2.6.11 and is now much more centralised under /sys, with a unified and consistent way of reporting information.
In /sys, each device on a system has its own directory, with details about the hardware and any information that the module for the device has chosen to expose (this is dependent on the driver, as far as I can see, with some standards, but kernel developers, feel free to correct me!). For example, the World Wide Port Name of a Fibre Channel PCI card port is located in /sys/class/fc_host/host2/port_name, which, on my test Ubuntu system at least, is the first port on the first FC PCI card in the system.

Other files in this directory are equally useful; dev_loss_tmo, fabric_name, port_state, speed, and on some later versions of the driver, the entire statistics subdirectory!
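
To make that concrete, here is a minimal sketch that reads a handful of those attribute files for every host under the fc_host class directory. list_fc_attrs is a made-up name, the choice of attributes is just an example, and it assumes your driver populates /sys/class/fc_host in the same way my test system does;

```shell
#!/bin/sh
# list_fc_attrs (hypothetical helper): print "host attribute value" for a few
# fc_host attribute files. The directory argument exists only so the function
# can be pointed at a test tree; it defaults to the real sysfs location.
list_fc_attrs() {
    fc_dir="${1:-/sys/class/fc_host}"
    for host in "$fc_dir"/host*; do
        for attr in port_name node_name port_state speed; do
            # Skip anything missing or unreadable (including an unmatched glob)
            [ -r "$host/$attr" ] || continue
            printf '%s %s %s\n' "${host##*/}" "$attr" "$(cat "$host/$attr")"
        done
    done
}

list_fc_attrs
```

On a machine with no FC HBA it simply prints nothing.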

Information like this, particularly the WWPNs, is very useful for gathering into central databases, for use in SAN zoning and the like, and for general troubleshooting. I find I need it most often when setting up a new zone between a server and a storage device that haven’t communicated before.

When you first discover this information in the /sys filesystem, if you’re like me, you start by cat’ing individual files, remembering or exploring the paths until you get to the ones you want. Then you find that Emulex’s drivers put their files in one place, and QLogic’s in another. However, the port_name file is called the same regardless, so perhaps using ‘find’ is the answer. Then you realise that some of the directories recurse in an... interesting fashion, so using ‘find’ is of no use, or at least doesn’t return in a useful timescale.

At this point (and I admire your persistence if you got this far on your own) you might be tempted to use a bash script, or similar, with the paths hard-coded. You then start getting into programmatically determining the number of ports on the HBA and the number of PCI Fibre Channel cards on the system. Then there’s the fact that, unless you do a lot of spelunking on the /sys filesystem and comparing device slots (try comparing the output of lspci -v with a /sys device directory to see what I mean), you need to discount the first two devices in some kernels (local SCSI show up as /sys/class/fc_host{0,1}), and your script starts looking quite cumbersome. Add in worries about distribution portability and kernel versions and you start to despair (or was that just me?) of writing anything which can run on a system beyond the one it was written on. Portability, of course, is a must when you need to manage more than one system.

Enter the systool command from the sysfsutils package.

This is a command that abstracts some of the details, and allows you to get just a subset of information back, without having to do all the tedious and error-prone tree walking yourself. You can select via switches whether you want a class of devices, a particular device and so forth, and then tell it to display the contents (aka attributes) of one of the particular ‘files’.

Let’s get into the details with a few worked examples. First of all, a command to show basic Fibre Channel info (actually quite useful when you’re getting the hang of what’s on a system);

systool -c fc_host -v

which, on my test system (unconnected to a SAN fabric at the time of running this) gives;


john@brain:~$ systool -c fc_host -v
Class = "fc_host"

Class Device = "host2"
Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:09:00.0/0000:0a:00.0/0000:0b:00.0/host2/fc_host/host2"
dev_loss_tmo = "16"
fabric_name = "0xffffffffffffffff"
node_name = "0x500143800133ad65"
port_name = "0x500143800133ad64"
port_state = "Online"
port_type = "Unknown"
speed = "unknown"
supported_classes = "Class 3"
supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit"
symbolic_name = "HPAE312A FW:v4.04.04 DVR:v8.03.07.03-k"
system_hostname = ""
tgtid_bind_type = "wwpn (World Wide Port Name)"

..and so on. I’ve trimmed some of the output for brevity, but you get the idea. You may also want to show connected WWNN’s, which you can do with (systool -c fc_remote_ports -v -d)

Since the /sys filesystem also contains information on kernel modules currently loaded, you can also query assorted information from the FC module itself;

systool -m qla2xxx -v

which gives;

john@brain:~$ systool -m qla2xxx -v
Module = "qla2xxx"

Attributes:
initstate = "live"
refcnt = "0"
srcversion = "5E2862BE1CA7563239F1A1E"
version = "8.03.07.03-k"

Parameters:
ql2xallocfwdump = "1"
ql2xasynctmfenable = "0"
ql2xdbwr = "1"
ql2xdontresethba = "0"
ql2xenabledif = "1"
ql2xenablehba_err_chk= "0"
ql2xetsenable = "0"
ql2xextended_error_logging= "0"
ql2xfdmienable = "1"
ql2xfwloadbin = "0"
ql2xgffidenable = "0"
ql2xiidmaenable = "1"
ql2xloginretrycount = "0"
ql2xlogintimeout = "20"
ql2xmaxlun = "65535"
ql2xmaxqdepth = "32"
ql2xmaxqueues = "1"
ql2xmultique_tag = "0"
ql2xplogiabsentdevice= "0"
ql2xshiftctondsd = "6"
ql2xtargetreset = "1"
qlport_down_retry = "0"

.. and more

This is a bit of a cheat; I happened to know that the QLogic FC module was called qla2xxx, and hence interrogated it directly, but it’s not hard to Google if you didn’t know, and it’s not like the world is filled with a cornucopia of FC HBA manufacturers…

However, as it’s sometimes hard to get firmware/version numbers out of HBAs without the vendor’s tools installed (and that’s not always possible, even with configuration management tools), this is a handy trick to know.
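
If all you want is the driver version, the same value that systool reports as the module’s version attribute can be read straight from /sys/module. A minimal sketch, assuming the module exports a version file at all (not every module does); module_version is a made-up helper name;

```shell
#!/bin/sh
# module_version (hypothetical helper): print the version a loaded module
# exports under sysfs, or "unknown" if it is not loaded or exports none.
# The second argument is only there for testing; it defaults to /sys/module.
module_version() {
    mod_file="${2:-/sys/module}/$1/version"
    if [ -r "$mod_file" ]; then
        cat "$mod_file"
    else
        echo "unknown"
    fi
}

module_version qla2xxx
```

Falling back to “unknown” rather than erroring makes it easier to run the same line across a mixed estate and collect the results centrally.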

Other information you can get out via this method includes ql2xmaxqdepth (that’s another post on its own), and since it also exposes values that are set by multipath, you can use it along with multipath’s debug mode (also another post) to confirm that the values you have in multipath.conf are the ones that are set in your system.

Going back to the original use case above, that of the WWPNs of the FC PCI card, this is the command you need;

systool -c fc_host -A port_name

which shows;

john@brain:~$ systool -c fc_host -A port_name
Class = "fc_host"

Class Device = "host2"
port_name = "0x500143800133ad64"

Device = "host2"

Class Device = "host3"
port_name = "0x500143800133ad66"

Device = "host3"
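
If you want that output in a form you can feed into a database or a zoning script, a little awk will reduce it to one “host WWPN” pair per line. This is a minimal sketch that assumes the output layout shown above; parse_port_names is a made-up helper name;

```shell
#!/bin/sh
# parse_port_names (hypothetical helper): read `systool -c fc_host -A port_name`
# output on stdin and print one "host wwpn" pair per line.
parse_port_names() {
    awk -F'"' '
        /^[[:space:]]*Class Device[[:space:]]*=/ { host = $2 }       # remember the current host
        /^[[:space:]]*port_name[[:space:]]*=/    { print host, $2 }  # emit host and its WWPN
    '
}

# Only call systool if it is actually installed
if command -v systool >/dev/null 2>&1; then
    systool -c fc_host -A port_name | parse_port_names
fi
```

Splitting on the double quote means the value is always the second field, whatever padding systool puts around the equals sign.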

Of course, systool isn’t limited to just Fibre Channel card information; you can get some information on any device in your system this way.

You might want to take it a step further, and define the useful variants of the command as an alias like;

alias Get-HBA-Info='systool -c fc_host -A port_name'

See Also

  • man systool
  • man lsscsi
  • man lspci
  • /etc/sysfs.conf
  • The homepage of sysfsutils; http://linux-diag.sourceforge.net/Sysfsutils.html

Storage::Nexsan::NMP

At $WORK, we need to manage a fairly large number of Nexsan Satabeast units. One of the early projects I took on, starting there, was to upgrade several hundred controllers to a later firmware. Obviously this would have to be automated, especially as the window for doing so was fairly small.

Nexsan don’t have an API, or SMI-S compatibility, but they do have management capabilities via HTTP and telnet (on a non-standard port). The telnet interface is actually a specific protocol that they’ve documented (ask your friendly Nexsan support engineer for the docs as they’re not on the website at the time of writing).

Writing a script that uploaded firmware to a SATABeast over telnet turned into writing a module that handled all the functions of the Nexsan Management Protocol (it’s a limited feature set, but robust for what it does do), or at least, all the ones we needed.

After some conversation, and using it in anger for a while, both $WORK and Nexsan have kindly allowed me to release the code as Open Source, thus; Storage::Nexsan::NMP and its GitHub Repository.

snmp on Ubuntu Oneiric

Posting this here because I’ll forget, and it might be needed for $WORK, even with my efforts into cfengine-ing the setup.

So, after licensing issues with MIB’s, Debian Squeeze stopped shipping with all of them; which means that a whole metric fuckton are missing if you try and do anything.

It also means that snmpd will not honour the extend directive, as it can’t find the MIB’s, but without displaying any error messages in the logging. Fuckers.

The fix to this is to install the snmp-mibs-downloader package, which automates pulling them all down and putting them in the right places.

That’s wasted two evenings hacking, not to mention countless hours at $WORK SNMP probing kit, grumble..

Nagios Monitoring of Nexsan

Note; I’ve tested this on Satabeast 1.2, 2 and 2.5 as well as E60, but not all their kit. I’m also assuming you know your way around Nagios, but the core of what’s being done here (using XPath to parse an XML document) is easy to implement in other systems and languages, should you need to.

Nexsan don’t provide an SNMP interface to get stats out of their boxes, but they *do* provide some hidden stats pages that you can access via the admin account. One of the pages is an XML document that provides, amongst other things, basic performance stats. With the help of a plugin from nagios-exchange, check_http_xpath, we can parse this for the information we need.

Be aware that by doing this you’ll be downloading the entire XML each time you run a plugin to check, so you might swamp the management interface. The GUI has been known to crash in earlier versions of the firmware, so I’d recommend being on a later release (that’s just good practice though, right?).

Each controller will need this information checking, of course. Yes, this is a lot of work, and it would be easier if you could get this via a management protocol or SNMP, but we have to work with what’s here, so;

Define the Nagios Check commands;

First, CPU;

define command{
    command_name check_nexsan_cpu_c0
    command_line /usr/lib/nagios/plugins/check_http_xpath.pl -H $HOSTADDRESS$ -l ADMIN -a $ARG1$ -u /admin/opstats.asp -c '/nexsan_op_status/nexsan_perf_status/controller[1]/cpu_percent<=95' -w '/nexsan_op_status/nexsan_perf_status/controller[1]/cpu_percent<=80'
}

This is easily modified to check c1 as well;

define command{
    command_name check_nexsan_cpu_c1
    command_line /usr/lib/nagios/plugins/check_http_xpath.pl -H $HOSTADDRESS$ -l ADMIN -a $ARG1$ -u /admin/opstats.asp -c '/nexsan_op_status/nexsan_perf_status/controller[2]/cpu_percent<=95' -w '/nexsan_op_status/nexsan_perf_status/controller[2]/cpu_percent<=80'
}

So from now on I'll only show examples for c0 and leave the rest to you..

Now memory;

define command{
    command_name check_nexsan_mem_c0
    command_line /usr/lib/nagios/plugins/check_http_xpath.pl -H $HOSTADDRESS$ -l ADMIN -a $ARG1$ -u /admin/opstats.asp -c '/nexsan_op_status/nexsan_perf_status/controller[1]/memory_percent<=90' -w '/nexsan_op_status/nexsan_perf_status/controller[1]/memory_percent<75'
}

Now it gets a bit tricky, as you define a command to check a RAID array's utilisation, but you don't know how many arrays the Nexsan has. The way I've done this is with separate host-groups for array 1, 2 etc. and add the appropriate host to each host-group until all of them are covered. Not ideal, but it works.

Arrays;

define command{
    command_name check_nexsan_array_1_load
    command_line /usr/lib/nagios/plugins/check_http_xpath.pl -H $HOSTADDRESS$ -l ADMIN -a $ARG1$ -u /admin/opstats.asp -c '/nexsan_op_status/nexsan_perf_status/array[1]/load_percent<=90' -w '/nexsan_op_status/nexsan_perf_status/array[1]/load_percent<=75'
}

Again, repeat ad nauseam per number of arrays.
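
If typing those out by hand gets old, the per-array definitions can be generated instead. A minimal shell sketch, not something from my actual setup; gen_array_checks is a made-up helper name, the thresholds are copied from the array example, and it passes the password as $ARG1$ on the assumption that your service definition supplies it as the only argument;

```shell
#!/bin/sh
# gen_array_checks (hypothetical helper): print one Nagios command definition
# per array, from 1 up to the count given as the first argument.
gen_array_checks() {
    n=1
    while [ "$n" -le "$1" ]; do
        xpath="/nexsan_op_status/nexsan_perf_status/array[$n]/load_percent"
        # \$ keeps the Nagios macros literal; ${n} and ${xpath} are expanded
        cat <<EOF
define command{
command_name check_nexsan_array_${n}_load
command_line /usr/lib/nagios/plugins/check_http_xpath.pl -H \$HOSTADDRESS\$ -l ADMIN -a \$ARG1\$ -u /admin/opstats.asp -c '${xpath}<=90' -w '${xpath}<=75'
}

EOF
        n=$((n + 1))
    done
}

# Definitions for a unit with three arrays (the count is an example)
gen_array_checks 3
```

Redirect the output into a file in your Nagios config directory and the array checks are done in one go.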

That's the most you can get out of the XML. There is a fibre channel stats page likewise hidden, but that's HTML (and not included in the XML), and it varies per controller, so I'm writing a separate Nagios plugin for each model.

Now define your Nagios Service;

define service{
use generic-service ; defined in generic
hostgroup_name nexsan-c0 ; as per hostgroup
service_description Controller 0 CPU Utilisation ; give description
check_command check_nexsan_cpu_c0!MySecretADMINPassword ; defined commandline
normal_check_interval 10 ; useful time check
servicegroups storage_performance ; which servicegroup
notification_options c,r,w ; useful defaults
contact_groups storageadmins ; who to call
}

Again, lather, rinse, repeat for all the check commands you defined, for each controller.

Voila! Nagios is now monitoring your Nexsan. It's helpful to also put NagVis on your Nagios setup and get the output graphed. I'm going to add this to our Cacti installation at work, so will put the code for that up when I write it as well.

Data Analysis with Open Source Tools

By Philipp K. Janert

Alas, my maths just isn’t up to this. I’m horrified at how much my calculus and algebra have declined, and they weren’t exactly my strongest suit to begin with.

I have the eBook thanks to an O’Reilly deal, and the Library at work bought a copy for me that I can go back to if I need to, so I plan to pick up a few of his suggested remedial reading books on calculus and see if I can get back up to speed. In the meantime, I’ll release the dead tree version back to the Library to let them lend it to people who can make better use of it!

Powershell Munin Plugin for Windows Terminal Services

Citrix will have you believe that the only way to find out how many Citrix users you have connected, disconnected or in total is via their Edge suite. This is only correct if you only want to use Citrix to get the information. After some searching, it turns out that Terminal Services provides, via the PerfMon performance monitor (and hence probably via SNMP if you know how to query PerfMon stats via that), all three of these numbers. The only difference is that the PerfMon data includes the Admin RDP sessions as well as ‘normal’ Terminal Services client users, so you may need to reduce the count by 1-2, depending on whether you leave Admin RDP sessions connected.

I spent an interesting half an hour writing this, including researching how to write a Munin plugin. Unfortunately, after doing so, I took a closer look at the Munin for Windows sample config file, and it included details of how to query these values directly, so it was all a bit moot. However, this might be useful for someone as a template of how to do so…

TerminalServicesUsers_plugin.ps1

# Import the parameters, if any (Munin passes "config" to ask for graph metadata)
param($Argument1)

## Function library
# Get the performance counters for Total, Active and Inactive sessions,
# returned as a three-element array in that order
function Get-TSUsers
{
    # Create an empty array since PowerShell won't do this for us
    $TSUsers = @()
    foreach ($counter in 'Total Sessions', 'Active Sessions', 'Inactive Sessions')
    {
        $tscount = Get-Counter "\Terminal Services\$counter"
        $TSUsers += $tscount.CounterSamples[0].CookedValue
    }
    # Return the values
    $TSUsers
}

# Respond to the munin config command - see http://munin-monitoring.org/wiki/HowToWritePlugins
function Generate-MuninConfig
{
    Write-Host "graph_title Terminal Services Users
graph_vlabel No of Users
Total.label Total
Active.label Active
Inactive.label Inactive
Total.warning 18
Total.critical 20
Active.warning 18
Active.critical 20
Inactive.warning 10
Inactive.critical 20
"
}

# Munin expects each reading as "<fieldname>.value <number>"
function Generate-MuninData
{
    $TSUsers = Get-TSUsers

    # Dump the array values
    Write-Host "Total.value" $TSUsers[0]
    Write-Host "Active.value" $TSUsers[1]
    Write-Host "Inactive.value" $TSUsers[2]
}

## Main code starts here
switch ($Argument1)
{
    config { Generate-MuninConfig }

    # Otherwise, generate the output
    Default { Generate-MuninData }
}

It strikes me someone might also be interested in the Munin config I used to query this information directly;


[PerfCounterPlugin_TSTotalUsers]
DropTotal=1
Object=Terminal Services
Counter=Total Sessions
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Total TS Users
GraphCategory=system
GraphDraw=LINE

[PerfCounterPlugin_TSActiveUsers]
DropTotal=1
Object=Terminal Services
Counter=Active Sessions
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Active TS Users
GraphCategory=system
GraphDraw=LINE

[PerfCounterPlugin_TSInactiveUsers]
DropTotal=1
Object=Terminal Services
Counter=Inactive Sessions
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Inactive TS Users
GraphCategory=system
GraphDraw=LINE

Add the above to your munin-node.conf on the server, and it will produce graphs in the system part of the machine’s report.

UKUUG Spring 2011 Conference

I was lucky to be able to attend this year, as work had agreed to send me on it, and hadn’t reneged when I’d handed in my notice, although I did end up volunteering to pay for the travel and take holiday, so it’s only some impact for them..

Most impressed with the trains from Peterborough to Leeds – clean, on time and with free Wifi (for 15 mins, then chargeable), and power sockets at every seat.  Makes me wonder what the hell the Cambridge operators are doing when the Leeds line can obviously muster this..  Also most impressed with Google Maps/Navigation on my HTC Wildfire.  It directed me walking from the train station to event, and hotel, flawlessly.  Pity the battery didn’t hold out all day, though.  I really should be able to listen to a couple of hours of music, walk for 30 mins with the GPS on and make a couple of phone calls without using up all the juice.

Conference was good and increasingly attracting the DevOps crowd, which is a very good thing – preventing the conference from fading into obscurity and obsolescence.  Inspiring talks from Matthew Macdonald-Wallace and Adrian Kennard in particular, although I enjoyed the talks on Ceph, Git and DNSSEC also, and have some new projects to investigate as a result.

Thanks to Google for the excellent stationery swag, and for paying for dinner at the excellent Leeds Armoury.

Good fun, interesting talks and people and not too much intrusion into work for those attending via that.  Bloody good value.

The Practice of System and Network Administration

By Limoncelli, Hogan and Chalup

1000 pages, 6 months to read,  (paper and Safari eBook edition, about 800 and 200 pages, respectively).  I even got a Safari edition (you may have seen my justification post) primarily to read this (unfortunately, their iPad app got yanked just before I got the subscription through, so it was through the iPad web browser, which was surprisingly usable), and be able to download it as an eBook.

The book is divided into chapters covering all the different areas a sysadmin is likely to encounter, covering the basics and ‘the icing’.  I think if you could sum up one overriding principle of the book, it’s automate everything, with the additional push to make Systems Administration more a profession and less of a collection of organically arrived-at best practices you may or may not have encountered.  It would be a fantastic start to a sysadmin’s career, although I suspect few companies would live up to all the aims in the book.  I bought, via work, a copy for everyone in my team, and we’ve had regular discussions about relevant chapters.  The only complaints I’ve heard levelled at it from my colleagues, or that I would make, is that it’s not quite as platform agnostic as it claims – most of the examples coming from the Unix world.  The other would be that it favours large IT departments/organisations.  I’m not sure how true this is, given Limoncelli for certain worked as a one man SA for a number of years, but it seems hard to implement most of the ideas and suggestions if you’re just one person.

It’s no exaggeration to say this book changed my working life.  The seeds had been sown with ‘System Administration with Perl’, but it really illustrated, for me, how you could remain technical and in management, in addition to all the best practice stuff the book covers.

Talking of which, it’s pretty comprehensive, as you might expect from a thousand pages.  The only area I felt it didn’t cover well was in justifying IT as a business driver rather than an expense.  Limoncelli says that he prefers working for CTO’s or COO’s as they tend to see IT improvements as part of the overall business efficiency, whereas CFO’s tend to see it as a cost to be reduced.  I’ve only worked for CFO’s, but that seems broadly true in my experience.  Unfortunately, while the authors make a lot out of saying you need to justify IT to get the resources, there was little in the book to say how.  As this was one of the things I was interested in, especially as the book progressed, and the number of ideas I’d like to implement from the book (and hence would need resource for) grew, it was a bit frustrating.  Plus I get the feeling from Limoncelli’s blog posts and this book that it’s something he feels we as a profession should do better.  Can’t disagree there!

Useful links;

Everything Sysadmin - the blog of the book, mostly updated by Limoncelli (@yesthattom on twitter)

Tomontime.com – Tom Limoncelli recorded talking about his other book ‘Time Management for System Administrators’ (and another book I’d highly recommend reading).

Sysadvent – the sysadmin’s advent calendar, running for the past two years now.  Has some good introductory articles on the kind of automation technology the book advocates and on DevOps, a ‘new’ sysadmin movement about combining SA’s with development, and bringing the practices of one into the other.

Justifications on an online library (for work)

I had to write up a justification for purchasing a subscription to the Safari online library for work (application pending), and while I instinctively know the value of an online technical library, I was asked to justify it in terms of cost benefit to the company as the full library for 5 people is £2.5k/year.  After some thought and googling, this is what I put;

1. Consultants.  At £1000 per day three days of consulting will more than justify the expense.

2. Down time. We have had recent outages that could have been prevented by better system design, or reduced by access to the right manual.  One network outage of one hour costs 360 man hours at assorted rates of all the onshore and offshore staff.  

3. Cross training* - getting team members knowledgeable about other team members' expertise, without sending on training courses for everything.

4. Instant library access - access to research and documentation on a huge number of subjects and software - we can't be an expert on everything - and we can't buy books on everything.

5. Lifelong learning - if a member of staff wants to learn about a topic, he can research this in his spare time - and has the books to hand to do so.

6. Staff motivation - with a high workload, access to resources to help do your job better may be seen to be worth £38/month/person

7. Reduced cost.  The average IT book is £20 from Amazon, and once you've bought it you can't swap it out for another one once you've read it - you can with Safari.

8. Efficiency.  You can copy and paste code examples, configs etc from the books in the digital library which you can't do with online versions

9. Portability - access to the library in India as easily as in UK, or at desk as easily as in machine room.

*To define "cross training". A department with many sub-specialities wants everyone to be able to handle tasks from the other sub-teams. For example, you have an IT department with three sub-teams: a Linux sub-team, a Networking sub-team, and a Windows sub-team. In an ideal world, the Storage sub-team members should be able to handle 80% of the requests of the Linux sub-team, and vice versa. Being able to handle 80% of the Storage-related requests probably means knowing about 20% of what someone on the Storage sub-team knows. That's ok. 80% of the requests are probably things like add/delete/change requests (add a new virtual partition, increase the size of an existing one, etc.), and common problems (what to do with an NFS stale file handle, etc.). If everyone in the department could handle those tasks, the individual teams could focus on higher-order issues like scaling, monitoring, and optimization.

What would you have written?