Tag Archives: sysadmin

Using systool to easily get HBA info and more

Query your system from the filesystem without relying on OEM tool-chains.

I often find I want to get information about the Fibre Channel network a server is attached to, but don’t have the vendor’s tools installed, or it’s required before they’re available (such as with automated builds, or on some configuration management systems before those tools have been installed). This hack will tell you more about what information you can get from a moderately recent (2.6.11 onwards, IIRC) Linux kernel, without any reliance on external tools.

First, some history: the Fibre Channel Host Bus Adapter drivers used to store information in the /proc/scsi pseudo file system, but this was reorganised in 2.6.11 and is now much more centralised under /sys, with a unified and consistent way of reporting information.
In /sys, each device on a system has its own directory, containing details about the hardware and any information that the module for the device has chosen to expose (this is dependent on the driver, as far as I can see, with some standards, but kernel developers, feel free to correct me!). For example, the World Wide Port Name of a Fibre Channel PCI card is located in /sys/class/fc_host/host2/port_name, which, on my test Ubuntu system at least, is the first port on the first FC PCI card in the system.

Other files in this directory are equally useful; dev_loss_tmo, fabric_name, port_state, speed, and on some later versions of the driver, the entire statistics subdirectory!
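
As a rough illustration of reading these files directly, here is a minimal Python sketch. The function name read_fc_hosts and its default attribute list are my own choices for the example, not anything the kernel or sysfsutils provides, and which attributes exist depends on the driver, so missing files are simply skipped.

```python
from pathlib import Path


def read_fc_hosts(root="/sys/class/fc_host",
                  attrs=("port_name", "node_name", "port_state", "speed")):
    """Return {host: {attribute: value}} for every entry under root.

    'root' is a parameter so the function can be pointed at a copy of
    the tree; the attribute list is only an assumption about what is
    commonly present.
    """
    hosts = {}
    base = Path(root)
    if not base.is_dir():
        # No FC driver loaded, or this is not a Linux sysfs tree.
        return hosts
    for host_dir in sorted(base.iterdir()):
        values = {}
        for attr in attrs:
            try:
                values[attr] = (host_dir / attr).read_text().strip()
            except OSError:
                # Attribute not exposed by this driver, or not readable.
                continue
        hosts[host_dir.name] = values
    return hosts
```

Calling read_fc_hosts() with no arguments on a machine with an FC HBA should return one entry per hostN directory; on a machine without one it returns an empty dict.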

Information like this, particularly the WWPNs, is very useful for gathering into central databases, for use in SAN zoning and the like, and for general troubleshooting. I find I need it most often when setting up a new zone between a server and some storage device that haven’t communicated before.

When you first discover this information in the /sys filesystem, if you’re like me, you start by cat’ing individual files, remembering or exploring the paths until you get to the ones you want. Then you find that the Emulex drivers put their files in one place, and Qlogic in another. However, the port_name file is called the same regardless, so perhaps using ‘find’ is the answer. Then you realise that some of the directories recurse in an... interesting fashion, so using ‘find’ is of no use, or at least, doesn’t return in a useful timescale.

At this point (and I admire your persistence if you got this far on your own) you might be tempted to use a bash script, or similar, with the paths hard coded. You then start getting into programmatically determining the number of ports on the HBA, the number of PCI Fibre Channel cards on the system, and the fact that, unless you do a lot of spelunking on the /sys filesystem and comparing device slots (try comparing the output of lspci -v with a /sys device directory to see what I mean), you need to discount the first two devices in some kernels (local SCSI show up as /sys/class/fc_host{0,1}), and your script starts looking quite cumbersome. Add in worries about distribution portability and kernel versions and you start to despair (or was that just me?) about writing anything which can run on a system beyond the one it was written on. And portability, of course, is a must when you need to manage more than one system.

Enter the systool command from the sysfsutils package.

This is a command that abstracts some of the details, and allows you to get just a subset of information back, without having to do all the tedious and error-prone tree walking yourself. You can select via switches whether you want a class of devices, a particular device and so forth, and then tell it to display the contents (aka attributes) of one of the particular ‘files’.

Let’s get into the details with a few worked examples. First of all, a command to show basic Fibre Channel info (actually quite useful when you’re getting the hang of what’s on a system);

systool -c fc_host -v

which, on my test system (unconnected to a SAN fabric at the time of running this) gives;


john@brain:~$ systool -c fc_host -v
Class = "fc_host"

Class Device = "host2"
Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:09:00.0/0000:0a:00.0/0000:0b:00.0/host2/fc_host/host2"
dev_loss_tmo = "16"
fabric_name = "0xffffffffffffffff"
node_name = "0x500143800133ad65"
port_name = "0x500143800133ad64"
port_state = "Online"
port_type = "Unknown"
speed = "unknown"
supported_classes = "Class 3"
supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit"
symbolic_name = "HPAE312A FW:v4.04.04 DVR:v8.03.07.03-k"
system_hostname = ""
tgtid_bind_type = "wwpn (World Wide Port Name)"

...and so on. I’ve trimmed some of the output for brevity, but you get the idea. You may also want to show connected WWNNs, which you can do with (systool -c fc_remote_ports -v -d)
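
If you want to feed this output into a database or script, a small parser is enough. The following is a sketch only: parse_systool is a name I made up, and it assumes the key = "value" layout shown in the output above, grouping attributes under each "Class Device" line.

```python
import re

# Matches lines such as: port_name = "0x500143800133ad64"
# (also tolerates a missing space before the '=').
LINE_RE = re.compile(r'^\s*(.+?)\s*=\s*"(.*)"\s*$')


def parse_systool(text):
    """Parse 'systool -c <class> -v' output into {device: {attr: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # blank lines and anything not in key = "value" form
        key, value = match.groups()
        if key == "Class Device":
            # Start of a new device block.
            current = devices.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return devices
```

The text can come from something like subprocess.run(["systool", "-c", "fc_host", "-v"], capture_output=True, text=True).stdout, after which parse_systool(output)["host2"]["port_name"] gives the value shown above.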

Since the /sys filesystem also contains information on kernel modules currently loaded, you can also query assorted information from the FC module itself;

systool -m qla2xxx -v

which gives;

john@brain:~$ systool -m qla2xxx -v
Module = "qla2xxx"

Attributes:
initstate = "live"
refcnt = "0"
srcversion = "5E2862BE1CA7563239F1A1E"
version = "8.03.07.03-k"

Parameters:
ql2xallocfwdump = "1"
ql2xasynctmfenable = "0"
ql2xdbwr = "1"
ql2xdontresethba = "0"
ql2xenabledif = "1"
ql2xenablehba_err_chk= "0"
ql2xetsenable = "0"
ql2xextended_error_logging= "0"
ql2xfdmienable = "1"
ql2xfwloadbin = "0"
ql2xgffidenable = "0"
ql2xiidmaenable = "1"
ql2xloginretrycount = "0"
ql2xlogintimeout = "20"
ql2xmaxlun = "65535"
ql2xmaxqdepth = "32"
ql2xmaxqueues = "1"
ql2xmultique_tag = "0"
ql2xplogiabsentdevice= "0"
ql2xshiftctondsd = "6"
ql2xtargetreset = "1"
qlport_down_retry = "0"

.. and more

This is a bit of a cheat; I happened to know that the Qlogic FC module was called qla2xxx, and hence interrogated it directly, but it’s not hard to Google if you didn’t know, and it’s not like the world is filled with a cornucopia of FC HBA manufacturers…

However, as it’s sometimes hard to get firmware/version numbers out of HBAs without the vendor’s tools installed (and that’s not always possible, even with configuration management tools), this is a handy trick to know.

Other information you can get out via this method includes ql2xmaxqdepth (that’s another post on its own), and since it also exposes values that are set by multipath, you can use it along with multipath’s debug mode (also another post) to confirm that the values you have in multipath.conf are the ones that are set in your system.
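
The same values can be read straight from /sys/module if systool isn’t installed. This is a hedged sketch: read_module_info is a hypothetical helper name, and it assumes the usual layout of a version file plus a parameters directory, which not every module provides.

```python
from pathlib import Path


def read_module_info(module, root="/sys/module"):
    """Return the version and parameters a kernel module exposes in sysfs."""
    mod_dir = Path(root) / module
    if not mod_dir.is_dir():
        raise FileNotFoundError(f"no sysfs entry for module {module!r}")

    def read(path):
        # Some attributes are missing or readable by root only.
        try:
            return path.read_text().strip()
        except OSError:
            return None

    parameters = {}
    params_dir = mod_dir / "parameters"
    if params_dir.is_dir():
        for entry in sorted(params_dir.iterdir()):
            parameters[entry.name] = read(entry)
    return {"version": read(mod_dir / "version"), "parameters": parameters}
```

For example, read_module_info("qla2xxx")["parameters"].get("ql2xmaxqdepth") returns the queue depth as a string, which you can then compare against whatever your configuration management says it should be.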

Going back to the original use case above, that of the WWPNs of the FC PCI card, this is the command you need;

systool -c fc_host -A port_name

which shows;

john@brain:~$ systool -c fc_host -A port_name
Class = "fc_host"

Class Device = "host2"
port_name = "0x500143800133ad64"

Device = "host2"

Class Device = "host3"
port_name = "0x500143800133ad66"

Device = "host3"
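
SAN switches and zoning tools usually want the WWPN as colon-separated byte pairs rather than the 0x-prefixed form shown here. A minimal conversion sketch follows; format_wwn is my own name for it, and it assumes a 64-bit (16 hex digit) name.

```python
def format_wwn(raw):
    """Turn '0x500143800133ad64' into '50:01:43:80:01:33:ad:64'."""
    digits = raw.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    # A WWN is 64 bits, i.e. exactly 16 hexadecimal digits.
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 64-bit WWN: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


print(format_wwn("0x500143800133ad64"))  # 50:01:43:80:01:33:ad:64
```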

Of course, systool isn’t limited to just Fibre Channel card information; you can get some information on any device in your system this way.

You might want to take it a step further, and define the useful variants of the command as an alias like;

alias Get-HBA-Info='systool -c fc_host -A port_name'

See Also

  • man systool
  • man lsscsi
  • man lspci
  • /etc/sysfs.conf
  • The homepage of sysfsutils; http://linux-diag.sourceforge.net/Sysfsutils.html

Storage::Nexsan::NMP

At $WORK, we need to manage a fairly large number of Nexsan SATABeast units. One of the early projects I took on when starting there was to upgrade several hundred controllers to a later firmware. Obviously this would have to be automated, especially as the window for doing so was fairly small.

Nexsan don’t have an API, or SMI-S compatibility, but they do have management capabilities via HTTP and telnet (on a non-standard port). The telnet interface is actually a specific protocol that they’ve documented (ask your friendly Nexsan support engineer for the docs, as they’re not on the website at the time of writing).

Writing a script that uploaded firmware to a SATABeast over telnet turned into writing a module that handled all the functions of the Nexsan Management Protocol (it’s a limited feature set, but robust for what it does do), or at least, all the ones we needed.

After some conversation, and using it in anger for a while, both $WORK and Nexsan have kindly allowed me to release the code as Open Source, thus; Storage::Nexsan::NMP and its GitHub Repository.

Powershell Munin Plugin for Windows Terminal Services

Citrix will have you believe that the only way to find out how many Citrix users you have connected, disconnected or in total is via their Edge suite. This is only correct if you only want to use Citrix to get the information. After some searching, it turns out that Terminal Services provides all three of these numbers via the PerfMon performance monitor (and hence probably via SNMP, if you know how to query PerfMon stats via that). The only difference is that the PerfMon data includes the Admin RDP sessions as well as ‘normal’ Terminal Services client users, so you may need to reduce the count by 1-2, depending on whether you leave Admin RDP sessions connected.

I spent an interesting half an hour writing this, including researching how to write a Munin plugin. Unfortunately, after doing so, I took a closer look at the Munin for Windows sample config file, and found it included details of how to query these values directly, so it was all a bit moot. However, this might be useful for someone as a template of how to do so…

TerminalServicesUsers_plugin.ps1

#import the parameters, if any
param($Argument1)

#create an empty array since PowerShell won't do this for us
$TSUsers = @()

##function library
#get the individual performance counters for Total, Active and Inactive sessions and add to the $TSUsers array
function Get-TSUsers
{
$tscount = Get-Counter '\Terminal Services\Total Sessions'
$var = $tscount.CounterSamples[0].CookedValue
$TSUsers = $TSUsers + $var
$tscount = Get-Counter '\Terminal Services\Active Sessions'
$var = $tscount.CounterSamples[0].CookedValue
$TSUsers = $TSUsers + $var
$tscount = Get-Counter '\Terminal Services\Inactive Sessions'
$var = $tscount.CounterSamples[0].CookedValue
$TSUsers = $TSUsers + $var
#return the values
$TSUsers
}

#respond to the munin config command - see http://munin-monitoring.org/wiki/HowToWritePlugins
function Generate-MuninConfig
{
Write-Host "graph_title Terminal Services Users
graph_vlabel No of Users
Total.label Total
Active.label Active
Inactive.label Inactive
Total.warning 18
Total.critical 20
Active.warning 18
Active.critical 20
Inactive.warning 10
Inactive.critical 20
"

}

function Generate-MuninData
{
$TSUsers = Get-TSUsers

#dump the array values
write-host "Total.value" $TSUsers[0]
write-host "Active.value" $TSUsers[1]
write-host "Inactive.value" $TSUsers[2]
}

##Main code starts here
switch ($Argument1)
{
config { Generate-MuninConfig }

#otherwise, generate the output
Default{ Generate-MuninData }
}
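
For anyone more at home outside PowerShell, the shape of the Munin plugin protocol is easy to see in a few lines of Python. This is an illustrative sketch, not a drop-in plugin: munin_plugin is a made-up function, and the values are passed in by hand rather than read from performance counters. Munin calls a plugin with the argument config to get the graph definition, and with no argument to get the data, where each data line takes the form field.value number.

```python
def munin_plugin(argument, values):
    """Return the lines a Munin plugin would print for one invocation.

    'values' maps field names (e.g. Total, Active, Inactive) to numbers.
    """
    if argument == "config":
        # Graph definition: a title, an axis label, one label per field.
        lines = ["graph_title Terminal Services Users",
                 "graph_vlabel No of Users"]
        lines += [f"{name}.label {name}" for name in values]
        return lines
    # Default: report the current value of each field.
    return [f"{name}.value {value}" for name, value in values.items()]


for line in munin_plugin(None, {"Total": 5, "Active": 3, "Inactive": 2}):
    print(line)
```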

It strikes me someone might also be interested in the Munin config I used to query this information directly;


[PerfCounterPlugin_TSTotalUsers]
DropTotal=1
Object=Terminal Services
Counter=Total Sessions
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Total TS Users
GraphCategory=system
GraphDraw=LINE

[PerfCounterPlugin_TSActiveUsers]
DropTotal=1
Object=Terminal Services
Counter=Active Sessions
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Active TS Users
GraphCategory=system
GraphDraw=LINE

[PerfCounterPlugin_TSInactiveUsers]
DropTotal=1
Object=Terminal Services
Counter=Inactive Sessions
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Inactive TS Users
GraphCategory=system
GraphDraw=LINE

Add the above to your munin-node.conf on the server, and it will produce graphs in the system part of the machine’s report.

UKUUG Spring 2011 Conference

I was lucky to be able to attend this year, as work had agreed to send me, and hadn’t reneged when I handed in my notice, although I did end up volunteering to pay for the travel and take holiday, so there’s only some impact for them.

Most impressed with the trains from Peterborough to Leeds: clean, on time and with free Wifi (for 15 mins, then chargeable), and power sockets at every seat.  Makes me wonder what the hell the Cambridge operators are doing when the Leeds line can obviously muster this.  Also most impressed with Google Maps/Navigation on my HTC Wildfire.  It directed me walking from the train station to the event, and the hotel, flawlessly.  Pity the battery didn’t hold out all day, though.  I really should be able to listen to a couple of hours of music, walk for 30 mins with the GPS on and make a couple of phone calls without using up all the juice.

Conference was good and increasingly attracting the DevOps crowd, which is a very good thing – preventing the conference from fading into obscurity and obsolescence.  Inspiring talks from Matthew Macdonald-Wallace and Adrian Kennard in particular, although I enjoyed the talks on Ceph, Git and DNSSEC also, and have some new projects to investigate as a result.

Thanks to Google for the excellent stationery swag, and for paying for dinner at the excellent Leeds Armoury.

Good fun, interesting talks and people and not too much intrusion into work for those attending via that.  Bloody good value.

The Art of Capacity Planning

By John Allspaw

I bought this about the same time as we set up a Nagios & Munin system (and by we I mean Gatekeeper), but didn’t get round to reading it until just recently.  As I kinda suspected, it mainly emphasises getting good information and knowing what you want the system to do, but it’s explained very clearly and logically.  There was also some good information on automated build systems, something I’ve been looking into recently.

I was looking for more information on the kind of statistics maths you need, but apart from some frustrating hints about using Excel and linear regression (unusually, with no links to find out more), it wasn’t expanded on.

Ultimately, well written and a good introduction, but I’d have liked more hard info on calculating metrics, which is what I bought it for.

Automating System Administration With Perl

By David N. Blank-Edelman

This is probably the best system administration book I’ve read, and certainly the best, most quintessentially ‘O’Reilly’ book, in a long long while.  It’s safe to say it restarted my love of both Perl and system administration.  Not bad for a purely technical book.

Blank-Edelman takes the O’Reilly approach of assuming you’re intelligent and just don’t know the subject matter.  In this book, he includes appendices such as ‘the 5 min RCS guide’ or ‘the 20 min XML tutorial’ to supplement the chapters that deal with sysadmin topics that use those areas.  In all of the chapters he gives not only the subject matter, but where you can find out more, and crucially, enough understanding that you can troubleshoot the problem yourself, knowing where to start.  This is an invaluable technique, and very hard to get right.  If he uses advanced Perl techniques or modules, he clearly documents what and why.

Along with Tom Limoncelli and Jesse Vincent, he’s one of my new tech heroes.

I started reading this just before Fred was born, so at over 700 pages it took some time to get through, but it was interesting and entertaining, even reading it on the black and white screen of my eBook reader!  Thank goodness I bought it in eBook format too; it’s possible to read with a baby in one arm, but not a 700 page technical tome, unless it’s an eBook.