If you asked me to pinpoint a single issue that prevents successful virtual desktop rollouts, it would be disk I/O, or lack thereof. We’ve walked into a number of botched implementations where the user experience was atrocious, only to discover the storage infrastructure was not adequate to meet the desktop read and write I/O (or IOPS) requirements.
Enter Atlantis Computing. I first met the guys at Atlantis when I was at a Brian Madden event in New York a few years back. They were just starting out at the time, but their technology was very compelling. In a nutshell, Atlantis' ILIO product is designed to super-charge disk I/O in virtual desktop environments. Specifically, it provides storage optimization, de-duplication, I/O optimization, and caching.
While we love the Atlantis technology, one of the challenges once Atlantis is deployed is how to monitor the performance and effectiveness of ILIO as it relates to the overall virtual desktop environment. IT departments need to be able to proactively determine, in real time, when storage issues are causing virtual desktop sluggishness. Otherwise, you'll hear "my desktop is slow" but won't have the ability to efficiently pinpoint why. If it's not storage, it could be a CPU or a network issue, etc., but how will you troubleshoot without being able to rule out issues within each technical domain?
So how do we do this? Well, Atlantis provides an excellent document that outlines how to monitor key components of the ILIO appliance. ILIO is a Linux appliance, so much of the relevant performance data comes from traditional Linux/Unix commands.
Below I’ll outline a few of the basic commands used to monitor ILIO and what they measure:
- df -h /exports/ILIO_VirtualDesktops: Used to measure the amount of disk space consumed by ILIO. When coupled with the aggregate disk size of each virtual desktop that resides on ILIO, it can also be used to determine the disk de-duplication ratio/percentage.
- dstat -D sdb,dm-0 -dnclt: Used to measure a whole host of things, including:
- vScaler Cache Performance: how much desktop I/O is being served up directly by the ILIO vScaler cache.
- Backend disk performance and offload: how much I/O is being offloaded from backend storage by the cache as well as the actual amount of I/O being sent to the backend storage.
- Network traffic and performance statistics
- dstat --full: Measures CPU and memory performance.
- iostat -xm 1: Measures CPU statistics, including CPU wait time (important!), as well as extended device/disk utilization statistics.
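To make the de-duplication math from the df command concrete, here's a minimal sketch. The consumed and provisioned figures are hypothetical stand-ins for the df output and the sum of your desktops' virtual disk sizes:

```shell
#!/bin/sh
# Hypothetical figures -- in practice, used_gb comes from
#   df -h /exports/ILIO_VirtualDesktops
# and provisioned_gb is the aggregate virtual disk size of every desktop
# on this ILIO (e.g. 50 desktops x 20 GB each).
used_gb=120
provisioned_gb=1000

# Percentage of logical desktop data eliminated by de-duplication.
savings_pct=$(( (provisioned_gb - used_gb) * 100 / provisioned_gb ))
echo "De-duplication savings: ${savings_pct}%"
```

With those sample numbers, 1,000 GB of provisioned desktops sitting in 120 GB of actual disk works out to 88% savings.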
All of these commands provide us with key metrics for monitoring ILIO, but the problem is that they produce raw console output that looks like this:
While using these tools to review current statistics or troubleshoot an issue in real time may be effective, it leaves much to be desired in terms of short- and long-term monitoring. I mean, who is going to sit around and watch these screens all day? And what about long-term trending?
Sure, you could automate collection of this data to a file via a cron job and then view and graph it in a spreadsheet, but how effective will that be for a daily operations staff supporting multiple systems in an enterprise environment? An effective monitoring solution should provide a mechanism for configuring real-time alerts when an issue that affects user experience crops up, as well as the ability to review historical, trended data for root-cause analysis and future capacity planning within the environment.
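For context, the cron approach mentioned above might look something like this (the interval, logfile path, and devices are hypothetical):

```shell
# Hypothetical crontab entry: capture one dstat sample every minute and
# append it to a per-day logfile for later graphing in a spreadsheet.
# Note: % must be escaped as \% inside a crontab line.
* * * * * dstat -D sdb,dm-0 -dnclt 1 1 >> /var/log/ilio/dstat-$(date +\%Y\%m\%d).log 2>&1
```

It works, but it's exactly the kind of manual collect-then-graph workflow that doesn't scale for an operations team.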
What if we could pull all the relevant data from ILIO at regular intervals and then overlay it with relevant statistics from our virtual desktop environment? Well, you can, and we did, using PowerShell, PLINK, and our favorite monitoring solution, PRTG.
Why PRTG? Besides being a very cost-effective overall monitoring solution, PRTG allows us to write custom sensors to monitor virtually any type of host or device using scripts or custom EXEs and DLLs. It's much more extensible than many of the $100K+ monitoring solutions we see in large environments these days, and we've made it monitor devices that don't support traditional protocols like SNMP, WMI, etc.
Our ILIO sensor for PRTG works as follows:
1. A PowerShell script is initiated by the PRTG monitoring server which calls PLINK.
2. PLINK logs into each ILIO, executes the relevant Linux commands described above, and pulls the data back into a logfile.
3. The PowerShell script then parses the various logfile formats and converts the data into the requisite PRTG format.
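Our actual sensor is PowerShell, but the flow of steps 1–3 can be sketched in plain shell. The hostname, credentials, and canned df capture below are all hypothetical; in the real sensor, PLINK does the remote capture:

```shell
#!/bin/sh
# Illustrative sketch of the sensor flow. In the real sensor, steps 1-2 are
# a PLINK call against the appliance, something along the lines of:
#   plink -batch -ssh monitor@ilio01.lab.local \
#       "df -h /exports/ILIO_VirtualDesktops" > /tmp/ilio01-df.log
# Here we stand in a canned capture so the parsing step runs end to end.
cat > /tmp/ilio01-df.log <<'EOF'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       500G  120G  380G  25% /exports/ILIO_VirtualDesktops
EOF

# Step 3: extract one metric (Use%) from the raw output and emit it in the
# "value:message" format a standard PRTG EXE/Script sensor expects.
used_pct=$(awk 'NR==2 {gsub(/%/,""); print $5}' /tmp/ilio01-df.log)
echo "${used_pct}:ILIO datastore used"
```

The real script repeats the same capture-and-parse dance for the dstat and iostat output as well, one channel per metric.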
We then set up the polling timers and we're in business! Once we get some data into PRTG, this particular sensor shows us data that looks like this:
OK, so that's better than what we had before, but once again, we want to make the format relevant to IT operations for daily monitoring. Once we have the data in a single sensor, PRTG makes it easy to break it out into individual channels and chart just the items we care about, so it's much easier to make sense of the data.
ILIO Cache Performance – Amount of data delivered directly from vScaler Cache
ILIO Disk Offload – Amount of I/O being offloaded from backend storage.
ILIO Deduplication – Amount of data de-duplicated by ILIO (Disk savings!)
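For breaking the data into individual channels like the three above, one approach (assuming PRTG's EXE/Script Advanced sensor type; the channel names, element casing, and numbers here are illustrative, not our production sensor) is to emit a multi-channel XML result in a single poll:

```shell
#!/bin/sh
# Illustrative only: publish several related ILIO metrics as one
# multi-channel result in the XML shape used by PRTG EXE/Script Advanced
# sensors. The values are placeholders for numbers parsed from dstat/df.
cache_mbps=420  # MB/s served straight from the vScaler cache
offload_pct=85  # percent of desktop I/O kept off backend storage
dedup_pct=88    # percent disk savings from de-duplication

xml="<prtg>
  <result><channel>Cache Throughput</channel><value>${cache_mbps}</value><unit>Custom</unit><customunit>MB/s</customunit></result>
  <result><channel>Disk Offload</channel><value>${offload_pct}</value><unit>Percent</unit></result>
  <result><channel>Deduplication Savings</channel><value>${dedup_pct}</value><unit>Percent</unit></result>
</prtg>"
echo "$xml"
```

Each `<result>` becomes its own channel on the sensor, which is what lets you overlay them on one chart.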
Tie the ILIO stats into Citrix XenDesktop or VMware View!
Once again, our goal is to make our monitoring data relevant and help our IT support staff monitor useful statistics. So we decided it would be nice to overlay our XenDesktop statistics (we're running XD in our lab, but you can also do this with View) onto the ILIO devices that the virtual desktops run on. So we wrote another custom sensor that pulls XenDesktop stats via Citrix's PowerShell API to get something like this:
We can then combine the XenDesktop stats with the ILIO statistics. In this case, we can see the effects of an XD boot storm on the ILIO device.
Pretty cool, huh? With PRTG and some creative PowerShell scripting, we can monitor almost anything related to our VDI environment and ensure we meet our users' expectations. Stay tuned for additional updates – we'll be working on more PRTG sensors for VDI monitoring.