Our client was experiencing long delays generating business activity reports from their production Linux system. At the time it was running Red Hat 4.6 with an iSCSI-connected EMC CX300 SAN.

When a member of the management team selected a report, it could take upwards of three minutes to generate and return the report to their web browser! While this was occurring, the system was also servicing public web activity, staff usage and background job processing.

Prior to our involvement, a previous contractor had been asked to improve performance. They added a few more 15K RPM 300 GB SAS disks to the array and reported that there was nothing more that could be done.

Examination of the server showed a standard disk layout using LVM to manage storage; however, each of the three Volume Groups consisted of a single large Physical Volume (a LUN from the SAN). On the networking side there were two HBA cards in the server and two 1G switches (this was before 10G switches), and the Ethernet settings had been tweaked to enable jumbo frames.
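A layout like this can be confirmed with the standard LVM reporting tools. A sketch of the checks involved (column selections are illustrative, not the client's exact output):

```shell
# List Physical Volumes and the Volume Group each belongs to.
# On the original layout, each VG contained exactly one large PV (one SAN LUN).
pvs -o pv_name,vg_name,pv_size

# Summarise the Volume Groups: a PV count of 1 per VG confirms the
# single-LUN design.
vgs -o vg_name,pv_count,lv_count,vg_size

# Show the Logical Volumes and how many stripes each uses
# (a stripe count of 1 means no striping across Physical Volumes).
lvs -o lv_name,vg_name,stripes,lv_size
```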

On examination, we decided to create some new Volume Groups: instead of single large LUNs, we carved smaller LUNs from the new disks and added them to the new Volume Groups. The plan was to move the data off the original LUN and free up those disks, which we then sliced into smaller RAID-5 groups, so instead of one big RAID-10 group presenting a single LUN, we had four separate RAID-5 groups providing smaller LUNs. This would allow more I/O queues across more physical disks.
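Once the four smaller LUNs are presented to the host, turning them into a Volume Group is straightforward. A sketch of the steps (device paths such as /dev/sdc and the name vg_data are hypothetical; the real paths would come from the iSCSI sessions):

```shell
# Each of the four RAID-5 groups on the array presents one smaller LUN.
# Initialise each LUN as an LVM Physical Volume.
pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf

# Build one Volume Group spanning all four PVs, so Logical Volumes
# can later be striped across all four LUNs (and their service queues).
vgcreate vg_data /dev/sdc /dev/sdd /dev/sde /dev/sdf
```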

From the new Volume Groups, new Logical Volumes were created; however, we striped them over the four Physical Volumes in each Volume Group. The theory was that each LUN had its own service queue, and if there were more queues, then more data could be written and read at the same time.
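Creating a Logical Volume striped across four Physical Volumes is a one-liner with lvcreate. The names, size and stripe size below are illustrative only:

```shell
# -i 4 stripes the Logical Volume across four Physical Volumes (one per LUN),
# -I 64 sets a 64 KiB stripe size; 100G is an example capacity.
lvcreate -i 4 -I 64 -L 100G -n lv_reports vg_data

# Verify the result: the stripe count column should now read 4.
lvs -o lv_name,stripes,stripesize vg_data
```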

The change paid off: report generation and the return of data to the user now took around 30-45 seconds. A significant improvement.

In this case, using a series of RAID-5 disk groups rather than a single mirrored stripe provided better performance due to the nature of the applications being used. Most data sent over the Ethernet was smaller than a standard frame, so the 9K jumbo-frame setting had no benefit whatsoever.
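A claim like this is easy to verify before (or after) enabling jumbo frames. A hedged sketch of the checks; the interface name eth1 and the SAN address are placeholders:

```shell
# Check the current MTU on the storage-facing interface
# (eth1 stands in for whichever NIC carries the iSCSI traffic).
ip link show eth1

# Confirm a jumbo frame actually traverses the path end-to-end:
# ping with "do not fragment" set and an 8972-byte payload
# (8972 bytes + 28 bytes of IP/ICMP headers = 9000). If this fails
# while a 1472-byte payload succeeds, jumbo frames are not in effect
# anyway, regardless of the interface setting.
ping -c 3 -M do -s 8972 192.168.1.50
```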

Our next refactoring effort would involve moving the database to its own server and discussing refactoring the application so that parts of it could run on other systems, lessening the load on the main production server.