Tuesday, 1 February 2011

Changing drive controllers - how hard can it be?

We've got a whole bunch of HP ML310 G3 servers scattered around our organisation - they're a bit of a do-it-all server for branch offices - Windows 2003 R2, Exchange 2003, file and print, DC, DNS etc. etc.  The problem is that they're getting a bit long in the tooth (they're about 4 years old), the storage capacity isn't all that great, and we're starting to get array members failing. Because our budget is, shall we say, slender, a full server replacement programme isn't going to happen, certainly not this year.  So, what to do?

The layout has traditionally been an 80GB system disk in RAID 1, then a 250GB data disk, also in RAID 1 (though some have been upgraded to 500GB drives by copying the data to an external drive or workstation then putting it back on the new disk).  We've always used the onboard RAID controller, but it's become increasingly apparent that the controller doesn't cut it, especially in the reporting failed member stakes!  Instead of a nice orange light, you just get MFT write errors :-(

The plan is to get a proper Smart Array controller and a bunch of fresh 500GB disks and transfer the drives onto that. We could do SAS, but we stuck with SATA: as I said, the budget is fairly lean, and hopefully we don't need another 4 years out of them!

I was lucky (sort of) in that one of our 310s failed a couple of years ago, and I've always kept it around for spares.  The thing that really comes in handy here is the drive bay that takes the hot-pluggable HP drives.  If you don't have one you need to be able to mount one member of each of your arrays somehow, maybe with a SATA card or a USB-SATA type cable.  You also need a copy of Ubuntu Desktop, from which you can do the cloning, and a few other neat tricks I'll show you later!

So, here we go... I waited for everyone to go home, then sprang into action.

And this was where I committed mistake number 1. When I did this the first time, I cloned the drive and it wouldn't boot.  I got "Error Loading Operating System".  Bugger.  After a bit of trawling, I found that it's a good idea to install the drivers for the Smart Array card BEFORE you clone the drive.

Unfortunately that wasn't the only problem, and arsing about time had come to a close.  I had to go back in the morning and continue, but this time the drive wasn't ready before people turned up for work to no server.  I did a workaround to get people on, which I'll share in another post, but the upshot was that I had to start again.

As for problem number 2: it turned out that I hadn't enabled the 'large boot partition' option in the drive options, and this can cause problems.  The odd thing is that the choices are Disabled (4GB Partition) or Enabled (8GB Partition).  Given that I was going from 80 to 500GB I didn't think it would make much odds, but apparently it does, and apparently it can be wrong the other way too.  So in future I'll start with it enabled, then try again with it disabled if necessary.  Unfortunately, changing the option renders the existing data inaccessible.

So, for the clone itself.  I power down, then install the controller, then get the old bay, and connect it to the Smart Array controller and 2 spare Molex connections.  I fill the bay with the new drives, then power the server on and insert the Ubuntu CD.  I need to change the BIOS so the Smart Array controller is the first in the controller order, but the CD-ROM is the first boot device.  Then, I go into the Smart Array setup and configure 2 RAID-1 arrays, the boot one with the large boot partition option on, of course.

After this configuration, Ubuntu should boot.  Once it has, go for the 'Try Ubuntu' option, though technically you don't need to select anything.  Hit CTRL-ALT-F1 and you're at a good old console screen, and you're already logged in.

Next type sudo fdisk -l (it needs root to read the disks).  This will list all the disks and partitions that the system knows about.  In my case, because all my disks were present and the inbuilt RAID on the ML310 G3 is mostly software, the drives showed up twice each: /dev/sda and /dev/sdb were one array, /dev/sdc and /dev/sdd were the other.  I opted for sda and sdc.

You should also notice some drives without partition tables, in my case with some paths I'm not used to seeing: /dev/cciss/c0d0 and /dev/cciss/c0d1, the likes of which I've only seen before in OpenSolaris, but there we go. I want to clone sda to c0d0 and sdc to c0d1.  As there are no problems with these source disks that I'm aware of, I go ahead and clone with dd.
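Before kicking off a clone, it may be worth checking that each target is at least as big as its source, since dd will simply run out of room otherwise.  What follows is a minimal sketch, not part of the original procedure: target_fits is a made-up helper name and the sizes in the example are invented for illustration.

```shell
# Hypothetical helper: succeeds only if the target can hold the whole source.
# Both arguments are sizes in bytes.
target_fits() {
    src_bytes=$1
    dst_bytes=$2
    [ "$dst_bytes" -ge "$src_bytes" ]
}

# Example with made-up sizes: a target roughly 100MB short of the source.
if target_fits 500107862016 500000000000; then
    echo "target is big enough"
else
    echo "target is too small"
fi
```

On the real machine the two numbers could come from something like sudo blockdev --getsize64 /dev/sda and sudo blockdev --getsize64 /dev/cciss/c0d0 (an assumption on my part about a convenient source; the fdisk listing shows byte counts as well).  With that checked, the plain clone command is: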

sudo dd if=/dev/sda of=/dev/cciss/c0d0

But that's not going to tell me much.  I like to see what's going on! This is better:

sudo dd if=/dev/sda of=/dev/cciss/c0d0 &

The & tells dd to run in the background and let us run some other commands, but before it goes, it tells us which process it is, like this:

[1]  3912

You can then watch the process by sending a user signal, using a command like this (substitute the process number with the one that you got):

sudo watch -n10 kill -USR1 3912

Rather than killing the process, the kill command sends a signal, which to dd means 'send your status to the display'.  If you did want to abort the clone, you can with:

sudo kill 3912

All well and good, but my display was telling me that my copy was running very slowly, only about 15MB per second.  At this rate it would take about 10 hours to copy the big disk!  I knew the disks could go faster than that, and so it proved.  It turns out dd copies in chunks of 512 bytes by default.  If you set the block size to match the NTFS block size (4096 bytes here), you'll likely be in business!  So, instead of the original command, we do this:

sudo dd if=/dev/sda of=/dev/cciss/c0d0 bs=4096 &

That's better: that got things going at about 100MB/s!  Now I'm happy with that process, I can set another going for the other disk.  If I hit alt-F2 I can do the same thing in another screen but for the other disk:

sudo dd if=/dev/sdc of=/dev/cciss/c0d1 bs=4096 &
[1]  4214
sudo watch -n10 kill -USR1 4214
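To put the two speeds in perspective, here is the rough arithmetic as a shell sketch.  copy_minutes is a name made up for illustration, and it assumes decimal units (1GB = 1000MB) and a perfectly steady transfer rate, which real disks won't quite manage.

```shell
# Hypothetical helper: whole minutes needed to copy size_gb gigabytes
# at rate_mb megabytes per second (integer arithmetic, rounds down).
copy_minutes() {
    size_gb=$1
    rate_mb=$2
    seconds=$(( size_gb * 1000 / rate_mb ))
    echo $(( seconds / 60 ))
}

copy_minutes 500 15     # prints 555, a bit over 9 hours
copy_minutes 500 100    # prints 83, under an hour and a half
```

So for a 500GB disk the block size change is roughly the difference between an overnight job and an evening one.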

I can flick between the clones using alt-F1 and alt-F2.  Of course you could always just use a terminal in the Gnome session, but I prefer this way...
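Once dd reports that it has finished, an optional sanity check is to compare the clone against its source.  The sketch below is an assumption about how that could be done, not part of the original procedure: same_prefix is a made-up helper, and it is shown on scratch files so it is safe to try anywhere.  Only the first N bytes are compared because the target is bigger than the source.

```shell
# Hypothetical helper: compare only the first $1 bytes of files $2 and $3.
# cmp prints nothing and exits 0 when they match.
same_prefix() {
    cmp -n "$1" "$2" "$3"
}

# Safe demonstration on scratch files standing in for the disks.
src=$(mktemp)
dst=$(mktemp)
head -c 1000000 /dev/urandom > "$src"        # "source disk", not a multiple of 4096
dd if="$src" of="$dst" bs=4096 2>/dev/null   # same style of copy as the real clone
printf 'unused space' >> "$dst"              # the "target disk" is a bit bigger

same_prefix 1000000 "$src" "$dst" && echo "clone matches"
rm -f "$src" "$dst"
```

On the real disks the equivalent would be something like sudo cmp -n "$(sudo blockdev --getsize64 /dev/sda)" /dev/sda /dev/cciss/c0d0, bearing in mind that it reads both disks end to end and so takes about as long as the clone did.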

Assuming all is good, you can shut down Ubuntu by typing sudo shutdown -P now.  The system will eject the CD, you take it out and hit return, then the system will shut down after a couple of seconds.

When shut down, remove all your original disks from wherever they are connected, and power up.  If you did it right, you'll be booting Windows like a champ, and your users will (hopefully) be singing your praises for those better speeds.  Well, you can only hope.

I did come across a couple of other problems while doing these: users wanting access and a too-small target drive (by about 100MB!), but I'll save those for another post.  That's quite enough for a first post, I think!

Howdy

Hi people,

After spending half my life online and working with computers, I've finally decided to start a blog, just on the off chance that those oddball situations I find myself in from time to time might be useful to someone else!

Cheers
CC