The Release Day of GenoPro 2007

December 18, 2006

  1. Website deployment
  2. Hard drive failure
  3. File System Corrupted
  4. Response from Peer1 Dedicated Hosting
  5. Ordering a new server
  6. Transferring files from old disks to the new server
  7. Remounting Databases on SQL Server
  8. Configuring MailEnable with IIS
  9. The Aftermath

The first version of GenoPro was released on June 25, 1998 and the launch day for GenoPro version 2.0 was scheduled for December 15, 2006 at 7:00 am EST.   The release day of GenoPro 2.0 was anticipated by many, as the beta period of GenoPro 2.0 lasted several years.  GenoPro 2.0 had 20 major betas plus 70 sub-betas and as many private builds for our testers.

On that day, I woke up very early to get ready to deploy the new website for GenoPro 2007.  We had a brand new website with every HTML page redesigned from scratch for GenoPro 2007.  Until that day, GenoPro was known as GenoPro 2.0 and the name GenoPro 2007 had been kept secret.  Our goal for the day was to deploy the new website and spend the day relaxing, reviewing the website to correct typos and/or broken links.

After deploying the website, we decided to test our new payment system. My brother Jean-Claude made a successful purchase by PayPal, and it was my turn to purchase GenoPro with my credit card. I clicked on the link to purchase GenoPro 2007, but the server responded very slowly. Meanwhile, I lost my connection to the remote terminal. I also noticed several POP3 errors in Outlook. All of a sudden, the machine rebooted. After the reboot, we looked in the Event Log and saw the following error:

The device did not respond within the timeout period.

The event was "The device \Device\Scsi\aarich1 did not respond within the timeout period". This message is a sign of a hard drive failure: the drive was unable to read or write a file. I ran a check disk (chkdsk.exe) and got several errors on the file system:

C:\Documents and Settings\Daniel>chkdsk
The type of the file system is NTFS.
The volume is in use by another process. Chkdsk
might report errors when no corruption is present.

WARNING! F parameter not specified.
Running CHKDSK in read-only mode.

CHKDSK is verifying files (stage 1 of 3)...
Deleted corrupt attribute list entry
with type code 128 in file 166911.
Deleting corrupt attribute record (128, "")
from file record segment 140978.
Deleting corrupt attribute record (128, "")
from file record segment 150391.
Deleting corrupt attribute record (128, "")
from file record segment 153381.
Deleting corrupt attribute record (128, "")
from file record segment 165148.
Deleting corrupt attribute record (128, "")
from file record segment 180689.
Deleting corrupt attribute record (128, "")
from file record segment 289048.
Deleting corrupt attribute record (128, "")
from file record segment 340304.
Deleting corrupt attribute record (128, "")
from file record segment 396055.
Deleting corrupt attribute record (128, "")
from file record segment 543420.
File verification completed.
Deleting orphan file record segment 140978.
Deleting orphan file record segment 150391.
Deleting orphan file record segment 153381.
Deleting orphan file record segment 165148.
Deleting orphan file record segment 180689.
Deleting orphan file record segment 289048.
Deleting orphan file record segment 340304.
Deleting orphan file record segment 396055.
Deleting orphan file record segment 543420.

Errors found. CHKDSK cannot continue in read-only mode.

Every time I ran chkdsk, I got different errors. Of course, I also ran chkdsk with the /F option, which requires a reboot to repair the system volume, but I was still getting file system errors.

C:\Documents and Settings\Daniel>chkdsk
The type of the file system is NTFS.
The volume is in use by another process. Chkdsk
might report errors when no corruption is present.

WARNING! F parameter not specified.
Running CHKDSK in read-only mode.

CHKDSK is verifying files (stage 1 of 3)...
File verification completed.
CHKDSK is verifying indexes (stage 2 of 3)...
Deleting index entry PID2592.TMP in index $I30 of file 22861.
Deleting index entry APISCR~1.LNK in index $I30 of file 30734.
Deleting index entry log.txt.lnk in index $I30 of file 30734.
Deleting index entry LOGTXT~1.LNK in index $I30 of file 30734.
Deleting index entry MSHist012006121420061215 in index $I30 of file 30747.
Deleting index entry MSHIST~4 in index $I30 of file 30747.
Deleting index entry Dc14436.txt in index $I30 of file 33957.
Deleting index entry _7639.NEW in index $I30 of file 68902.
Deleting index entry E5625224C2264A509139B1A08DC34201.MAI in index $I30 of file 68927.
Deleting index entry E56252~1.MAI in index $I30 of file 68927.
Deleting index entry A20D7148DA574F65A520138AB3A853F4.MAI in index $I30 of file 93605.
Deleting index entry A20D71~1.MAI in index $I30 of file 93605.
Deleting index entry _198363.NEW in index $I30 of file 153023.
Deleting index entry _5811.NEW in index $I30 of file 174196.
Deleting index entry 6881CAEBDDB24375BB619F19630F039C.MAI in index $I30 of file 185580.
Deleting index entry 6881CA~1.MAI in index $I30 of file 185580.
Deleting index entry 974B2BBEE95C498891BC727DB56229A5.MAI in index $I30 of file 185580.
Deleting index entry 974B2B~1.MAI in index $I30 of file 185580.
Deleting index entry EB997F92A97F4BF69A8252E609426FDB.MAI in index $I30 of file 185580.
Deleting index entry EB997F~1.MAI in index $I30 of file 185580.
Deleting index entry _15192.NEW in index $I30 of file 185901.
Index verification completed.

Errors found. CHKDSK cannot continue in read-only mode.

After a reboot, we got the following errors in the Event Log:

Adaptec Storage Manager - Failed drive: controller 1, port 0

Adaptec Storage Manager - Logical device degraded

This time the error came from the Adaptec Storage Manager, complaining about a missing drive. Our server was running in a RAID 1 configuration with two identical drives, each mirroring the other in case of a failure. For the first time, the Adaptec Storage Manager was showing an error:

Adaptec Storage Manager - Drive Failed (Port 0)

We rebooted the machine with a check disk and got another error. This time, the NTFS file system structure was reported as corrupted and unusable.

NTFS structure corrupted and unusable 

At this point, I called the technical support of Peer1. I have been doing business with Peer1 for over 5 years, and they have always had great service with a short turnaround time. I described the problem to the technical representative and suggested it was a hardware problem.

A few minutes later, Peer1 called me back and suggested replacing the defective drive. Of course, changing a defective drive would require a server shutdown. Since this machine was running in a RAID 1 configuration, it would take several hours to copy the data from the old drive to the new drive. What I like about Peer1 is that they are quick to resolve problems and have never billed me extra, whether the request is for a hard reboot when the machine freezes, cleaning a virus, or replacing defective hardware.

I was hoping the disk replacement would solve the problems. For sure, a bad disk can cause "timeouts" and be responsible for file system errors. Around 2:00 pm, I got a phone call saying the drive had been replaced and about 75% of the data was copied. It would take another hour before the machine could be operational. Once the machine was online, I ran the Adaptec Storage Manager and noticed the device was performing a disk verification. I decided not to do anything until the operation was complete. The disk verification took a good two hours.

By now, it was late in the afternoon, around 4:30 pm. I ran chkdsk and got more file system errors, including file security corruption. I am not sure if the file security corruption had anything to do with our database, but our SQL database was unable to start. The SQL Server service was reporting "Access Denied" on the master database.

We tried many options to recover the database, including re-installing SQL Server. We had no success. We kept getting disk timeouts and many other errors. We had the Adaptec Storage Manager open and noticed both drives were inaccessible. We took a screenshot because we had the feeling we might never see this error again. Indeed.

Adaptec Storage Manager - Both drives are down!

A few minutes later, we lost connection with the machine.  The server had crashed.  I called Peer1 technical support requesting a hard reboot for the server (this is done by "pulling the plug" and restarting the machine).

At this time, it was already 7:00 pm and we were starving since we had little food during lunch.  I decided to take my brother out to the restaurant.  After all, it was his birthday.

While waiting for our food to arrive, we talked about getting a new machine. We had had this in mind for several months, and the file system errors made us want to format the drive and re-install everything.

While eating, I got a phone call from Peer1 technical support informing me they were unable to reboot the machine. Somehow the disk controller on the motherboard was defective and they could not reboot the machine. The technical support representative told me they had no extra motherboards for my type of server. He told me they could take my order and build a new server for me overnight. We quickly finished our meals and headed home to look at the packages at http://www.dedicatedhosting.com/hosting/

Peer1 could give us the Basic hosting plan at $199 per month, which was cheaper than what I was currently paying. This basic hosting plan was already superior to our current hosting plan; however, I wanted a hot-swappable RAID 1 configuration. As a result, I took the Professional hosting plan with a dual hyper-threaded processor. This hosting plan costs a few extra dollars per month, but it is far more powerful and reliable since a defective disk can be changed while the machine is running.

Ordering a new server from Peer1.com

On Friday at 8:50 pm, I was on the phone with Peer1 giving the specifications for the new server, such as the hard disk partitions and the software to install. The sales representative put me on hold to make calls to confirm they had all the hardware components to build the new machine. A few minutes later, he told me the new machine would be hosted in Miami (Florida) instead of Atlanta (Georgia) because they had the components there. I asked if I would get the same service regarding bandwidth quality and technical support as I used to get in Atlanta. He told me the service would be the same, as most of the technical support is done in Atlanta anyway.

By the way, Peer1 has really good, reliable bandwidth. Sometimes we take those things for granted, but when a server is missing pings, reaching 50% losses on a daily basis, it is a real problem. Before switching to Peer1, GenoPro was hosted with HiSpeed.net. Every week I had to call the technical guy, begging him to reboot his switch(es) because my server was inaccessible. HiSpeed.net would always come up with some ridiculous story, but strangely enough, 5 minutes after my call, the server was accessible again, yet the machine hosting GenoPro had never rebooted. Clearly it was a problem with the switch relaying the information from the server to the Internet.

Also, HiSpeed.net billed me for 22,000 GB of transfer for my first month. HiSpeed.net was selling bandwidth for $10 per GB, making my first invoice a grand total of $220,399. I never had the money to pay this. At that time, I was living with my sister to save on housing costs. Previously, I was paying $19.99 per month for hosting GenoPro.com and was asked to find another ISP because GenoPro had too many visitors. I tried to reason with the guy at HiSpeed.net, telling him it was impossible to have that much transfer for a website like GenoPro.com. I had compiled the logs from IIS plus an external counter proving the number of visitors was about the same as the previous month, and the previous month had about 2 GB of transfer. Instead of using common sense, HiSpeed.net offered to let me pay off the invoice in multiple payments and mentioned a possibility of negotiating the amount. It took about 3 months to get this matter resolved, after the moron admitted he had mis-configured his switch.

A month later, I cancelled my contract with HiSpeed.net because I was fed up with the downtime. The guy from HiSpeed.net told me I was not allowed to cancel since I had signed a 12-month contract. I told him he had not respected his side of the contract because my server was always unreachable. I had many charts to prove countless periods of downtime, each lasting several hours. I suggested he sue me to make me pay the remaining months. Of course, I had already switched to another ISP before giving HiSpeed.net the finger!
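As a rough sanity check on why that invoice looked impossible, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes decimal gigabytes (10^9 bytes) and a 30-day month, neither of which is stated above, and the function names are made up for the example. It uses the rounded 22,000 GB figure, so the computed price is $220,000 rather than the $220,399 actually invoiced.

```python
SECONDS_PER_DAY = 86_400


def invoice_dollars(gigabytes, price_per_gb=10):
    """Price of a given transfer volume at a flat per-GB rate."""
    return gigabytes * price_per_gb


def sustained_mbps(gigabytes, days=30):
    """Average throughput needed to move this volume in `days` days.

    Assumes decimal units: 1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s.
    """
    bits = gigabytes * 1e9 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    print(invoice_dollars(22_000))            # billed volume at $10 per GB
    print(invoice_dollars(2))                 # a typical 2 GB month at the same rate
    print(round(sustained_mbps(22_000), 1))   # Mbps, around the clock for a month
```

Under those assumptions, 22,000 GB in a month means the server would have had to push roughly 68 Mbps nonstop, day and night, which is hard to reconcile with a site that transferred about 2 GB the month before.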

The sales representative from Peer1 told me the machine would be ready by Saturday morning. I asked about the data from the old disks. The sales representative told me they could send the hard disks by mail from Atlanta to Miami, but it would take a few days. I told him not to worry; I would call the technical support and ask them to mount those drives on another machine and give me remote access.

Transferring files from old disks to the new server

On Saturday morning at 6:00 am, I called the technical support to find out how to log in to the new machine. They told me they were finishing the configuration and the server would be ready in less than one hour. 15 minutes later, I got a call saying the machine was ready. After looking at the new server, I realized I really needed to access the data from the previous disks. We had backups of our websites and databases, but we were missing many other things such as email accounts, mail messages received during the day, SSL encryption certificates, and more. Besides, we had nearly one million files to transfer from our backups. The website http://familytrees.genopro.com alone has over 500,000 files. Using our cable modem Internet connection, it would take several days to upload those files to the server. The machine is also hosting other websites, including my mom & sister's baby store (http://www.merehelene.com/) and the scrapbooking store of my aunts Claire, Sue and cousin Micheline (http://www.scrapbookerie.com/).

I called Peer1 technical support (in Atlanta) with a special request to have access to the data from the old machine. The guy told me his team would try to mount the drives on another computer and give me a password for that machine. While at it, I asked why the new server had a network link speed of 10 Mbps Half-Duplex, instead of 100 Mbps Full-Duplex. The technical representative told me to wait while he called Miami to get an answer. A few minutes later, he told me the network speed had been set to 100 Mbps Full-Duplex, but the machine would be offline for about 5 minutes. (I have no idea why it requires 5 minutes of downtime to change a link speed; however, I can tell you the new server downloaded SQL Server 2005 from Microsoft.com at full 100 Mbps. It took less than 3 minutes to download the whole 230 MB install package.)
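To put the two link speeds in perspective, here is a small back-of-the-envelope calculation in Python. It is a sketch only: it assumes decimal megabytes, ignores protocol overhead and whatever limits the remote server imposes, and the function name is invented for the example.

```python
def ideal_transfer_seconds(size_megabytes, link_mbps):
    """Best-case transfer time: size in MB, link speed in megabits per second."""
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / link_mbps


if __name__ == "__main__":
    for mbps in (10, 100):
        seconds = ideal_transfer_seconds(230, mbps)
        print(f"230 MB at {mbps} Mbps: about {seconds:.1f} s")
```

In the best case, the 230 MB package needs about 184 seconds on a 10 Mbps link and about 18 seconds at 100 Mbps, so a download finishing in under 3 minutes is consistent with the link no longer being stuck at 10 Mbps.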

A few hours later, I got a phone call saying the machine was ready, with a warning to be careful because the file system on the old drives was corrupted and unstable. I started by copying the most important files in case the machine died. Since we were transferring files by FTP, we had to compress them into a big .rar file to keep the timestamps. The FTP protocol is slow at transferring numerous files and does not keep the timestamps. After the transfer, I took a last screenshot of the old machine.

Old server, both drives un-RAIDed

As you can see, both drives from the RAID configuration are available, but this time they are un-RAIDed, and therefore accessible under C and D. When Peer1 rebuilt the server, they installed both drives, each having 99.6 GB of free space; however, we compressed almost 9 GB of data on the C drive before transferring it to the new server. Since we had no problems with drive C, we kept drive D intact as the original backup.
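The idea of bundling many files into one archive before an FTP transfer, so that the timestamps survive and only one large file has to be sent, can be sketched in Python. This is not what we used (we used a .rar file); the sketch swaps in the standard `tarfile` module, which records each file's modification time in the archive, and the function names and paths are hypothetical.

```python
import os
import tarfile


def pack_tree(source_dir, archive_path):
    """Bundle a directory into one gzip-compressed tar file.

    The tar format stores each file's modification time, so the
    timestamps travel inside the archive instead of being lost
    by a plain file-by-file FTP upload.
    """
    source_dir = os.fspath(source_dir)
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(os.fspath(archive_path), "w:gz") as archive:
        archive.add(source_dir, arcname=top_level)


def unpack_tree(archive_path, dest_dir):
    """Extract the archive on the receiving side, restoring timestamps."""
    # Newer Python versions ask for an explicit extraction filter;
    # use the "data" filter when this Python provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(os.fspath(archive_path), "r:gz") as archive:
        archive.extractall(os.fspath(dest_dir), **extra)
```

A call such as `pack_tree("wwwroot", "wwwroot.tar.gz")` on the old machine, followed by `unpack_tree("wwwroot.tar.gz", "restored")` on the new one, would be the equivalent of our compress, transfer, and extract steps; the directory names here are placeholders.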

The Aftermath

It is Monday and our websites are online. The old server is currently idle, yet accessible and stable. Being curious, I asked the technical representative if they had really changed the drive; he looked at the internal notes and confirmed the drive had been changed. I think the disk change was unnecessary since the problem was related to the controller on the motherboard, but at that time, the problem appeared to be a disk failure. Nevertheless, our file system had been corrupted and we needed to re-install everything. Why not get a new machine if we have to re-install everything?

I feel very pleased about the support I got from Peer1.com. Usually, a big company has poor service, but this is not the case with Peer1. I got better quality and quantity of technical support than I expected. A few weeks ago, the technical support team helped me with a virus problem, something that had nothing to do with their core business, which is web hosting. I feel confident doing business with Peer1.

By the way, I took a few screenshots of the new machine.  You can see there are 4 CPUs in the Task Manager and 2 network cards.  We plan to use the second network card (1 Gbps) to connect to a server dedicated for the SQL database.

Dual Xeon Processors

Dual Network Cards

The server crash had great timing, landing on the release day of GenoPro 2007. Obviously, reconfiguring a new server was not my plan for the weekend, but in the end, everything is fine. During the few hours GenoPro.com was available on Friday, we received emails from people wishing to purchase GenoPro 2007 and from teachers wishing to apply to our academic program for GenoPro 2007. This is encouraging because GenoPro 2007 had never been announced before and was online for only a few hours during that day. Before that day, GenoPro 2007 was known as GenoPro 2.0.

We plan to start working on GenoPro 2008 soon.  Our goal for this spring is creating a multi-lingual version of GenoPro.

Update, December 23, 2006

It has been a full week since we got our new server. During the whole week, including today, we have been fixing missing shortcuts, hardcoded IP addresses, paths, passwords, and other miscellaneous configuration settings forgotten over the years. The 80-20 rule applies when configuring a web server.




Copyright © 1998-2024. All rights reserved. GenoPro® and the GenoPro logo are registered trademarks.