Recently I was the tech lead on a software upgrade project at the company I work for. "Upgrade" is a vague term in any IT organization, and in this case the project was very extensive. The application we upgraded must remain secret, but it is a tier 1 application, meaning one with very strict uptime requirements. There were lots of changes over a long project, so I thought it would make a nice case study on how to implement an application upgrade.
Environment Before Upgrade
Citrix 3.0 farm
Version 3 of app
Oracle 9i database on Windows
Environment After Upgrade
Citrix 4.0 farm
Version 6 of app
Oracle 10g database on Unix
The application was running smoothly before our upgrade. During the upgrade and testing process we needed to iron out some bugs in a new app install process and validate the application, because rogue code had been added to get the application into production originally, three years ago. Over a six-month testing phase we went through three service packs, three full cycles of user testing, and at least ten thoroughly documented installs, so that we would not run into problems during the long day of the production upgrade.
Citrix farm upgrade
The Citrix farm upgrade went very smoothly, as the change from MetaFrame 3.0 to 4.0 was not very difficult.
Oracle database upgrade
We were very fortunate in the upgrade to Oracle 10g on Unix: we inherited unused hardware from another project, which allowed us to run the upgrade many times with no problems. The best new feature we were able to take advantage of was Flashback, a new Oracle feature that let us quickly move back to our original data if there was any problem loading the upgrade scripts. The old Windows environment was very stable, but being able to move staging to its own environment was a real help for performance, even though we stressed to users that staging came with no performance guarantees.
Day of upgrade deployment
We did the upgrade on a Sunday. Because of the changes to the application, to Oracle, and to a lesser extent to Citrix, it made for a really long outage. The upgrade was scheduled to take 12 hours and it took just that amount of time. Although you always have unforeseen issues during an upgrade, we knew how long every step would take, and each step took that long. One of the great things we used was a conference bridge that stayed open throughout the day, so that anyone could communicate with the entire upgrade team at any time. The upgrade was a success and was never really in doubt, thanks to extensive testing.
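To illustrate how rehearsal timings can be turned into a deployment-day timeline, here is a minimal Python sketch. The step names, durations, and start time are all made up for illustration and are not our actual runbook. The only figure taken from the project is the 12-hour total.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Step:
    name: str      # hypothetical step name, for illustration only
    minutes: int   # duration as measured during rehearsal installs


def schedule(steps, start):
    """Return (step name, planned start, planned end) for each step, in order."""
    plan = []
    cursor = start
    for step in steps:
        end = cursor + timedelta(minutes=step.minutes)
        plan.append((step.name, cursor, end))
        cursor = end
    return plan


# Illustrative durations only; they are chosen to add up to a 12-hour window.
RUNBOOK = [
    Step("Take application offline and back up", 90),
    Step("Migrate database to Oracle 10g on Unix", 240),
    Step("Upgrade Citrix farm", 90),
    Step("Install new application version", 180),
    Step("Validation and user sign-off", 120),
]

if __name__ == "__main__":
    start = datetime(2000, 1, 2, 6, 0)  # placeholder date and start time
    for name, begin, end in schedule(RUNBOOK, start):
        print(f"{begin:%H:%M} - {end:%H:%M}  {name}")
    total = sum(step.minutes for step in RUNBOOK)
    print(f"Planned outage: {total / 60:.1f} hours")
```

A timeline like this gives everyone on the conference bridge a shared reference for whether each step is running ahead of or behind its rehearsed duration.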
Challenges met and passed
Looking back on a project is the best way to make sure that future large upgrades come together well. One of the reasons the upgrade took so long was that so many parts of the infrastructure changed in a single upgrade.
We considered doing the Citrix change first, then the Oracle on Unix change a few weeks later, and finally the application change a few weeks after that. I would have preferred a few shorter outages to taking a tier 1 app down for a whole day, and this would also have reduced the implementation risk. The project manager turned this approach down, as he was more comfortable with having no question around the changes, even if there were a lot of them all at the same time. I am still not too happy with this approach for the future, but it certainly did work.