Greg and I had a long and fruitful meeting with RE this morning about upcoming rebuilds and reconfiguration of the TAPoR boxes. This is the basic plan:
- In the short term, RE will collapse the two DB servers into one, freeing up one of the more powerful machines. To simplify the process, Greg is currently going through the dbs on the servers to determine which obsolete ones can be discarded. If there's time, all the dbs will be rolled onto one server, the other will be rebuilt with a 64-bit OS, and the dbs will then be moved to the rebuilt machine, freeing the first; if time is a problem, only the first migration will be completed.
- The freed-up db server will be rebuilt as a 64-bit Tomcat box, and will house (initially) two Tomcats, "legacy" and "active". Both will end up with more memory than the current running Tomcats.
- All current Tomcat projects will be migrated from Lettuce to the new server, with the current OLD Cocoon (from Tomcat Prod) going into Legacy, and all the other projects going into Active.
- When MJ has finished his porting work, the Graves and Abstracts projects will be installed on the new Tomcat server in Legacy, and links and virtual domains re-pointed. Then there are a couple of other services running on Mustard which need to be moved (e.g. Mongrel/Ruby). Those can be copied to Lettuce. Once that's done, Mustard will be ready for a rebuild.
- Mustard will be rebuilt with the latest Apache, PHP, etc., and brought up as a "cluster of one"; existing PHP/web projects will be copied to it from Lettuce, and tested. A side benefit of this is that all apps running on our cluster will have the advantage of the wildcard SSL cert, currently only available under hcmc.uvic.ca.
- The cluster of one will go live, and Lettuce will then be brought down for a rebuild, as a clone of Mustard. Then it will be added to the cluster, so we have true failover for our web projects.
Other issues discussed
A problem has recently arisen with the NFS machine (arugula) where backups are taking nearly 12 hours, likely due to insufficient RAM. RE feels that there might be enough spare RAM available to him to add up to 2GB to arugula, which will hopefully alleviate the problem. While the machine is down for the RAM upgrade, we might also install RHEL5 and the new TSM client. If so, the downtime is likely to be a full day.
Currently, we have homedirs in the same filesystem as application dirs (like www). RE suggested that we look at splitting homedirs from app dirs so that our *mission critical* filesystems are as small as possible. We would also deploy the archive fs previously discussed as soon as enough space is freed up to make this possible. This would create at least 3 permanent filesystems: home, app and archive.
We also made a request for a script that we can run via sudo to change permissions on certain dirs, making them writable by other users. This functionality is very handy when deploying apps and data for other users.
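As a rough illustration only, a permissions script of the kind requested might look something like the sketch below. It is not what RE will provide; the allowlisted root (`/home1t/www`), the function names, and the choice of group read/write/execute as the meaning of "writable by other users" are all assumptions made for the example. The idea is that the script refuses to touch anything outside a fixed list of directory trees, so it could be granted through sudo without handing out general chmod rights.

```python
import os
import stat
import sys
from pathlib import Path

# Hypothetical allowlist: the only directory trees the script may touch.
# The real list would be agreed with RE; this path is an assumption.
ALLOWED_ROOTS = [Path("/home1t/www")]


def is_allowed(target, allowed_roots):
    """Return True if target resolves to an allowed root or a path below one."""
    resolved = Path(target).resolve()
    for root in allowed_roots:
        root = Path(root).resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False


def make_group_writable(target, allowed_roots=ALLOWED_ROOTS):
    """Add group rwx to target and every directory below it.

    Returns the number of directories whose mode was changed.
    Assumes "writable by other users" means group-writable.
    """
    if not is_allowed(target, allowed_roots):
        raise PermissionError(f"{target} is not under an allowed root")
    changed = 0
    # os.walk does not descend into symlinked directories by default,
    # so a symlink cannot lead the walk outside the allowed tree.
    for dirpath, _dirnames, _filenames in os.walk(Path(target).resolve()):
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        new_mode = mode | stat.S_IRWXG
        if new_mode != mode:
            os.chmod(dirpath, new_mode)
            changed += 1
    return changed


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, make_group_writable(arg))
```

Only directories are changed here, matching the request as noted; whether files inside them should also be made writable is a decision left open.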
/home1t/ is currently just about full. We also have requests coming in to expand the filesystem. We discussed with RE the best way of managing this unfortunate combination of events, and we arrived at a potential solution. Advice has been offered to concerned parties and we will wait for a response before proceeding.