Responsibility Transfer: I have been gradually moving out of my role as Unix Team Lead and System Administrator. Almost all of the daily operational duties are being performed by the System Administration team.
Exchange Support: The vast majority of my time this month has been spent working with the ECS team to support the new Exchange environment. The consolidation process is very complex, and it requires coordination among many people within the Vanderbilt community. The Exchange 2007 environment has been my re-introduction into the world of Microsoft Windows. I have been involved recently in monitoring server performance.
Enterprise Linux Reference Platform: We are continuing to meet and define the base RHEL standard.
New Position: I have recently been promoted to Senior System Administrator. As part of the transition, I am moving out of my operational role, and I am transferring those duties to the rest of the Unix team. I’m now supporting each of the AppHosting teams in my new role. I’m already diving into a couple of issues on the Windows platform, which is something I’ve not been heavily involved with for a while. As such, I’ll be trying to absorb as much information as possible in the near future.
Exchange 2007 Support: I have been tasked with several items supporting the Exchange 2007 environment.
Sun Identity Management [SIDM]: The new SIDM LDAP services are live, available, and in use by the general Vanderbilt community. The team is in the process of ensuring that all clients are properly migrated to the new service. Only a few clients are still connecting to the Solaris-based service, and we are working with the owners to move them over.
Enterprise Linux Reference Platform [ELRP]: The team has produced a server build with minimal packages to use as the base system. We are in the process of determining the proper configuration for the installed packages.
[WIN] Sun Identity Management: The new Epassword LDAP service went into production this morning. This new service offers LDAP service on new geographically diverse Linux systems, which are load balanced by our F5 LTMs. The new LDAP service is more resilient in the event of a server outage. It also allows ITS to decommission some aging Sun hardware.
[WIN] F5 Disaster Recovery Test: Prior to the cutover to the new LDAP service, we performed a full disaster recovery test to simulate loss of service in the Hill Center. Our tests were successful, and we did learn a few new things about the behavior of the LTMs. The successful disaster recovery test validates the architectural planning that went into the services deployed behind the LTMs.
I only have one major WIN this month…
PHP5/MySQL5 Web Server Migrations: The project to upgrade all of the shared web services to RHEL5 is essentially complete. All of the content has been migrated, and the sites are running smoothly. This migration gives the web community access to PHP5 and MySQL5. It also brings along a much more robust ModSecurity ruleset. So far the vast majority of the help desk tickets have been related to applications that conflict with the new ModSecurity rules.
Sun Identity Management: The networking issue for the virtual machines located in the VUH data center has been resolved. Our intermittent network issues were caused by an IP address shared between the F5 LTMs and a Cisco router. This caused the VMs to lose outbound network connectivity whenever the IP jumped to the router. The IP address was removed completely from the router, and network connectivity has been restored. We are moving on to the final testing phase. We are working through the final steps for getting SLAMD set up.
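A conflict like this one can be looked for ahead of time by comparing the addresses configured on each device. The following is only a minimal sketch of that idea in Python; the device names and addresses are hypothetical placeholders (documentation-range IPs), not the actual LTM or router configuration.

```python
from collections import defaultdict
import ipaddress


def find_shared_addresses(assignments):
    """Return {ip: [device names]} for every address claimed by more than one device."""
    owners = defaultdict(set)
    for device, addresses in assignments.items():
        for addr in addresses:
            # Normalizing through ip_address() also rejects malformed entries.
            owners[ipaddress.ip_address(addr)].add(device)
    return {str(ip): sorted(devs) for ip, devs in owners.items() if len(devs) > 1}


# Hypothetical inventory, for illustration only.
assignments = {
    "ltm-pair": ["192.0.2.1", "192.0.2.10"],
    "router": ["192.0.2.1", "192.0.2.254"],
}
conflicts = find_shared_addresses(assignments)
print(conflicts)
```

Any address that shows up under more than one device is a candidate for the kind of intermittent outage described above.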
[WIN] ITS Website Redesign: The cutover to the new ITS website is scheduled for this weekend. The change will involve exporting the test database, cleaning up all of the links, and importing the data on the production server. I’ll also need to copy the necessary files over to the production server. The last step will involve moving the IP addresses from the old servers over to the new servers. The CSM will pass the incoming connections to the new servers.
PHP5/MySQL5 Upgrades: The cutover deadline for the first round of servers has been extended until mid-June. This round of migration is progressing slowly. There are 58 folders left on the WWW4 servers and 29 virtual hosts left on the vanity domain servers. This remaining content is under review by the site owners and awaiting migration approval.
[WIN] RHEL5/PHP5/MySQL5 Upgrades: The migrations for the FEVS02 (misc domains) and FEVS06 (www4) server pairs have begun. The first attempt was interrupted by another work activity, and the migrations were rescheduled for Apr 26. The rescheduled migrations were completed without issues.
ITS Website Redesign: The final cutover date has been scheduled.
Sun Identity Management: All of the LDAP virtual machines have been built and are accessible via our bastion hosts. The networking issues with the F5 LTMs have been resolved, and this appears to have revealed a possible issue with the ESX servers behind them. I have opened a ticket with the VUH helpdesk, and I am working with their staff to resolve the issue.
Security Investigations: I spent several days working with the Network Security team to identify and mitigate a security issue.
[WIN] F5 Load Balancing: All of the F5 LTMs (2 test and 5 production) are online and available for use. Mark Dycus and I spent a couple of days in the test lab trying various configurations. We were able to duplicate the loop and broadcast/multicast storm issues that we were experiencing. We were able to create a configuration that met our needs but also did not degrade to an unstable state if a connection dropped. The Hill and Stevenson LTMs are scheduled for a minor configuration change.
Sun Identity Management: All of the ELDAP and CLDAP VMs are now online. I am working through verifying connectivity to each VM and service point. I have also been alerted to some latency issues with the LTM, and I am in the process of debugging this. I have reconfigured the secondary LTM in a slightly different configuration for comparison testing. The best working configuration will be synced over to the other peer.
ITS Website Redesign: This project is on track. All of the content is being reviewed for accuracy and relevance. Some minor presentation issues are also being addressed.
PHP5/MySQL5 Upgrade: All of the customers for this migration have been identified. The specifications for each server are also being collected so that each service pair is appropriately sized.
Exchange 2007: I am working with the primary project resources to ensure that the new Exchange 2007 services are available via the LTMs. This configuration is slightly different from our preferred architecture. The new Exchange servers are not directly behind the LTM, and the connections are being proxied over to the servers. This requires some additional configuration to get the connections to go through. The only detriment to this configuration is that the source client IP address is lost due to the source NAT that takes place.
Shibboleth Rebuild: I am working with the primary project resources to ensure that the new Shibboleth servers are available via the LTMs. I have recommended that the recently built test Shibboleth server be moved behind the test LTMs to ensure that testing procedures are valid.
Unix Team Cross Training: The Unix team has expanded our weekly one-hour meeting beyond our typical status updates and info distribution. We are now attempting to work in cross training, technology demos, and team project support. During our last meeting, we covered the basic UltraSeek admin tasks, OSSEC capabilities and agent install, DiamondIP zone creation, and patch administration. This is an effort to eliminate operational bottlenecks that develop when the primary admin is the de facto expert for a particular technology.
Sitemason Stabilization: This month the Sitemason service grew to utilize five web servers behind the F5 LTM. The service is still suffering from intermittent issues such as login failures and XML parsing failures. This is also the first service for which I have written an iRule. The rule that I wrote will redirect the user to a “service not available” web page if the Sitemason pool fails as a whole. This was a learning experience, and I created the rule using various code snippets that I found on the F5 DevCentral website.
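The actual iRule is written in the LTM's Tcl-based rule language and is not reproduced here. The decision it makes can be modeled roughly as follows in Python; this is only an illustration of the logic, and the class, function name, member addresses, and redirect URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolMember:
    address: str
    healthy: bool


# Hypothetical placeholder URL for the "service not available" page.
SORRY_URL = "http://example.invalid/service-not-available.html"


def route_request(members, sorry_url=SORRY_URL):
    """Redirect only when the pool has failed as a whole.

    Returns ("redirect", url) if no member is healthy,
    otherwise ("forward", address of the first healthy member).
    """
    active = [m for m in members if m.healthy]
    if not active:
        return ("redirect", sorry_url)
    return ("forward", active[0].address)


# Hypothetical five-member pool with every member marked down.
pool = [PoolMember(f"192.0.2.{n}", healthy=False) for n in range(11, 16)]
print(route_request(pool))
```

A single healthy member is enough to keep traffic flowing to the pool; the redirect only applies when every member is down. Picking the first healthy member is a simplification, since the real load balancer chooses among active members by its own method.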
Sun Identity Management: Both of the ESX servers are fully operational in the VUH data center, and all of the virtual machines have been configured for the new network location. Both of the F5 LTMs are online; however, I only have limited network connectivity to the VIPs that I have created. I’m working with the VUH staff to diagnose the connectivity issues.
ITS Website Redesign: All of the virtual hosts for the miscellaneous ITS websites (except the main ITS website) have been moved to other servers in preparation for the final jump to the Drupal-based site. The last major step is the content reviews.
Web Service Resource Allocations: I increased the disk space allocation for the WWW and WWW4 web server pairs. Content growth has been fairly steady on both services, and the Nagios service checks were starting to alert at warning level. The MySQL server VM also received more memory and disk space.
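For context, a disk check of this kind compares usage against a warning and a critical threshold and reports the result through the standard Nagios plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL). The sketch below shows the general idea in Python; the 80 and 90 percent thresholds are assumptions for illustration, not the values configured on our servers, and this is not the plugin we actually run.

```python
import shutil

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_usage(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a percent-used figure to a Nagios state (thresholds are assumed values)."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=80.0, crit_pct=90.0):
    """Return (state, percent used) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return classify_usage(used_pct, warn_pct, crit_pct), used_pct
```

Calling `check_disk` with the content directory of a web server pair would return the state along with the measured percentage, which makes it easy to see how close a volume is to the next threshold.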
MySQL InnoDB Reconfiguration: The existing InnoDB tables on the shared MySQL server were stored in a single file, which made it difficult to determine which customers were consuming an inordinate amount of resources. It took about two hours to re-import all of the databases using InnoDB tables. Some of them had to be imported twice due to corruption issues from an unknown source.
Cohosted Server Patching: The Greeklife, Honor Council, and VICC websites were upgraded to current RHEL standards. The vmware-tools were also updated.
Exchange 2007 Deployment: I have done miscellaneous tasks in support of this project such as configuring the service point VIPs on the F5 LTM and debugging connectivity issues. The CAS VIPs also have a pair of minor iRules to forward connections.
Server Patching: All of the shared web servers have been patched, and the co-hosted web servers are scheduled.
Sun Identity Management: I am continuing my work to bring the VMs online in the VUH data center, but I’m being diverted by operational events and tasks. Both ESX servers are fully online and accessible after the latest network revision. We are in the process of implementing slamd to load test the new service. The test MIS Business Objects servers are now connecting to the new service.
ITS Website Redesign: All of the ITS-managed virtual hosts have been migrated from the old servers to the new servers, except for the main ITS website and the change management website. The Drupal-based ITS website is still undergoing content revisions. The change website will be migrated in the near future. Cutover to the Drupal-based site is pending completion of the content revision.
Nagios Data Merge: The data from the old Nagios server has been merged into the archives on the new Nagios server. This process involved translating server and service check names between the two servers. All of the monitoring data is now on the new server.
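The name translation step can be sketched as a simple rewrite of archived log lines. The example below is an assumption about how such a translation might look, not the script that was actually used: it handles only `HOST ALERT` and `SERVICE ALERT` entries in the usual semicolon-delimited Nagios log format, and the host and service names in the mappings are hypothetical.

```python
import re

# Matches lines such as: [1270000000] SERVICE ALERT: host;service;STATE;TYPE;attempt;output
LINE_RE = re.compile(r"^\[(\d+)\] (HOST ALERT|SERVICE ALERT): (.*)$")


def translate_line(line, host_map, service_map):
    """Rewrite old host/service names to the new server's names.

    Lines that are not HOST ALERT or SERVICE ALERT entries are returned
    unchanged (minus any trailing newline).
    """
    line = line.rstrip("\n")
    match = LINE_RE.match(line)
    if not match:
        return line
    timestamp, kind, rest = match.groups()
    fields = rest.split(";")
    fields[0] = host_map.get(fields[0], fields[0])
    if kind == "SERVICE ALERT" and len(fields) > 1:
        fields[1] = service_map.get(fields[1], fields[1])
    return f"[{timestamp}] {kind}: {';'.join(fields)}"


# Hypothetical name mappings, for illustration only.
host_map = {"oldweb01": "web01"}
service_map = {"HTTP Check": "HTTP"}
old = "[1270000000] SERVICE ALERT: oldweb01;HTTP Check;CRITICAL;HARD;3;Connection refused"
print(translate_line(old, host_map, service_map))
```

Names that have no entry in a mapping are left as they are, so entries for servers whose names did not change pass through untouched. A real merge would likely also need per-host service mappings and the other log entry types.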
Sitemason Enhancements: We have put a lot of effort into identifying performance issues with the Sitemason service and implementing potential fixes. The database server has been reconfigured to handle an increased connection volume. The “too many clients” errors were somewhat sporadic and did not always coincide with the MyVU traffic. We have built and are currently testing a clustered configuration.
Identity Management: The F5 LTM hardware in Hill and Stevenson is in production. I was able to correct the bug in the LDAP monitor by directly modifying the LDAP configurations on the LTMs. This is a correction that is not possible through the web GUI.
ITS Server Migration: Two new RHEL5 virtual servers have been deployed, and the first of the ITS websites has been moved over. swdist.vanderbilt.edu has been transitioned over. More websites will be transitioned over in the near future.
Tertiary Web Presence: A business continuity website has been established for the main Vanderbilt website by utilizing services from Rackspace. This was given to the team (along with business continuity DNS) as a special work effort to complete the implementation.
Backup Client Upgrades: The Unix team has completed upgrading the Legato client on our servers. This enables active management of the backup process by the storage team.