Virtual Infrastructure
Redundant Networking Switches in the High Density Rack
The high density racks that house the ESX servers were each equipped with one 48-port Gigabit switch, which was a single point of failure. We eliminated this risk by adding a second switch to each high density rack and creating a switch stack. Each of the switches has a 1 Gb uplink to the routers in the datacenter. To add the switch and create a second, redundant uplink for the switch stack, we had to take a network outage on each high density rack in turn. So we VMotioned all VMs in one high density rack to the other, took the network hit, then VMotioned them back and took the network hit on the other rack. This equated to zero downtime for any VMs that were running during this time.
Patch from 3.0.1 to 3.0.2
Since we had to VMotion all of the VMs from one host to another, we took advantage of this to patch the environment as well. A few hours before the switch change mentioned above, we VMotioned all VMs from one high density rack to the other and patched the ESX servers from ESX 3.0.1 to ESX 3.0.2. Once patching of the first high density rack was complete, the switch change started for the ESX servers in that same rack (since their VMs had already been VMotioned off for the patching change). We then VMotioned all the VMs back and took the network hit on the other rack. Once the network had been re-established for that high density rack, we proceeded to patch those ESX servers to 3.0.2.
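The combined patch and switch sequence described above can be sketched as a small simulation. This is only an illustration of the ordering, not the tooling we used: the rack names, VM names, and the evacuate helper are hypothetical stand-ins for VMotion, and the task strings are just labels.

```python
PATCH = "patch ESX 3.0.1 -> 3.0.2"
SWITCH = "add second switch and redundant uplink"


def evacuate(placement, source, target):
    """Move every VM currently on `source` to `target` (stand-in for VMotion)."""
    return {vm: (target if rack == source else rack)
            for vm, rack in placement.items()}


def run_rolling_maintenance(placement, first, second, log):
    """Do disruptive work on one rack at a time, only while it hosts no VMs."""
    # First rack: empty it, patch, then take the network hit.
    placement = evacuate(placement, first, second)
    for task in (PATCH, SWITCH):
        assert first not in placement.values(), "rack still hosts VMs"
        log.append((first, task))

    # Second rack: move everything back, take the network hit, then patch.
    placement = evacuate(placement, second, first)
    for task in (SWITCH, PATCH):
        assert second not in placement.values(), "rack still hosts VMs"
        log.append((second, task))
    return placement


if __name__ == "__main__":
    steps = []
    start = {"vm-a1": "rackA", "vm-a2": "rackA", "vm-b1": "rackB"}
    end = run_rolling_maintenance(start, "rackA", "rackB", steps)
    for rack, task in steps:
        print(rack, task)
    print(end)
```

The assertions capture the point of the exercise: a rack is only patched or disconnected while it is hosting no VMs, which is why the running VMs saw no downtime.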
ITS-HCVM09 Memory Failure
This ESX server had been reporting memory errors. When I called IBM support, they informed me that this might be resolved by a firmware update for the baseboard management controller (BMC). So when we VMotioned the VMs off of this server and rebooted it, there were 4 GB of memory not being detected. We called IBM support back and had them immediately send out a technician with replacement memory. Upon arrival, the IBM technician realized that there were also problems with the baseboard, so they replaced that as well. Upon reboot, the server was seeing 32 GB of memory again and all was well.
Backup Server
Networking Configurations
The backup replacement server arrived, and we began preparations for changing backup servers and OS environments (switching from Solaris to Linux). The backup team had requested a special networking configuration for the backup server and the previously deployed backup storage node. The desired config would provide them with a 3 Gb uplink to the router, which would require three 1 Gb NICs configured as one 3 Gb pipe. To accomplish this we used the industry standard protocol IEEE 802.3ad, or dynamic link aggregation. In this protocol, the ports on the switch work in tandem with the NICs on the server, and an algorithm distributes the backup traffic across the available NICs. Not only do you get a 3 Gb pipe out of this configuration, but you also get redundancy at the NIC level (of course, if one NIC fails the pipe drops to 2 Gb rather than 3 Gb). Once we worked this configuration out on the new backup server, we retrofitted it into the backup storage node configuration and saw the benefits, as we were able to push 246 Mbps, which was more throughput than we had previously seen out of this environment.
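To make the aggregation behavior concrete, here is a minimal sketch of how traffic might be spread across the member NICs and what happens to capacity when one fails. It assumes hash-based distribution on flow fields, which is one common approach; the actual policy depends on the switch and bonding driver settings, and the interface names, addresses, and port number below are made up for illustration.

```python
import zlib


def pick_link(flow, active_links):
    """Choose a member link for a flow by hashing its identifying fields.

    `flow` is any tuple of fields (assumed here: source, destination, port).
    The same flow always lands on the same link while membership is unchanged.
    """
    if not active_links:
        raise ValueError("no active links in the aggregate")
    key = "|".join(str(part) for part in flow).encode()
    return active_links[zlib.crc32(key) % len(active_links)]


def aggregate_capacity_gb(active_links, per_link_gb=1):
    """Total capacity of the aggregate, assuming identical member links."""
    return len(active_links) * per_link_gb


if __name__ == "__main__":
    links = ["eth0", "eth1", "eth2"]          # hypothetical NIC names
    flow = ("10.0.0.5", "10.0.0.9", 7937)     # hypothetical backup stream
    print(pick_link(flow, links), aggregate_capacity_gb(links))

    surviving = ["eth0", "eth2"]              # eth1 has failed
    print(pick_link(flow, surviving), aggregate_capacity_gb(surviving))
```

Under this kind of distribution a single stream is limited to the speed of one member link, so the full aggregate is only reached when several streams run in parallel.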
SAN Connectivity
The new backup server has a requirement for four paths to the SAN switches. Two paths are for the storage fabric, and the other two paths are for the backup fabric. There are three HBAs in this server: two single-port HBAs and one dual-port HBA. One of the single-port HBAs is on the storage fabric, and the other single-port HBA is on the backup fabric. One port on the dual-port HBA is on the backup fabric, the other on the storage fabric. This provides us with not only redundant paths to the SAN, but redundant HBA connections to the SAN for each individual fabric. We also chose to go with MPIO for failover rather than EMC PowerPath. Although EMC PowerPath is a good tool, it introduces problems when it is time to patch a server, especially the kernel. Since MPIO is native to Red Hat Enterprise Linux, it will not have the same complications when patching the server.
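The HBA layout above can be checked with a short sketch that confirms each fabric stays reachable after the loss of any single HBA. The HBA labels are hypothetical names for the three cards described; only the port-to-fabric assignments come from the description.

```python
# (HBA, fabric) pairs, one per port, as described above.
PATHS = [
    ("hba_single_1", "storage"),
    ("hba_single_2", "backup"),
    ("hba_dual", "backup"),
    ("hba_dual", "storage"),
]


def fabrics_after_hba_loss(paths, failed_hba):
    """Fabrics still reachable once every port on `failed_hba` is gone."""
    return {fabric for hba, fabric in paths if hba != failed_hba}


def is_hba_redundant(paths):
    """True if no single HBA failure cuts the server off from any fabric."""
    all_fabrics = {fabric for _, fabric in paths}
    all_hbas = {hba for hba, _ in paths}
    return all(fabrics_after_hba_loss(paths, hba) == all_fabrics
               for hba in all_hbas)


if __name__ == "__main__":
    print(is_hba_redundant(PATHS))
```

Losing the dual-port card removes one path from each fabric at once, which is why each single-port card is placed on a different fabric.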
Identity Management
LDAP Integration
There has been a lot of effort to provide a highly available LDAP infrastructure to support mission critical Medical Center applications. We have been tasked with researching the capabilities of our CSM (Load Balancing Module) to investigate the possibility of load balancing the LDAPS protocol. Since Cisco doesn’t have a pre-defined method for the LDAPS protocol, we have to write a script in TCL to accomplish this task. Cisco provides an SSL TCL script as well as an LDAP script, but they do not provide an LDAPS script. So we are now looking into getting the packages tls and ldapx into the CSM so we can take advantage of their pre-defined classes for writing LDAPS TCL scripts.
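As an illustration of what an LDAPS health check has to do, the sketch below opens a TLS connection, sends an LDAPv3 anonymous simple bind, and checks the result code in the reply. It is written in Python purely to show the logic and is not the TCL script for the CSM. It assumes the directory permits anonymous binds and presents a certificate the client trusts, and the response check handles only the short fixed layout of a typical bind reply.

```python
import socket
import ssl

# LDAPv3 anonymous simple bind, message ID 1, BER encoded:
# 30 0c | 02 01 01 | 60 07 | 02 01 03 | 04 00 | 80 00
ANONYMOUS_BIND_REQUEST = bytes.fromhex("300c020101600702010304008000")


def bind_succeeded(response):
    """True if `response` starts like a BindResponse with resultCode 0.

    Expected layout: 30 LL 02 01 <id> 61 LL 0a 01 <resultCode> ...
    Only short-form lengths and a one-byte message ID are handled.
    """
    if len(response) < 10 or response[0] != 0x30:
        return False
    if response[2:4] != b"\x02\x01" or response[5] != 0x61:
        return False
    if response[7:9] != b"\x0a\x01":
        return False
    return response[9] == 0


def probe_ldaps(host, port=636, timeout=5.0):
    """Return True if the server completes TLS and accepts the bind."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(ANONYMOUS_BIND_REQUEST)
                return bind_succeeded(tls.recv(4096))
    except OSError:  # covers connection, timeout, and TLS errors
        return False
```

A load balancer probe would call something like probe_ldaps for each real server and take it out of rotation when the check fails.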
We also had discussions about how to architect the integrated LDAP solution. In a coordinated effort between ITS and MIS, we proposed an 8-server solution spread across 4 geographical locations, which seems to be the best configuration for what we are trying to achieve.
RHN
We have been working with our DBA to establish a tiered configuration for RHN. This entails an Oracle database server running Oracle 9i, and a server running RHN. We also decided to upgrade our version of RHN from 4.2 to 5.0 to support the release of Red Hat Enterprise Linux 5. The database has been successfully created and the application can attach to it. However, we have to install RHN 4.2 on the new server, point it to an export of the RHN 4.2 database that is now running on the database server, then upgrade 4.2 to 5.0 via a Red Hat package that will update the database schema as well as the application.