June Activity Report

18 06 2007
  • This week marked the installation of the CDL (CLARiiON Disk Library) that Evan and I worked on. A major challenge with this hardware was persistent device binding on Linux. Our storage node runs Linux, and again we were confronted with the udev problem. Udev is the Linux 2.6 device manager, and its rules are how we keep stable names for the tape devices presented through our Emulex HBAs. Without appropriate udev rules, the devices shuffle their names every time the server reboots. In our environment three of the drives are shared, so this behavior is especially undesirable. The biggest problem with the CDL was that its drives gave us no unique WWN to match in our rules; the only unique identifiers available were the serial numbers assigned by the CDL. The following is the actual rule set that solved the problem for us.
    • KERNEL="nst*", BUS="scsi", SYSFS{vendor}="IBM", SYSFS{model}="ULTRIUM-TD2", PROGRAM="/sbin/scsi_id -g -s /class/scsi_tape/nst%n", RESULT="3500104f0005ecd95", SYMLINK="fc_tape_nst0"
      # RESULT matches the output of the last PROGRAM call, so the rules below reuse the scsi_id result from the rule above
      KERNEL="nst*", BUS="scsi", RESULT="3500104f0005ecd8f", SYMLINK="fc_tape_nst1"
      KERNEL="nst*", BUS="scsi", RESULT="3500104f0005ecd8c", SYMLINK="fc_tape_nst2"
      KERNEL="nst*", BUS="scsi", RESULT="3500104f0005ecd9b", SYMLINK="fc_tape_nst3"
      KERNEL="nst*", BUS="scsi", RESULT="3500104f0005ecd92", SYMLINK="fc_tape_nst4"
      KERNEL="nst*", BUS="scsi", RESULT="3500104f0005ecd86", SYMLINK="fc_tape_nst5"
      # CDL drives: no unique WWN, so match on the CDL-assigned serial number anywhere in the scsi_id output
      KERNEL="nst*", BUS="scsi", RESULT="*7RA1400206*", SYMLINK="fc_tape_nst6"
      KERNEL="nst*", BUS="scsi", RESULT="*7RA1400207*", SYMLINK="fc_tape_nst7"
      KERNEL="nst*", BUS="scsi", RESULT="*7RA1400208*", SYMLINK="fc_tape_nst8"
      KERNEL="nst*", BUS="scsi", RESULT="*7RA1400209*", SYMLINK="fc_tape_nst9"
      KERNEL="nst*", BUS="scsi", RESULT="*7RA140020A*", SYMLINK="fc_tape_nst10"
  • The second major task this month was the rollout of the second Cisco 9509 Fibre Channel Director. The first problem I came across was a power issue on our existing director, Lucas. It turned out to be a known bug in our version of SAN-OS (3.0.2) that can make the system believe the chassis has less power available than its installed blades are already drawing. Because the system does not report negative power, it then treats the power available for new devices as 0. In our case this prevented us from powering up the new 24-port blade that was our first step in building an ISL between Lucas and Nimoy. Cisco recommended a fix that solved the problem, and we saw no outage as a result. The rest of the migration involved moving a 16-port blade to the new director, organizing cables, and establishing a second ISL between Lucas and Nimoy through the 16-port blades for redundancy. One unintended consequence of the new environment is that the ISL to Stevenson now comes off two different directors. That is good for redundancy; however, a port channel cannot span multiple directors, so instead of two 2 Gb/s ports trunked together we now have two separate 2 Gb/s ports. This may slow down our RMSE sessions from Exchange. All in all, the change went very well, but we did take away a few new lessons. One is that although redundant paths on the SAN fabric provide two routes for data traffic, the fabric uses only one of them, chosen by a shortest-path-first algorithm (FSPF), and switches to the other path only when necessary, using the same algorithm. This means guest Linux VMs in the ESX environment can still be hit by bugs such as the read-only file system bug when the fabric switches paths, even if a redundant connection on the SAN side appears to be up at all times.
  • I have also been involved in the initial phases of Vuspace 3 (Vuspace Strikes Back). This month I helped gather the data for our project that was presented to governance for funding. I hear we were approved, so we should soon be making more progress on this front.
  • Finally, with Evan, Kevin, and Jeff all out at the same time this week, I had a nice sink-or-swim exposure to our backup environment. I have to say I learned quite a bit this past week about how our environment works. I now feel comfortable managing it, and I believe Evan and I are going to start switching job responsibilities back and forth on a weekly basis. This way we both get full exposure to our storage environment.
  • One more thing :)  This month I performed our first data migration on the BlueArc platform. The facilities for this are very nice: I was able to schedule a date and time for the replication to occur, and I moved Mike McCaughey’s data from one file system to another. The process took approximately 4 hours to move 1TB of data. It uses NDMP on the back end; pretty spiffy.
  • And I’m off … I’ll be on vacation for about 10 (consecutive) days starting this Thursday.  So I’ll see everyone in a bit.
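To make the matching in the CDL udev rules above easier to follow, here is a small Python model of it. This is a sketch, not udev itself. It assumes the RESULT patterns behave like shell-style globs (which is how udev treats `*`) and treats each scsi_id output as a plain string. The `symlinks_for` helper and the sample ID strings are hypothetical. Checking that every ID matches exactly one rule is a cheap way to catch overlapping patterns before the next reboot.

```python
from fnmatch import fnmatchcase

# (RESULT pattern, SYMLINK name) pairs copied from the rule set above.
# Exact IDs cover the first six drives; "*serial*" globs cover the CDL drives.
RULES = [
    ("3500104f0005ecd95", "fc_tape_nst0"),
    ("3500104f0005ecd8f", "fc_tape_nst1"),
    ("3500104f0005ecd8c", "fc_tape_nst2"),
    ("3500104f0005ecd9b", "fc_tape_nst3"),
    ("3500104f0005ecd92", "fc_tape_nst4"),
    ("3500104f0005ecd86", "fc_tape_nst5"),
    ("*7RA1400206*", "fc_tape_nst6"),
    ("*7RA1400207*", "fc_tape_nst7"),
    ("*7RA1400208*", "fc_tape_nst8"),
    ("*7RA1400209*", "fc_tape_nst9"),
    ("*7RA140020A*", "fc_tape_nst10"),
]


def symlinks_for(scsi_id_output):
    """Return the SYMLINK names whose RESULT pattern matches this scsi_id output.

    fnmatchcase does case-sensitive glob matching, so "*" matches any run of
    characters. More than one name in the result means the patterns overlap.
    """
    return [link for pattern, link in RULES if fnmatchcase(scsi_id_output, pattern)]


if __name__ == "__main__":
    # Hypothetical scsi_id strings: one exact ID, and one that embeds a CDL serial.
    for device_id in ["3500104f0005ecd95", "hypothetical-7RA140020A-suffix"]:
        print(device_id, "->", symlinks_for(device_id))
```

An empty list means a drive would get no stable name, and a list with two entries means two rules claim the same drive. Either case is worth fixing before relying on the symlinks.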
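The lesson about the redundant ISLs between Lucas and Nimoy can be illustrated with a toy shortest-path-first calculation. This is a Dijkstra sketch over a hypothetical fabric, not Cisco's FSPF implementation. The host and array nodes, the link names, and all costs are made up. The two ISL costs are deliberately unequal so that a single path wins. Removing the preferred ISL forces a recomputation onto the other link, which is the switch-over described in the director rollout notes.

```python
import heapq

# Hypothetical fabric: (endpoint_a, endpoint_b, link_name, cost).
# Costs are illustrative only and chosen unequal so one path is preferred.
LINKS = [
    ("host", "lucas", "host_port", 10),
    ("lucas", "nimoy", "isl_24port_blade", 50),
    ("lucas", "nimoy", "isl_16port_blade", 60),
    ("nimoy", "array", "array_port", 10),
]


def shortest_path(links, src, dst, failed=frozenset()):
    """Dijkstra over an undirected multigraph, skipping links named in `failed`.

    Returns (total_cost, [link names along the path]) or None if dst is unreachable.
    """
    adj = {}
    for a, b, name, cost in links:
        if name in failed:
            continue
        adj.setdefault(a, []).append((b, name, cost))
        adj.setdefault(b, []).append((a, name, cost))

    heap = [(0, src, [])]  # (cost so far, node, links used to reach it)
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, name, link_cost in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + link_cost, nxt, path + [name]))
    return None


if __name__ == "__main__":
    # Normal operation: only the cheaper ISL carries traffic.
    print(shortest_path(LINKS, "host", "array"))
    # Preferred ISL down: the path is recomputed onto the other ISL.
    print(shortest_path(LINKS, "host", "array", failed={"isl_24port_blade"}))
```

The second call only picks the 16-port ISL after the 24-port ISL is removed. From a host's point of view, that recomputation is the moment where an interruption can surface, even though a second link existed all along.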
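As a rough check on the BlueArc migration figures above, the average rate can be worked out directly. The report's numbers are approximate (about 4 hours for 1TB). Because "TB" can be read as decimal or binary, this sketch computes both. The result is a whole-job average, not a measured peak.

```python
SECONDS_PER_HOUR = 3600


def rate_per_second(amount, hours):
    """Average rate over the whole job, in units of `amount` per second."""
    return amount / (hours * SECONDS_PER_HOUR)


# Decimal reading: 1 TB = 10**12 bytes, reported in MB/s (10**6 bytes).
decimal_mb_s = rate_per_second(10**12, 4) / 10**6
# Binary reading: 1 TiB = 2**40 bytes, reported in MiB/s (2**20 bytes).
binary_mib_s = rate_per_second(2**40, 4) / 2**20

print(f"{decimal_mb_s:.1f} MB/s")   # 69.4 MB/s
print(f"{binary_mib_s:.1f} MiB/s")  # 72.8 MiB/s
```

Either way, the migration averaged roughly 70 MB/s, or about 250 GB per hour in decimal units.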