August “Home of Phoenix, Spirit, Opportunity, Pathfinder, and the Viking twins”

Posted on August 27th, 2008 in App Hosting by Daniel Raymer

August has come and gone… and not a moment too soon, I say!  This means it’s time to read Kim Stanley Robinson’s Mars Trilogy!  Uh… I mean, it’s time for the Monthly Activity Reports (MARs).

Once again, my main focus for the month has been DNS with a healthy dose of DHCP.

  1. The Diamond IP test environment was finally successfully upgraded to 3.0.53.  This will allow us to move forward with the NetID import process and get ready for DHCP roll out.
  2. The Phase I and Phase II prerequisite steps for the production migration were successfully completed.  We continue to run through the process to iron out any possible "gotchas" that may appear.
  3. OMAPI was successfully implemented and works with Diamond IP.
  4. DNS Views testing is halted while we finalize DHCP testing.  The repeated and rapid database refreshes make it extremely difficult to make any meaningful progress before being reset to 0.

DNS took an interesting turn this month as we finally started to do some analysis of the traffic and performance of the servers.

  1. What started out as curiosity has become an automated process: performance reporting on the name servers.  Daily total query counts and the top query sources are reported to management.  Looking through this information has allowed us to identify, sometimes with very surprising results, where the majority of the DNS traffic originates.  It has also opened our eyes to some acceptable-usage concerns.
  2. Next on the list is identifying the top requested domains and automating that report as well.
  3. Named stats are FINALLY being collected and are almost ripe for graphing and analysis.  This will give us a good idea of how the server is performing at a more granular level.
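
A minimal sketch of the kind of daily report described above (total query count plus top query sources).  This is not the actual script: the log line shape is an assumption based on the BIND 9 query log style, and `summarize_queries` and `CLIENT_RE` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Assumed log line shape (BIND 9 querylog style), for example:
#   27-Aug-2008 10:15:01.123 queries: info: client 10.1.2.3#53124: query: www.example.com IN A +
# Adjust the pattern to whatever the name servers actually write.
CLIENT_RE = re.compile(r"client (?:@0x[0-9a-fA-F]+ )?(?P<ip>[0-9a-fA-F.:]+)#\d+")


def summarize_queries(lines, top_n=10):
    """Return (total_queries, [(source_ip, count), ...]) for the given log lines."""
    sources = Counter()
    for line in lines:
        # Skip anything that is not a query log entry.
        if "query:" not in line:
            continue
        match = CLIENT_RE.search(line)
        if match:
            sources[match.group("ip")] += 1
    return sum(sources.values()), sources.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "27-Aug-2008 10:15:01.123 queries: info: client 10.1.2.3#53124: query: www.example.com IN A +",
        "27-Aug-2008 10:15:01.456 queries: info: client 10.1.2.3#53125: query: mail.example.com IN MX +",
        "27-Aug-2008 10:15:02.001 queries: info: client 10.9.9.9#40000: query: www.example.com IN A +",
    ]
    total, top = summarize_queries(sample, top_n=5)
    print("total queries:", total)
    for ip, count in top:
        print(ip, count)
```

Counting the queried name instead of the client address would give the "top requested domains" report from item 2 with the same structure.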

Not to be stuck feeling left alone, our ESX virtualization environment decided to rear its head and demand some attention.

  1. The final fault is our own, but a networking hiccup uncovered some serious design/implementation flaws in our ESX environment.  The hiccup caused a cascade of network outages which made the ESX servers think they were in an isolated state.  High Availability did what it was supposed to and attempted to initiate migrations of the VMs to other servers and to kill the running VM processes.  Except… almost every ESX host thought the same thing at the same time, resulting in some VMs not moving, some moving repeatedly, some shutting down, and others just getting into a confused state.
  2. Fault lies not with the software or even the network, but in our lack of attention to detail: we failed to ensure that our virtual switches were redundantly connected to separate physical switches and that all VLANs were correctly tagged on their associated physical ports.
  3. Identification and repair of the situation continues.
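
The two checks named in item 2 (uplinks spread across separate physical switches, and VLANs tagged on every uplink port) could be audited with something like the sketch below.  Everything here is an assumption for illustration: the inventory layout, the `audit_vswitches` function, and the sample names are hypothetical, and in practice the data would have to be gathered from the ESX hosts and the switch configurations.

```python
def audit_vswitches(vswitches, trunk_vlans):
    """Return a list of human-readable problems found in the inventory.

    vswitches:   {vswitch_name: {"uplinks": {nic: (physical_switch, port)},
                                 "vlans": set of VLAN IDs the vSwitch needs}}
    trunk_vlans: {(physical_switch, port): set of VLAN IDs tagged on that port}
    """
    problems = []
    for name, vs in sorted(vswitches.items()):
        # Check 1: uplinks must land on at least two distinct physical switches.
        switches = {switch for switch, _port in vs["uplinks"].values()}
        if len(switches) < 2:
            problems.append(
                f"{name}: uplinks reach only {len(switches)} physical switch(es)"
            )
        # Check 2: every uplink port must carry every VLAN the vSwitch uses.
        for nic, port in sorted(vs["uplinks"].items()):
            missing = vs["vlans"] - trunk_vlans.get(port, set())
            if missing:
                problems.append(
                    f"{name}/{nic}: VLANs {sorted(missing)} not tagged on {port[0]} {port[1]}"
                )
    return problems


if __name__ == "__main__":
    inventory = {
        "vSwitch0": {
            "uplinks": {"vmnic0": ("sw-a", "1/1"), "vmnic1": ("sw-a", "1/2")},
            "vlans": {10, 20},
        },
    }
    tagged = {("sw-a", "1/1"): {10, 20}, ("sw-a", "1/2"): {10}}
    for problem in audit_vswitches(inventory, tagged):
        print(problem)
```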

Other miscellaneous tasks filled up the rest of the time.

  1. Power/rack moves during the weekends
  2. Operational duties
  3. Patched some of the standby and test/dev database servers.

Next month is patch happy heaven and a massive ramp up to DHCP migration.

July MARS

Posted on July 30th, 2008 in App Hosting by Daniel Raymer

Well… take 2 on this…

1.  DIP Test Environment - It’s up and running.  DHCP Migration moving forward

2.  BIND Views in DIP - Initial attempt to import a working BIND views configuration failed with tons of Java errors.  Trying another method.

3.  Moved a bunch of servers to new rack locations.

4.  Fixed various maintenance issues with list-srv1, news-srv1, Napster, and other servers.

5.  Patched the DNS vulnerability

6.  Filled in for PW on IDM - ouch…

Primary DNS is now pumping out 28,500,000 queries in a 24 hour period.

(The original version was MUCH longer.  Due to a crash and a time limit, this is what you get.)

Άρης για το μήνα Ιούνιο (MARS for the month of June)

Posted on June 27th, 2008 in App Hosting by Daniel Raymer

I feel like a broken record….

1.  BT INS aka DiamondIP aka DIP - Test environment still not up.  Having multiple issues getting the database to work on the test environment.  Half tempted to blow the thing away and totally rebuild this test environment.

2.  Auth4 aka redundant kerb server - Built out a new kerb server to go to the hospital.

3.  Gave the Med Center an overview of DIP and the functionality.  They were made aware of the ups and downs of the product as well as given an insight on how it works.

4.  Finally got CA to get me uncorrupted patches for Spectrum One-Click.  Installed the patches and everyone is now happy that Report Manager works as it should.

5.  Lent a hand on a little forensic work for IRT.  Ended up taking much more time than I thought it would, but it was very much an eye-opening experience.

6.  Fixed a ton of issues on RHN.  Purged a lot of non-responding servers, registered a bunch more, and finally got all the channels resync’d.

7.  Did a metric boatload (as opposed to Imperial or Standard boatloads) of DNS requests/changes/fixes/etc

8.  Managed to break DNS attempting to get BIND Views working…. GO ME!

Better pic of Lidstrom and the Cup!

Posted on June 5th, 2008 in rants & raves by Daniel Raymer

Lidstrom lifts the Cup!

That just defines AWESOME!

WINGS WIN!!!!!! WINGS WIN!!!!!!!

Posted on June 5th, 2008 in rants & raves by Daniel Raymer

Lidstrom lifts the Cup!

The team celebrates!

The Cup is back with an Original 6!

May MARS

Posted on May 29th, 2008 in App Hosting by Daniel Raymer

Well, after taking a much needed 3 weeks off in April and the first part of May, it was time to get back to the grindstone.  Now my nose is painfully sore… I blame Kevin!

So, without further delay, here are the "Wins" for May:

1.  Moved the VCMS database off of the VCMS server to the Linux Oracle server cluster  - Instead of timing out when attempting to do any performance reports greater than 3 days, we can now query and receive our reports for the past year in less than 10 seconds.  This is extremely helpful in doing trend analysis of our virtual environment.

2.  Cloned and moved the NDE server to Stevenson - The NDE webserver was cloned and moved to the SC datacenter cluster to provide redundancy of services.  Now working on getting rsync over ssh working to ensure data is correctly replicated.

3.  Spectrum patching - I attempted to patch our Spectrum environment to get the Business Objects Report Manager working correctly but one of the patches was corrupted.  This prevented me from finishing the patch process.  The vendor has yet to supply the patch again for download.

4.  Moved and rebuilt the Diamond IP test application server - The test application server was moved and rebuilt.  The production data was loaded on the test database and I am in the process of removing the pointers to our production DNS and pointing them towards the test application/DNS service.

5.  Develop automated Virtual Environment Billing Script - Started work on writing a script that will automatically gather the billable data (procs and RAM amounts) from the VMX files for co-located VMs.  Currently fighting what I call "Special Character Hell" to get around the multitudes of parentheses, dashes, spaces, and slashes in VM names/directories.

6.  Continued work/enhancement of Diamond IP Production environment - Work continues on the production DIP environment to ensure stability and to get external LDAP authentication working.  Additionally, RFC1918 DNS preparation continues.
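
For the billing script in item 5, one way around "Special Character Hell" is to avoid the shell entirely and let the language handle paths as plain strings.  The following is only a sketch under stated assumptions, not the script described above: the function names are hypothetical, it assumes the usual VMX `key = "value"` line format with `numvcpus`, `memsize`, and `displayName` keys, and it assumes a VM with no `numvcpus` line has one virtual CPU.

```python
from pathlib import Path


def parse_vmx(text):
    """Parse VMX `key = "value"` lines into a dict with lower-cased keys."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, the "#!/usr/bin/vmware" header, and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def billable_resources(vmx_text):
    """Extract the billable fields (name, vCPU count, memory in MB)."""
    settings = parse_vmx(vmx_text)
    return {
        "name": settings.get("displayname", ""),
        # Assumption: a missing numvcpus line means a single vCPU.
        "vcpus": int(settings.get("numvcpus", "1")),
        "memory_mb": int(settings.get("memsize", "0")),
    }


def collect(root):
    """Walk a directory tree and return one billing row per .vmx file found."""
    rows = []
    # Path objects never pass through a shell, so parentheses, dashes, and
    # spaces in VM names or directories need no quoting or escaping.
    for path in sorted(Path(root).rglob("*.vmx")):
        row = billable_resources(path.read_text(errors="replace"))
        row["path"] = str(path)
        rows.append(row)
    return rows
```

Calling `collect()` on a datastore mount point (the path is whatever applies locally) returns a list of dictionaries that can be written out as CSV for billing.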

Last but not least… this is the new main priority in my life right now…

Dan and Zealy

Please give a warm welcome to Zealy Caitlin!

March MARS

Posted on March 30th, 2008 in App Hosting by Daniel Raymer

Well… where to start….

I am proud to say that this should be the last month where I claim my main priority was DNS.

1.  BT INS Diamond IP successfully deployed - After some false starts, some anger, some frustration, and a whole lot of fatigue, we FINALLY rolled out our replacement DNS architecture.  The new system is running a fully integrated DNS/DHCP/IP Management solution and went in with Zero Downtime.  Out of over 29000 records and over 430 separate domains, I have received word of only 6 individual resource record errors.  Not too shabby.  Currently, we are serving up over 1.5 million queries an hour without a hiccup and the transition was transparent to the community.  Now I get to focus on getting BIND Views up and running and getting co-workers trained up.

2.  VMware Certified Professional - Yeah… it fell through the cracks and I got gigged on failing to take it prior to my review… I’ll accept that.  I will also accept that I took the test and PASSED.  Now, I need to figure out which additional alphabet soups I can append to my title… (RHCE, VCP, AEIOU, etc).

3.  The Solaris Oracle environment was patched up to current revs.  Of course, during one of the patch sessions, SunSolve decided to send its bandwidth out to lunch and make a 4 hour patch cycle take almost 10 hours.  Thanks Sun!  Also, DB-1 and DB-2 received some additional space so we don’t have to get called every time a backup of the database is kicked off.

4.  I really need to tidy up some of my operational tasks/duties now that DNS is (mostly) done.

I would like to take this time to bid farewell to Kenon Ewing as he decides to play traitor and head over to the storage team from the vastly superior Unix team.  Just Kidding… I’m just jealous they are getting him and we are losing him.  He will continue to excel and his presence will be sorely missed.  Yeah, I know… he’s close enough to toss stuff at him, but still…

I would also like to take this time to give a heads up for next month’s MARS… it will be very incomplete/sparse… the baby is less than 3 weeks away!

See you all on the flip side!

February MARS

Posted on February 27th, 2008 in App Hosting by Daniel Raymer

Yeah, this page has been WAY neglected…

Anyway, here are the Feb MARS:

Once again, it’s all been about DNS and the preparation for the March 2nd deployment…

*  Cleaned DNS zone files, which led to a reduction from 10367 lines in the main vanderbilt.edu zone to 4271 lines.
*  Created vanderbilt.edu sub-zones to facilitate self-serve requests for major users.  This resulted in another reduction of the vanderbilt.edu zone to 1617 lines and the creation of 18 individual sub-zones.

For the record, those 2 above activities involved manual line by line combing through the files.  Talk about time consuming and tedious…

*  Install and configure the BT INS Diamond IP appliances:  Not as easy as it sounds.  The initial build shipped with the appliances had a small error that would not allow the management stations to install from the provided USB keys.  After much hair pulling, the vendor overnighted a new build for us to use to facilitate the install.  Additionally, being locked into the non-elevated privilege accounts has led to much heartache when it comes to iptables, routing, and other low level configurations.

Enough about DNS… I actually do other things too… seriously…. stop laughing now….

*  Continued to work on getting Spectrum updated.  With the deployment of 8.1, multiple processes managed to become broken.  After multiple hot fixes and patches were installed, all but one seem to be resolved.  The remaining issue concerns a bug with Business Objects and the Report Manager function for Spectrum.  With any of the Sun X-Server patches installed, BO decides to go blah.  By blah, I mean fail to work.  Computer Associates (the vendor responsible for Spectrum) is waiting on Business Objects to provide some sort of patch for this issue.

For the record once again… I don’t know HOW you did it, K-Mac, but you hooked me into BO again…  somewhere in the world, little puppies or kittens are dying because of this.

* The only other notable activity for the month (I don’t include operational gunk because it’s… well… mundane operational gunk) was the re-examination of system data collection on the servers I am responsible for maintaining.  Sar, for all of its pains and annoyances, is a steady standby for collecting system data.  SNMP and Cacti and Nagios and blah blah blah is well and good but to rely on it exposes us to potential granular data loss.  So, after spending a couple days hopping through hoops to provide CPU utilization data on systems and finding that it was (for lack of a better term) lacking, I have jumped around and ensured sar is doing its thing.

Other than that, all I have to say is "HOLY COW FEBRUARY WENT FAST!" and "HOLY COW DNS IS ALMOST HERE!"

November MARS

Posted on November 29th, 2007 in App Hosting by Daniel Raymer

YAY!  It’s that time again!

Once again, the mantra is Dee Inn Ess with Dee Ech See Pee!  Go, go Diamond IP!

*  Spec’ed out the Oracle servers for DNS/DHCP.  The servers will be beefy enough to hold multiple databases in addition to the DIP database.  Additionally, we will FINALLY have a test/dev database environment when all is said and done.  Amazing concept.
*  Got my Visio groove on and knocked out the engineering diagrams for the DIP environment.
*  Performed hardware maintenance on a couple of Sun servers.  v210’s are the devil when it comes to fans failing in their PSU’s and curse Sun for not making them modular (or even N+1).
*  Attended Sharepoint training to learn more than I ever wanted or needed to know about how to create Sharepoint sites.
*  Performed metric gathering and reporting for Owen School of Management to determine proper sizing of virtual machine CPU/Memory allocation for their spamgates.
*  Upgraded virtual memory & CPU allocation for the Sharepoint environment, jump servers, and Owen School of Management.
*  Updated the Shibboleth certificate

To update the Raymer v3 Beta 2 release…

IT’S A GIRL!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

October MARS

Posted on October 26th, 2007 in App Hosting by Daniel Raymer

This month can be summed up in 3 letters… DNS

*  Moved forward towards implementation of Diamond IP for consolidated IPAM/DNS/DHCP.  Attended training (worthwhile, if painful) and now have a more complete idea of what we will need to do concerning migration, implementation, and support.  DDNS and BIND9 views will still present the largest challenges during rollout.

*  After much head pounding, replaced a NIC in an ESX server that caused a PCI reset which rebooted the server.  Great job from IBM support with that one… NOT.

*  Fought the good fight with the Software Store’s attempt to roll out the new RPEG utility on Linux using Tomcat.  Too bad the vendor was 110% Windows oriented and even admitted that any customer using Linux was basically on their own.  Punted that project to the other side of the cube farm (Sorry Scott).

*  Worked on the RHN Satellite upgrade.  Need to get the LDAP authentication working and all will be fine there.

Wow… looks like I have done a whole lot of nothing…  DNS continues to consume a majority of my time as I now dig through our somewhat messed up files.  I hope to have them prepped, cleaned, and ready for moving forward in the next couple of weeks.  Just a lot of time consuming, manual hands and eyes on keyboard jockey work.  Hopefully, this progress will pick up some steam and will start to move forward in a quicker fashion.

Oh….

And, I’m having a kid again.  Raymer v3 Beta 2 is moving forward nicely.  A lot better than the previous attempt earlier this year.  Due date is my birthday (how weird is that - Apr 19).  Hoping to find out if it is a boy or girl sometime in November.  YAY!
