MAR – Jan 2010

January 26th, 2010 by Daniel Raymer

1.  OpenDNS pilot concluded. All DHCP within Hill Center was used to assess the impact of using OpenDNS.  While some issues were noted, for the most part impact was minimal.  Results sent to leadership.

2.  The Diamond IP environment received an upgrade to 3.1.31 with minimal headaches.  A first for that environment.  DNS/DHCP remained 100% operational during the upgrade window.

3.  Diamond IP Callout Manager is now generating alerts.  SNMP traps are still in progress, but email alerts are going to be used in the meantime.  Base email alert functionality is working in testing.

4.  Library has been tapped for the next (and probably last) DNS/IPAM Self-Serve.  Training expected in February.

5.  Bonenjoint.com was successfully transferred to Vanderbilt DNS.

6.  The NDE Syslog server was migrated out of 129.59.1.0.

7.  The ancient Shibboleth server was decommissioned.

8.  Vandyworks.com has successfully performed multiple DR tests which required time sensitive DNS changes.

Below are the DNS statistics for the ITS-hosted Vanderbilt DNS servers.

CY09 DNS Stats:
Total DNS Queries Answered   : 13,841,143,051
    IP-SRV1                  : 10,370,389,340
    IP-SRV2                  : 2,276,267,703
    IP-SRV3                  : 1,194,486,008

Total Average Daily Queries  : 37,962,338
    IP-SRV1                  : 28,446,349
    IP-SRV2                  : 6,239,242
    IP-SRV3                  : 3,276,747

Highest Month of Activity    : September – 1,877,470,808 total queries
Lowest Month of Activity     : May – 774,095,515 total queries

And for 2009 – Facebook is the top DNS destination domain for Vanderbilt clients!

MAR – Nov 2009

November 25th, 2009 by Daniel Raymer

1.  OpenDNS pilot is in place for select wireless networks.  Once the initial pilot is completed, all of ITS will be involved in testing.

2.  Diamond IP has been upgraded from 3.0.71 to 3.1.31 in the test environment.

3.  The test database environment has been patched and upgraded to Oracle 10.2.0.4.

4.  RHN was upgraded from 5.0.1 to 5.3.  This allowed us to handle the increased number of clients coming from MIS/EAI and to migrate the database off Oracle 9.2.0.8 to 10.2.0.4, freeing up some resources in the virtual environment.

5.  VUMC DNS administrators received Self-Serve DNS training and are now handling their own DNS requests.

6.  The NetID environment has been shut down and is on track for decommissioning the first week of December.

7.  ISIS has been scheduled for Self-Serve DNS/IPAM training.  This is the last major group identified for training at this time.  CSB & Library remain candidates and will be reviewed at a later time.

8.  Diamond IP still has some critical issues open with the vendor:

  • The appliances did not handle DST changes resulting in the failure of the NTP service.
  • The DHCP servers w/ collection did not handle DST changes resulting in a massive database logging issue.
  • Agent Appliances still lose connection with the Executives, resulting in lost collections and the loss of the ability to publish DNS/DHCP configurations.
  • Callout Manager & generated actions still remain to be configured to send SNMP traps on 70/80/90% DHCP subnet utilization.

9.  Business hours DHCP Failover/Disaster Recovery Exercise was delayed and is now tentatively scheduled for the 2nd week of December.

Non-work related material removed

October 21st, 2009 by Daniel Raymer

Pursuant to HR Policy HR-025, all non-work related material has been removed from this blog.

MAR – Sept 2009

October 2nd, 2009 by Daniel Raymer

"Wins" for the month of September:

1. First up, the University and Medical Center now share a unified view of RFC1918 addresses, the reverse space for the Medical Center networks, and the forward zone for mc.vanderbilt.edu internally to the Vanderbilt community. This resolves a multitude of issues where data between the two organizations were out of sync, causing conflicting name resolutions. Additionally, this supports the new secure relay for servers email implementation by providing proper reverse resolution for both VU and VUMC.

2. Additional departments were trained for self-serve IPAM and DNS. The Owen Graduate School of Management & the Vanderbilt University Law School now are empowered to administer their own IP and DNS space.

3. The Diamond IP environment was upgraded to version 3.0.71, resolving some serious memory leaks present in the earlier version.

4. DHCP migrations from NetID to Diamond IP continue with the current migrations at 90% completed. This puts us well on track to have the NetID environment retired in November. That covers the major events of the month.

MAR – Aug 2009

August 26th, 2009 by Daniel Raymer

Wins for the month:

  • First off, it was a year ago on July 31st that we began to actually gather metrics on our DNS servers…  Looking back over the past 365 days, here are some interesting tidbits of information you may or may not care to hear…
Total Queries
IP-SRV1                9,639,738,984
IP-SRV2                2,021,500,701    
IP-SRV3                420,262,993
Total                      12,081,502,678

Per Day Average Queries
IP-SRV1                26,372,205
IP-SRV2                5,524,468
IP-SRV3                1,811,722
Total                      33,055,400

Per Hour Average Queries
IP-SRV1                1,098,842
IP-SRV2                230,186
IP-SRV3                75,488
Total                      1,377,308

Per Minute Average Queries
IP-SRV1                18,314
IP-SRV2                3,386
IP-SRV3                1,258
Total                      22,955

Per Second Average Queries
IP-SRV1                305
IP-SRV2                56
IP-SRV3                21
Total                      383

12 BILLION queries answered…
383 every second of every day on average…

That’s a lot of “Who and where is this address” requests!

  • Speaking of DNS, we have started the process of presenting a unified internal view between the Medical Center and the University.  This will help clients in both institutions to be resolvable within the Vanderbilt community.  With regard to email, this will go a long way to help us secure the environment and streamline our processes.  Initial testing has been successful and a roll out into production is scheduled for mid-September.

  • The DNS servers were successfully patched to address CERT Vulnerability Note VU#725188.  There was no interruption of service to the community.

  • 2 additional departments have been tapped for Self-Serve DNS/IPAM and will receive their training from ITS in the first week of September.  This is a part of the ongoing plan to empower selected departments to administer their own DNS and IP space while alleviating some of the load on ITS.

  • DHCP migrations off the old NetID environment continue to be on pace with an earlier than expected completion.  We are currently at 66% of migrations completed with little impact to the customer base.

  • The ESX environment will receive a nice boost in capability at the end of August with the addition of 8 more cores and 256GB of RAM.  The ESX development environment is slated for replacement with the current assets being redistributed to help out other production clusters.  This is an ongoing effort to increase our virtual hosting capacity and improve efficiencies.
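
The per-second figure in the DNS metrics earlier in this post can be reproduced from the yearly total.  A rough check, assuming a 365-day collection window (the function name is illustrative only):

```shell
#!/bin/sh
# Sketch: average queries per second over a collection window.
# Arguments: total queries, number of days.
# Integer division truncates any fractional part.
queries_per_second() {
    echo $(( $1 / ($2 * 24 * 60 * 60) ))
}

queries_per_second 12081502678 365   # prints 383
```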


MAR – Jun 2009

June 29th, 2009 by Daniel Raymer

"Wins" for June…

  • DNS BIND VIEWS ARE NOW IN PRODUCTION!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Sure, it took a LONG while to get in place, but it was implemented without any clients needing to change any of their configurations and without any changes to any of the ICANN/ARIN registrations.  The implementation itself was relatively transparent to the University as well.  Initial RFC1918 entries are live on Internal views and we will be pushing out more as the service is publicized to the University.
  • DHCP Failover with the BT INS appliances has been solved.  Well, not solved, but a functional workaround for the ISC failover bug has been tested and implemented.  This will allow us to FINALLY start the migration onto the new DHCP environment and get rid of those old, clunky Sun v210s & v240s.

MAR – May 2009

April 27th, 2009 by Daniel Raymer

1. BIND Views are finally working in DIP on the primary name server. There is still an issue with getting the views propagated to slave servers. Following the ISC instructions does not work and the vendor has been engaged.

2. The issue with pushing updates in the DIP environment has been resolved. Apparently ActiveMQ was refusing to play nicely. This paves the way for upgrades to the Sapphire Appliances and to the application.

3. Disk consolidation continues in the Virtualization environment. All "troubled" LUNs have been replaced. Additionally, prep work to retire the AMD ESX servers continues. When all is said and done, the ESX environment will drop from 20 hosts down to 12 hosts.

4. Work continues on resolving backup issues with a number of hosts moving off the .1 network to the Admin Network. There are still a number of hosts that need this addressed.

MAR – Mar 2009

March 27th, 2009 by Daniel Raymer

WINS

  • Routine system patching was performed on 8 servers.
  • One of the bastion hosts was moved out of the soon-to-be-removed AMD Stevenson Cluster and into the Intel Stevenson Cluster.
  • Backup issues were corrected on 3 servers.
  • Modifications were made to the RHN database server to fix performance issues.
  • Progress was made on working with the Software Store to roll out RHEL & RHN availability to the University.
  • DNS Views have been created in the Diamond IP test environment but only in the database.  See "Continuing Issues" for more detail.
  • Became ACT Online certified in 2 Information Security modules.


Continuing Issues

  • RFC1918 Compliant DNS has been stalled due to application issues that we are still working through with the vendor.  The database entries and zone/resource records for the Internal/External BIND views have been created and appear to be correct, but we are unable to push them out for testing until the application issue is resolved.  Hopefully, once the issue is resolved, I can move forward with aggressive testing to get this into Production shortly after Commencement.
  • We are rapidly approaching End of Warranty on our AMD based ESX hosts.  Luckily, the Intel 7100 Cluster was built with enough capacity to support the hosts that will need to migrate.  A majority of the virtual machines on the AMD Stevenson servers have been migrated to their Intel counterparts and major progress has been made on the AMD Hill servers.  When retirement is completed, our ESX footprint will move from the 20 current hosts down to 12 hosts.  That will take our virtual-to-physical ratio from the current 11.5:1 up to 19.2:1.
  • Storage capacity in the ESX environment remains constrained.  Consolidation efforts continue but until the Hill-San-6 LUN is replaced, we are maxed out in the ITS Intel Hill clusters.  When Hill-San-6 is replaced, we will be down to 800 GB of SAN space for virtual machines.
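
The consolidation ratio quoted above follows from simple arithmetic: 20 hosts at 11.5:1 is roughly 230 virtual machines, and spreading those over 12 hosts gives about 19.2:1.  A quick check, assuming the virtual machine count stays constant through the retirement (the function name is illustrative only):

```shell
#!/bin/sh
# Sketch: recompute the VM-to-host ratio after retiring hosts.
# Assumes the number of virtual machines does not change.
# Arguments: current host count, current ratio, future host count.
consolidation_ratio() {
    awk -v hosts="$1" -v ratio="$2" -v new_hosts="$3" \
        'BEGIN { printf "%.1f\n", hosts * ratio / new_hosts }'
}

consolidation_ratio 20 11.5 12   # prints 19.2
```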

MAR – Nov 2008

November 25th, 2008 by Daniel Raymer

1.  DNS/DHCP

  • The Diamond IP environment received an upgrade to 3.0.62 in a hope to solve some issues with zone publishing.  While the software itself is stable, the problem was not solved.  Oh well…
  • The remaining 4 personnel in Application Hosting received their training on Self-Serve DNS/IPAM with the Diamond IP InControl software.  I think everyone on both the AppHosting as well as the ND&E team can agree that this is a definite Win for both teams.
  • RFC1918 subnets continue to be imported into the Diamond IP environment.  Difficulties in DHCP failover w/ the supplied DHCP 3.0.6 version from BT INS keep us from really diving into migration of DHCP enabled subnets.
  • I have exhausted my ideas on trying to get replicated DNS BIND Views implemented without using 72 hours of imports or significant name resolution service downtime.  I escalated to BT INS but have yet to get a solid answer back from them.
  • DNS survived the Great Power Outage of 08.  Sure, service was degraded a bit with the Master down, but service never dropped completely off.  YAY!

         Next up… getting those Views implemented, NCS Self-Serve DNS Training, and Sapphire 3.0.72 upgrades.

2.  The Virtual Environment

  • Virtual Center upgraded to VC2.5-Update 3 – Kendra knocked it out of the park.  Absolutely HAMMERED it.  What an awesome job by her.  I’m still wondering how some of the upgrade bugs escaped the VMware QA lab.  We ran into the issue of vpxa corrupting on 3.0.2 hosts w/ VCMS 2.5u3.  Took me a good portion of the night to figure out what was going on and how to fix it.  While it took some time and effort, the capabilities now offered with VC2.5u3 with ESX 3.5u3 have made our life soooo much better.  And speaking of ESX 3.5u3….
  • ESX Upgrades from 3.0.2 to 3.5 Update 3 – While it is not 100% complete (the prod AMD clusters remain to be upgraded), I am going to call this a WIN as over 1/2 of the total environment is upgraded and working extremely well.  Storage VMotion has enabled us to FINALLY perform some much needed SAN consolidation.  I also happen to love the new Health Condition report within the VI Client.  I would love to say this upgrade was totally without downtime, but it was not meant to be so… of course, the downtime was pretty much our own fault.  Putting servers on local storage, lack of VMware Tools, etc.  Great job to Kendra and Scott E. for stepping up to help do the upgrades.  BIG thanks!
  • The Leviathan was pushed into production in a rushed manner to make up for the broken snapshots w/ the VCMS upgrade.  Thanks to Kenon for the quick weekend work to get the storage presented and help get the service up and running.  On a more positive note, it did push me rather forcefully into figuring out all the tricks with ESX 3.5u3 as well as getting the plugins working for VCMS.  Nothing like a little pressure to make learning so much more satisfying.

3.  RHN Upgrade

  • RHN 5.1.1 has been pushed out the door and into production.  97 of the former ITS clients are re-registered and I hope to get the other departments to finally buy into what RHN Satellite can offer them in terms of ease of deployment via kickstarts, activation keys for RHEL5, easier/quicker patching, and a view into the health of their RHEL environments.
  • RHN 5.2 was FINALLY released as well.  It came a bit too soon to the 5.1.1 production date to put too much effort into it, but that is up on the slate soon.  With Oracle 10g support (FINALLY), we can move this database to our existing, more robust clusters and gain some performance.
  • The re-registration scripts worked for the most part and made it quite easy to register.  Scripts are available for anyone in the Vanderbilt Community to take advantage of this service.

By request – How to add a new LUN to RHEL using PowerPath without a reboot.

November 6th, 2008 by Daniel Raymer

Quite simple actually…

First, you need to get the HBAs to issue a LIP and then a re-scan:

  • echo 1 > /sys/class/fc_host/<host #>/issue_lip
  • echo "- - -" > /sys/class/scsi_host/<host #>/scan

Do this for every host path.

Now you just need to tell PowerPath to go do its normal discovery

  • powermt config

If you do a display, you should see the new LUN.

  • powermt display dev=all

That’s all there is to it….  go forth and fdisk!
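
For hosts with several HBAs, the LIP and re-scan steps above can be wrapped in a small loop.  What follows is a sketch rather than a tested procedure: the SYSFS_ROOT variable and the rescan_fc_hosts function name are illustrative and not part of the original steps, and it assumes every entry under fc_host has a matching scsi_host entry with the same host number.

```shell
#!/bin/sh
# Sketch: issue a LIP and a SCSI rescan on every Fibre Channel host.
# SYSFS_ROOT is a hypothetical override so the loop can be dry-run
# against a scratch directory; on a real system it stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

rescan_fc_hosts() {
    for fc in "$SYSFS_ROOT"/class/fc_host/host*; do
        # Skip when the glob matched nothing (no FC HBAs present).
        [ -e "$fc/issue_lip" ] || continue
        host=$(basename "$fc")
        echo 1 > "$fc/issue_lip"
        # "- - -" is a wildcard for channel, target, and LUN.
        echo "- - -" > "$SYSFS_ROOT/class/scsi_host/$host/scan"
        echo "rescanned $host"
    done
}
```

Run it as root by calling rescan_fc_hosts, then follow up with powermt config and powermt display dev=all as described above.  A LIP may take a moment to settle, so a short pause before the PowerPath step may be needed.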