MARS 11/08

The end of October and the beginning of November were very productive, both for me personally and for the SharePoint project.  We seem to have resolved our sporadic performance problem, which turned out to be authentication.  I also spent approximately 16 hours on PowerShell training: watching training videos I’d purchased, working through the labs, and taking notes.  I then used what I learned to resolve multiple issues that have plagued us for quite some time.

Issue # 1

One problem we’ve had for quite some time is that when a user complained that SharePoint was slow, our only way to determine which web front end (WFE) they were connected to was to log into each WFE and manually scroll through the security log.  That is extremely time-consuming, because we get over 20,000 security log entries a day.  After the first day of watching the PowerShell videos, I was able to write a simple one-liner like so:

    get-eventlog security -newest 100 | `
    select-object TimeGenerated,EntryType,EventID,UserName | `
    where-object {$_.UserName -eq "Domain\UserName"} | `
    group-object TimeGenerated,Username

Note: The back tick (`) is PowerShell’s line continuation character.  I only used it here to fit the formatting width of my blog.

Which outputs the following, approximately 20-30 seconds later:

    Count Name                      Group
    ----- ----                      -----
        3 11/19/2008 8:10:40 AM,... {Domain\UserName, Domain\UserName}
        1 11/19/2008 7:42:13 AM,... {Domain\UserName}
        2 11/19/2008 7:42:12 AM,... {Domain\UserName, Domain\UserName}

This is much more efficient than slowly scrolling through the event logs on each server: tracking a user down went from 20-30 minutes to about 30 seconds.  Within approximately two weeks, this simple one-liner helped us determine that whenever some users complained that SharePoint was slow while others said it was performing just fine, the users with complaints were always on the same WFE.  The downside is that the "slow" WFE seemed to change every day, slowly bouncing back and forth between our three WFE servers.  That made troubleshooting quite difficult, but it did tell us that the problem was server-specific at the time it occurred.
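The same filter-then-group idea extends to comparing servers.  The following is a minimal sketch, not the script we run: the Server property, the sample records, and the function name Get-UserServerSummary are all made up for illustration, standing in for security log entries collected from each WFE.  It assumes PowerShell 3.0 or later for [pscustomobject].

```powershell
# Hypothetical sample records standing in for security log entries
# gathered from three WFEs; the Server property is an assumption.
$entries = @(
    [pscustomobject]@{ Server = 'WFE1'; UserName = 'Domain\OtherUser'; TimeGenerated = [datetime]'2008-11-19T07:42:12' }
    [pscustomobject]@{ Server = 'WFE2'; UserName = 'Domain\UserName';  TimeGenerated = [datetime]'2008-11-19T07:42:13' }
    [pscustomobject]@{ Server = 'WFE2'; UserName = 'Domain\UserName';  TimeGenerated = [datetime]'2008-11-19T08:10:40' }
    [pscustomobject]@{ Server = 'WFE3'; UserName = 'Domain\UserName';  TimeGenerated = [datetime]'2008-11-19T08:15:02' }
)

function Get-UserServerSummary {
    param(
        [object[]]$Entries,
        [string]$UserName
    )
    # Filter first so later stages only handle the matching entries,
    # then count how many entries each server recorded for the user.
    $Entries |
        Where-Object { $_.UserName -eq $UserName } |
        Group-Object -Property Server |
        Sort-Object -Property Count -Descending |
        Select-Object -Property Name, Count
}

$summary = Get-UserServerSummary -Entries $entries -UserName 'Domain\UserName'
$summary | Format-Table -AutoSize
```

The server with the most entries for the user sorts to the top, which is the one worth looking at first when that user reports slowness.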

Issue # 2

We’ve also had trouble getting a warm-up script that worked, and we’ve tried several.  We had one working at one time, prior to going to host-based site collections.  In a nutshell, the script would parse SharePoint for a list of sites and then access each site so that the content would be loaded into memory.  If this is not done, the first user to access each individual site each day may have to wait up to 3 minutes.  The problem is compounded by the fact that we have 3 WFE servers, so this application pool spin-up has to happen for every single site, on every single WFE server.

The downside was that after going to host-based site collections, SharePoint would return all the host-based sites as http: and not https:, which meant that the script would try to access each site on a port it wasn’t listening on.  After spending countless hours trying to figure it out, I finally discovered how to fix it, in large part thanks to the PowerShell training videos, by simply adding a one-liner to the script, "$url = $url -replace("http:","https:")", which replaces http: with https:.  The final script looks like so:

 

    ######################################################################
    #Assumptions:
    #-Running on machine with WSS/MOSS
    #-\Common Files\Microsoft Shared\web server extensions\12\BIN in path
    ######################################################################

    function get-webpage([string]$url,[System.Net.NetworkCredential]$cred=$null)
    {
        #Host-based sites come back as http:, so switch them to https:
        $url = $url -replace("http:","https:")
        $wc = new-object net.webclient
        if($cred -eq $null)
        {
            $cred = [System.Net.CredentialCache]::DefaultCredentials;
        }
        $wc.credentials = $cred;
        return $wc.DownloadString($url);
    }

    #This passes in the default credentials needed.  If you need
    #specific stuff you can use something else to elevate basically
    #the permissions.  Or run this task as a user that has a Policy
    #above all the Web Applications with the correct permissions
    $cred = [System.Net.CredentialCache]::DefaultCredentials;
    #$cred = new-object System.Net.NetworkCredential("username","password","machinename")

    #Walk every zone, then every site in the zone, and request each page
    [xml]$x=stsadm -o enumzoneurls
    foreach ($zone in $x.ZoneUrls.Collection) {
        [xml]$sites=stsadm -o enumsites -url $zone.Default;
        foreach ($site in $sites.Sites.Site) {
            write-host $site.Url;
            $html=get-webpage -url $site.Url -cred $cred;
        }
    }
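
One caveat on the -replace line: -replace treats its first argument as a regular expression and replaces every match, so an "http:" that appears later in a URL (for example inside a query string) would be rewritten too.  A minimal sketch of a stricter variant, anchored to the start of the string, follows.  The function name ConvertTo-HttpsUrl and the sample URLs are hypothetical and are not part of the script above.

```powershell
function ConvertTo-HttpsUrl {
    param([string]$Url)
    # '^' anchors the pattern to the start of the string, so only the
    # scheme is rewritten. -replace is case-insensitive by default.
    $Url -replace '^http:', 'https:'
}

$plain    = ConvertTo-HttpsUrl 'http://site.domain.com/default.aspx'
$already  = ConvertTo-HttpsUrl 'https://site.domain.com/'
$embedded = ConvertTo-HttpsUrl 'http://site.domain.com/r.aspx?u=http://other'

$plain, $already, $embedded
```

For the site URLs that enumsites returns, the two forms should behave the same; the anchored pattern only matters if a URL ever carries a second "http:" further in.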

Issue # 3

Another issue I was able to solve was that, from time to time, our DR backup script would fail, and the only way for us to know was to check it every day.  I modified it so that it now writes an event log entry, and System Center Operations Manager sends out an alert based on that entry, thereby notifying us if the script fails.

 

    #SharePoint Catastrophic backup

    Write-Output "Begin DR Backup"
    get-date -format g | out-file $Log -append -noClobber
    stsadm.exe -o backup -directory \\ServerName\ShareName -backupmethod full `
    | out-file $DRLog -append -noClobber
    Write-Output "End DR Backup"
    get-date -format g | out-file $Log -append -noClobber

    #Write an event to the application log, based on success or failure of
    #DR Backup

    #@() keeps a one-line log as an array, so .length counts lines, not characters
    $DRLogEvt = @(get-content $DRLog)
    if ($DRLogEvt.length -le 1)
        {
        $evt=new-object System.Diagnostics.EventLog("Application")
        $evt.Source="SP Backup"
        $infoevent=[System.Diagnostics.EventLogEntryType]::Error
        $evt.WriteEntry("Backup Failed! Not enough free disk space",`
        $infoevent,70)
        }
    else
        {
        #The last 7 lines of the log hold the error and warning counts
        $Status = $DRLogEvt | select-object -last 7
        $evt=new-object System.Diagnostics.EventLog("Application")
        $evt.Source="SP Backup"
        $infoevent=[System.Diagnostics.EventLogEntryType]::Information
        $evt.WriteEntry("$Status",$infoevent,75)
        }

The best part is that it not only writes an event log entry if the backup fails, it also writes one if the backup succeeds.  In that case it copies the last 7 lines of the log file, which contain the number of errors and warnings, into the description of the event.  So we can now easily determine what problems, if any, were encountered.
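
Pulling the tail of a log is easy to get slightly wrong with index arithmetic, since arrays are zero-based and a short log can produce a negative start index.  Here is a minimal sketch that uses Select-Object -Last instead.  Get-LogTail and the sample lines are hypothetical, not part of our backup script.

```powershell
function Get-LogTail {
    param(
        [string[]]$Lines,
        [int]$Count = 7
    )
    # Select-Object -Last returns at most $Count items, so a log with
    # fewer lines than $Count simply comes back whole.
    $Lines | Select-Object -Last $Count
}

# Ten made-up log lines: "line 1" through "line 10".
$sample = 1..10 | ForEach-Object { "line $_" }

$tail  = Get-LogTail -Lines $sample
$short = Get-LogTail -Lines @('only line')

# Join with newlines so each log line stays on its own line in an
# event description.
$tail -join [Environment]::NewLine
```

In a real script the input would come from Get-Content.  Wrapping the call in @() is a cheap way to make sure a single returned line is still treated as an array.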

Issue # 4

Another issue that we’ve encountered, related to catastrophic backup and restore, is that in SharePoint there are multiple Alternate Access Mappings (AAM).  We’ve discovered that if the default AAM is set to https: and you have multiple sites using port 443, then a catastrophic restore fails due to a port conflict.  Yet if you have the default set to http://siteurl:randomport, then creating a host-based site succeeds but kicks out some errors.  To eliminate these errors, our default is set to https:, which means that if we had to do a restore we couldn’t, at least not straight up.

We’ve been doing both a catastrophic backup and a backup of each individual site collection.  If we had to do a full farm restore, we’d use our catastrophic backup to restore just the SSP, then manually re-create each web application and each site collection, and then run each site collection restore one at a time, a process that takes at least 12 hours.  The fix was adding the following prior to the DR backup:

    #Change Default Alternate Access mapping from https: to http://RandomPort

    Write-Output "Begin Alternate Access mapping change https:// -> http://"
    get-date -format g | out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "MYSITE - 8555" `
    -urlzone default -zonemappedurl http://mysite.domain.com:8555 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "MYSITE - 8555" `
    -urlzone intranet -zonemappedurl https://mysite.domain.com | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SSP-WebApp - 8666" `
    -urlzone default -zonemappedurl http://servername:8666 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SSP-WebApp - 8666" `
    -urlzone internet -zonemappedurl https://servername:8663 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SharePoint Central Administration v3" `
    -urlzone default -zonemappedurl http://servername:7777 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SharePoint Central Administration v3" `
    -urlzone internet -zonemappedurl https://servername:7773 | `
    out-file $Log -append -noClobber
    Write-Output "End Alternate Access mapping change"
    get-date -format g | out-file $Log -append -noClobber

And then adding the following at the end of the DR backup:

    #Change Default Alternate Access mapping from http://RandomPort to https://

    Write-Output "Begin Alternate Access mapping change http:// -> https://"
    get-date -format g | out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "MYSITE - 8555" `
    -urlzone default -zonemappedurl https://mysite.domain.com | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "MYSITE - 8555" `
    -urlzone intranet -zonemappedurl http://mysite.domain.com:8555 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SSP-WebApp - 8666" `
    -urlzone default -zonemappedurl https://servername:8663 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SSP-WebApp - 8666" `
    -urlzone internet -zonemappedurl http://servername:8666 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SharePoint Central Administration v3" `
    -urlzone default -zonemappedurl https://servername:7773 | `
    out-file $Log -append -noClobber
    stsadm -o addzoneurl -resourcename "SharePoint Central Administration v3" `
    -urlzone internet -zonemappedurl http://servername:7777 | `
    out-file $Log -append -noClobber
    Write-Output "End Alternate Access mapping change"
    get-date -format g | out-file $Log -append -noClobber

We now change the default AAM to http: prior to running the backup and change it back to https: afterwards, which resolves both issues.  We can create host-based sites without getting errors, and we can successfully restore our catastrophic backup in its entirety, greatly reducing both our recovery time after a failure and the complexity of the recovery.
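
The two blocks above repeat the same stsadm call with the URLs swapped, so the swap could also be driven from a small table.  This is only a sketch of that idea, not what we run: the function name Get-AamSwapArguments and the table layout are hypothetical, and the example prints the commands instead of executing them.  It uses only the addzoneurl parameters already shown above.

```powershell
# Each entry pairs a web application with its HTTP and HTTPS URLs and
# the zone that holds whichever URL is not currently the default.
$mappings = @(
    @{ Resource = 'MYSITE - 8555'; Http = 'http://mysite.domain.com:8555'; Https = 'https://mysite.domain.com'; OtherZone = 'intranet' }
    @{ Resource = 'SSP-WebApp - 8666'; Http = 'http://servername:8666'; Https = 'https://servername:8663'; OtherZone = 'internet' }
)

function Get-AamSwapArguments {
    param(
        [hashtable[]]$Mappings,
        [ValidateSet('Http', 'Https')]
        [string]$DefaultScheme
    )
    $otherScheme = if ($DefaultScheme -eq 'Http') { 'Https' } else { 'Http' }
    foreach ($m in $Mappings) {
        $defaultArgs = @('-o', 'addzoneurl', '-resourcename', $m.Resource,
                         '-urlzone', 'default', '-zonemappedurl', $m[$DefaultScheme])
        $otherArgs   = @('-o', 'addzoneurl', '-resourcename', $m.Resource,
                         '-urlzone', $m.OtherZone, '-zonemappedurl', $m[$otherScheme])
        # The leading comma emits each argument list as one item
        # instead of unrolling it into the output stream.
        , $defaultArgs
        , $otherArgs
    }
}

$preBackup  = Get-AamSwapArguments -Mappings $mappings -DefaultScheme Http
$postBackup = Get-AamSwapArguments -Mappings $mappings -DefaultScheme Https

foreach ($argList in $preBackup) {
    # Display only; resource names containing spaces are not quoted here.
    Write-Output ("stsadm " + ($argList -join ' '))
}
```

Calling it with Http before the backup and Https afterwards keeps both directions in one place, so adding a web application means adding one table entry rather than four stsadm lines.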

SharePoint Performance Fix

SharePoint performance as a whole has plagued us for quite some time.  We struggled to figure out exactly what was causing the slowdowns, and unfortunately we had no data that pointed us in any one specific direction.  For a while I was convinced it was authentication, but I couldn’t prove it from network traces, perfmon data, or netlogon logs.  It turns out I was right, though we fixed our performance problem quite accidentally.  Kendra made a change to the SQL cluster to allow System Center Operations Manager to monitor the SQL instances (basically enabling Kerberos), and suddenly SharePoint page load times dropped drastically.  We have not had a single complaint about responsiveness since.  Below is an outline of exactly what changes we made:

We are currently running 2 SQL instances on our Cluster:

Domain: mydomain.com (MyDomain)

Cluster Name: Cluster01

SQL Instance 1: SQLSRV1

SQL SharePoint Instance: SQLMOSS1

  1. Create the following computer accounts:
    1. Cluster01
    2. SQLSRV1
    3. SQLMOSS1
  2. Grant SQL Cluster Service Account (MyDomain\SQLClusterServiceAccount) the following rights on the computer accounts listed above:
    1. Reset Password
    2. Validated write to DNS host name
    3. Validated write to service principal name
  3. Manually add a DNS A-Record for the following:
    1. Cluster01.mydomain.com – 192.168.1.10
    2. SQLSRV1.mydomain.com – 192.168.1.11
    3. SQLMOSS1.mydomain.com – 192.168.1.12
  4. Run the following commands to create service principal names
    1. setspn -A MSSQLSvc/SQLMOSS1.mydomain.com:12345 MyDomain\SQLClusterServiceAccount
    2. setspn -A MSSQLSvc/SQLSRV1.mydomain.com:1433 MyDomain\SQLClusterServiceAccount
  5. In Active Directory Users and Computers, on the computer accounts listed below select “Trust this computer for delegation to any service (Kerberos only)” on the Delegation tab
    1. Cluster01
    2. SQLSRV1
    3. SQLMOSS1
  6. In Cluster Administrator
    1. Cluster01 -> Groups -> SQL -> SQL Server Network Name (SQLSRV1)
      1. Right-click on the network name and go to properties
      2. On the Parameters tab, select Enable Kerberos Authentication
      3. Take the SQL Group Offline. Bring it back online to test.
    2. Cluster01 -> Groups -> SharePoint -> SharePoint Server Network Name (SQLMOSS1)
      1. Right-click on the network name and go to properties
      2. On the Parameters tab, select Enable Kerberos Authentication
      3. Take the SharePoint Group Offline. Bring it back online to test.
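
The SPNs in step 4 follow the pattern MSSQLSvc/<FQDN>:<port>.  As a small sketch of that naming rule (the function name New-SqlSpnCommand is hypothetical, and it only builds the command text rather than running setspn):

```powershell
function New-SqlSpnCommand {
    param(
        [string]$HostName,   # network name of the instance, e.g. SQLMOSS1
        [string]$Domain,     # DNS domain, e.g. mydomain.com
        [int]$Port,          # TCP port the SQL instance listens on
        [string]$Account     # service account that will own the SPN
    )
    # Build the SPN first, then the full setspn command line as text.
    $spn = "MSSQLSvc/{0}.{1}:{2}" -f $HostName, $Domain, $Port
    "setspn -A $spn $Account"
}

$cmd = New-SqlSpnCommand -HostName 'SQLMOSS1' -Domain 'mydomain.com' `
    -Port 12345 -Account 'MyDomain\SQLClusterServiceAccount'
$cmd
```

Afterwards, setspn -L MyDomain\SQLClusterServiceAccount lists the SPNs registered on the account, and on SQL Server 2005 or later the auth_scheme column of sys.dm_exec_connections shows whether a given connection is using KERBEROS or NTLM.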

 

In essence, this change enabled Kerberos authentication for each SQL instance as well as the cluster itself.  Once it was made, all the SharePoint servers started authenticating with SQL via Kerberos instead of NTLM.  Microsoft recommends using Kerberos over NTLM for SharePoint as a whole, which we’re not quite at yet.  Hopefully in the near future we will take this to the next level and finish configuring SharePoint to authenticate end to end via Kerberos.

Note: Another positive side effect is that the warm-up script mentioned earlier now takes approximately 2 minutes to run, whereas prior to this change it took between one and two hours to complete.  This in and of itself indicates how large a performance improvement switching from NTLM to Kerberos has made for our SharePoint environment as a whole.

 

KMS Issue: Licensing Clients at Med Center

Since deploying our KMS server for both Vista and Windows 2008 licensing, we’ve periodically run into problems licensing clients on the Medical Center side.  We’ve only encountered this issue when the clients were not members of the Vanderbilt AD domain.  In most cases the user would just join the domain, and the problem was solved, or at least diverted.  Last week I got another ticket about a client that was not able to license.  Initially the client was hard-coded to the first KMS server we deployed, which technically was a Vista desktop running in a VM and therefore would only license Vista clients.  Because his client was hard-coded, he started to receive warnings as the 180 days since its last communication with the old server were about to expire.  In a nutshell, the issue is that the DHCP server he receives his lease from does not define a default DNS suffix, and no suffix was defined on the client.  So when his client queried DNS for the SRV record, it would query _vlmcs._tcp. and not _vlmcs._tcp.<DNS Suffix>, and therefore it would never find the KMS server.  Once I had the user define the "Primary DNS suffix" for his client, it registered itself in KMS immediately after rebooting.

Naturally, having every user who doesn’t wish to be a part of the domain define their "Primary DNS suffix" isn’t really feasible.  The reason it works on our side and not theirs is that our DHCP servers define a connection-specific DNS suffix and theirs do not.  So I’ve opened a ticket with NetOps to attempt to persuade them to add a DNS suffix to their DHCP scopes.
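
To make the SRV lookup concrete, here is a minimal sketch of how the query name depends on the suffix.  It is a simplification of what the client’s resolver actually does, and the function name Get-KmsSrvQueryName is hypothetical; mydomain.com is just a placeholder suffix.

```powershell
function Get-KmsSrvQueryName {
    param([string]$DnsSuffix)
    # With no suffix the client has nothing to append, so the lookup
    # is made against the bare service name and finds no KMS host.
    if ([string]::IsNullOrEmpty($DnsSuffix)) {
        return '_vlmcs._tcp'
    }
    "_vlmcs._tcp.$DnsSuffix"
}

$withSuffix    = Get-KmsSrvQueryName -DnsSuffix 'mydomain.com'
$withoutSuffix = Get-KmsSrvQueryName -DnsSuffix ''

$withSuffix, $withoutSuffix
```

The resulting name can be checked by hand from a client with nslookup -type=srv followed by the name, which shows whether DNS returns a KMS host for that suffix.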
