MARS 08/09
I finished moving the SharePoint Test farm behind the F5 and set up SSL termination on the F5. Some of the service monitors are not ideal, but as soon as I have more time to spend on them, I should be able to get things set up properly. Then I can document the process and start prepping to move the production SharePoint farm behind the F5.
I spent several days researching blue screens that occurred on several of the ECS servers. Initially, we were not getting memory dumps. We discovered two issues that were preventing the servers from creating them.
- The first issue was that the page file was not on the system partition. For the system to generate a User or Kernel dump, an adequately sized page file must exist on the system partition.
- The second issue was that the system partition was not large enough to hold the entire page file, so getting a User dump was not an option. We could still get a Kernel dump, but that required the following registry change so that we could use a small page file (smaller than physical RAM) on the system partition:
- Under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl, create a new DWORD named IgnorePagefileSize with a value of 1.
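If the same registry change has to go onto several servers, it may be easier to generate a .reg file once and import it on each machine. The following is only a minimal sketch in Python, not what we actually ran: the function name build_reg_file is made up for illustration, and the key path and value name are simply the ones from the note above.

```python
# Key and value as described in the note above.
KEY_PATH = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl"
VALUE_NAME = "IgnorePagefileSize"


def build_reg_file(key_path: str, value_name: str, dword: int) -> str:
    """Return the text of a .reg file that sets a single DWORD value.

    Hypothetical helper for illustration only.
    """
    if not 0 <= dword <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",  # standard .reg header
        "",
        f"[{key_path}]",
        # DWORD values are written as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword:08x}',
        "",
    ]
    # Join with CRLF, the usual line ending for .reg files.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(KEY_PATH, VALUE_NAME, 1))
```

The output can be saved to a file with a .reg extension and reviewed before importing it on a server.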
After making the necessary changes, we received our first memory dump on EM24. After analyzing this dump, we were able to determine that an unknown driver had modified the PrintDbg routine. I then went through all the ECS servers looking for the same stop code and was able to determine that 3 different ECS servers had experienced 8 BSODs with the same stop code, the first occurring on 6/30/09. Next, we tried to track down which drivers were installed either on 6/30 or shortly beforehand. We could not find any documented driver updates on the three impacted servers, so the decision was to replace the 3 impacted VMs with new VMs.