I've gone back and taken a peek at the Apache logs for the first full 7 days (June 4-10) of the main Vanderbilt website. The content on www.vanderbilt.edu is created by 296 authors belonging to various VU departments and related organizations. Needless to say, the choice of tools and development styles varies greatly. The main website receives about 1.8 million requests per day. Since this data was from the first week after the migration, I was most interested in the error log. So what's in the error logs?
There were nearly 30 million lines in the error logs. At about 1.8 million requests per day, the week saw roughly 12.6 million requests, so that works out to more than 2 error lines for every request, which is incredibly high. Here are the counts:
| Error type | Lines |
| --- | ---: |
| Total lines | 29,562,819 |
| File does not exist | 363,272 |
| Directory index forbidden by rule | 4,647 |
| File permissions deny server access | 483 |
| Premature end of script headers | 278 |
| Htaccess options not allowed | 6,335 |
| PHP notice | 16,544,472 |
| PHP warning | 12,613,195 |
| ModSecurity | 27,023 |
| Miscellaneous | 3,114 |
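For anyone who wants to produce a similar breakdown from their own logs, here is a minimal Python sketch. It is not the script used for the counts above. The substrings it looks for are assumptions based on the category names in the table, so check them against the wording your Apache and PHP versions actually write. The sample lines are made up for illustration.

```python
from collections import Counter

# (category label, lowercase substring to look for in a log line).
# The substrings are assumptions taken from the category names above.
# PHP and ModSecurity come first because their messages can embed
# arbitrary text that might otherwise match another category.
CATEGORIES = [
    ("PHP notice", "php notice"),
    ("PHP warning", "php warning"),
    ("ModSecurity", "modsecurity"),
    ("File does not exist", "file does not exist"),
    ("Directory index forbidden by rule", "directory index forbidden"),
    ("File permissions deny server access", "file permissions deny server access"),
    ("Premature end of script headers", "premature end of script headers"),
    ("Htaccess options not allowed", "not allowed here"),
]


def classify(line):
    """Return the first category whose substring appears in the line."""
    lowered = line.lower()
    for label, needle in CATEGORIES:
        if needle in lowered:
            return label
    return "Miscellaneous"


def tally(lines):
    """Count lines per category and add a grand total."""
    counts = Counter(classify(line) for line in lines)
    counts["Total lines"] = sum(counts.values())
    return counts


# Hypothetical sample lines, not taken from the real logs.
SAMPLE_LINES = [
    "[error] [client 192.0.2.1] File does not exist: /www/old/page.html",
    "[error] [client 192.0.2.1] PHP Notice:  Undefined index: id in /www/a.php on line 3",
    "[error] [client 192.0.2.1] PHP Warning:  Division by zero in /www/b.php on line 9",
    "[error] [client 192.0.2.1] Directory index forbidden by rule: /www/dept/",
    "[alert] [client 192.0.2.1] /www/dept/.htaccess: Options not allowed here",
    "[error] server reached MaxClients setting",
]

if __name__ == "__main__":
    for label, count in tally(SAMPLE_LINES).most_common():
        print(f"{label}: {count}")
```

To run it against a real file, pass the open file object to `tally()`; it reads line by line, so a multi-gigabyte log does not need to fit in memory.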
As you can see, the vast majority of the errors (98.6%) are PHP notices and warnings. The old environment did have PHP, but it did not report these types of errors. The notices are mostly scripting errors related to undefined indexes, variables, properties, and offsets. The warnings cover a variety of issues, such as passing the wrong data types or the wrong number of arguments to PHP functions, MySQL errors, and use of deprecated functions. The notices are usually harmless, but the warnings usually have an impact on site rendering, especially the improper function calls and MySQL errors. Both can be eliminated with proper variable declarations and error checking.
The "file does not exist" errors are the ever-popular 404 browser errors. These are primarily related to the outdated/unclaimed content that was removed. Approximately 25% of the original content was removed from the server, and the search engines are now discovering this. Even so, these errors account for only 1.2% of the total.
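The shares quoted here can be rechecked from the table with a few lines of Python. This is just the arithmetic, using the counts from the table and the figure of about 1.8 million requests per day; the variable and function names are my own.

```python
# Counts copied from the table; request volume is the approximate
# figure of 1.8 million requests per day over the 7-day window.
COUNTS = {
    "File does not exist": 363272,
    "Directory index forbidden by rule": 4647,
    "File permissions deny server access": 483,
    "Premature end of script headers": 278,
    "Htaccess options not allowed": 6335,
    "PHP notice": 16544472,
    "PHP warning": 12613195,
    "ModSecurity": 27023,
    "Miscellaneous": 3114,
}
TOTAL_LINES = 29562819
REQUESTS_PER_DAY = 1_800_000  # approximate
DAYS = 7


def share(*labels):
    """Percentage of all error lines that fall in the given categories."""
    return 100 * sum(COUNTS[label] for label in labels) / TOTAL_LINES


# The categories should add up to the reported total.
assert sum(COUNTS.values()) == TOTAL_LINES

php_share = share("PHP notice", "PHP warning")
not_found_share = share("File does not exist")
errors_per_request = TOTAL_LINES / (REQUESTS_PER_DAY * DAYS)

if __name__ == "__main__":
    print(f"PHP notices and warnings: {php_share:.1f}%")
    print(f"File does not exist: {not_found_share:.1f}%")
    print(f"Error lines per request: {errors_per_request:.2f}")
```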
The directory index errors are related to requests for directories that do not have a suitably named index file. The accepted list of index names was carried over from the old Sun servers, but some people are not using them. The "file permissions" errors are relatively obvious: the server is not allowed to read the requested file. The "premature end" errors are generated by Perl scripts that exit in an error state. These are probably from unused scripts that were not tested during the migration phase. The htaccess errors are related to attempts to override the system configuration.
The most interesting errors are related to ModSecurity. These were a very small percentage (0.09%) of the total errors, but they are interesting because of the requests behind them. This Apache module was added to protect against vulnerabilities in web applications. It also watches for unwanted requests such as SQL injection and cross-site scripting (XSS) attempts.
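To see which kinds of requests ModSecurity is flagging, one option is to group its log lines by rule message. The sketch below assumes the entries carry a `[msg "..."]` field, which is how ModSecurity normally tags the rule that fired; the sample lines and rule IDs are hypothetical, and real entries contain more fields than shown here.

```python
import re
from collections import Counter

# Matches the [msg "..."] field that ModSecurity attaches to a rule hit.
MSG_FIELD = re.compile(r'\[msg "([^"]*)"\]')


def tally_modsecurity(lines):
    """Count ModSecurity log lines by their rule message."""
    counts = Counter()
    for line in lines:
        if "ModSecurity:" not in line:
            continue  # skip everything that is not a ModSecurity entry
        match = MSG_FIELD.search(line)
        counts[match.group(1) if match else "(no msg field)"] += 1
    return counts


# Hypothetical lines for illustration only.
SAMPLE_MODSEC_LINES = [
    '[error] [client 192.0.2.7] ModSecurity: Access denied with code 403. [id "900001"] [msg "SQL Injection Attack"] [uri "/a.php"]',
    '[error] [client 192.0.2.8] ModSecurity: Access denied with code 403. [id "900002"] [msg "Cross-site Scripting (XSS) Attack"] [uri "/b.php"]',
    '[error] [client 192.0.2.7] ModSecurity: Access denied with code 403. [id "900001"] [msg "SQL Injection Attack"] [uri "/c.php"]',
    '[error] [client 192.0.2.9] File does not exist: /www/old/page.html',
]

if __name__ == "__main__":
    for message, count in tally_modsecurity(SAMPLE_MODSEC_LINES).most_common():
        print(f"{count:6d}  {message}")
```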
The log files are available for review by the web developers. We keep a full week on hand just in case. They are located in /www.vanderbilt.edu/logs on the main server, and in similarly named directories on the other servers.