Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Websites have been a hobby of mine since I was about 15, and over time I've started to do some fairly advanced things to take care of simple, usually annoying, problems I encounter. This section is intended to serve as a reminder to myself of those tricks, and possibly to be of use to others as well.
It's really useful to be able to watch the webserver's error log while you are developing a site. It's even more useful to be alerted whenever a new error is logged.
tail -f error.log | grep -v --line-buffered favicon.ico | sed 's/$/\x07/'
First, open the log and follow appends to it (tail -f). Then filter this through grep to remove any lines containing “favicon.ico”. Finally, use sed to append a bell character (\x07) to the end of each line, which makes the console beep or flash for every new line.
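NB: It is important to use the “--line-buffered” switch with grep. When its output is a pipe rather than a terminal, grep buffers that output in large blocks instead of flushing after each line. Without the switch, the next command in the pipeline will receive no data, or at least will receive it in unhelpfully large chunks, so that it is stale by the time it arrives.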
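For comparison, the same pipeline can be written as a small Python script. This is a minimal sketch that assumes a plain-text log that is only ever appended to, so log rotation is not handled. The names `alert_lines`, `follow` and `watch` are my own, and the polling interval is arbitrary. The explicit `flush()` after each line does the job that `--line-buffered` does for grep.

```python
import sys
import time
from typing import Iterable, Iterator, TextIO

BELL = "\a"  # ASCII BEL (0x07), the character sed's \x07 appends


def alert_lines(lines: Iterable[str], ignore: str = "favicon.ico") -> Iterator[str]:
    """Drop lines containing `ignore`; append a bell to every other line."""
    for line in lines:
        if ignore not in line:
            yield line.rstrip("\n") + BELL + "\n"


def follow(f: TextIO, poll: float = 0.5) -> Iterator[str]:
    """Yield lines as they are appended to `f`, roughly like `tail -f`."""
    f.seek(0, 2)  # jump to the end (tail -f would print the last 10 lines first)
    while True:
        line = f.readline()
        if line:
            yield line
        else:
            time.sleep(poll)  # nothing new yet; poll again shortly


def watch(path: str) -> None:
    """Follow `path` forever, beeping on each new non-favicon line."""
    with open(path) as log:
        for out in alert_lines(follow(log)):
            sys.stdout.write(out)
            sys.stdout.flush()  # flush per line, the counterpart of grep --line-buffered
```

Calling `watch("error.log")` runs until interrupted with Ctrl+C.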
This one is a weird problem I came across when trying to add some fancy URL rewriting to a site. The plan was to have flat links such as “/properties/Tala Hill” remap either to “/properties/Tala Hill.php” if it exists, or to “/search.php?cat=properties&q=Tala Hill”. However, I couldn't get past step one: for some reason “/properties/blahblah” was always being mapped to “/properties.php”, even when URL rewriting was off!
The culprit? mod_negotiation. I noticed that the HTTP headers from the server had the following unusual entries when I requested “/properties/Tala Hill”:
Content-Location: properties.php
Vary: negotiate
TCN: choice
This only happened when adding “.php” to the end of the first path component (giving “properties.php”) pointed to an actual file.
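Commenting out mod_negotiation and its settings (LanguagePriority and ForceLanguagePriority) in my Apache configuration fixed this, and the request now results in 404 Not Found, as it should. (Turning off just the MultiViews option, which is the mod_negotiation feature that does this filename guessing, should be a more targeted fix.) So, back to playing with the Rewrite Engine.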
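You can check other URLs for this symptom without reading raw headers by hand. A script can look for the tell-tale headers quoted above. This is a heuristic sketch only. `looks_negotiated` and `fetch_headers` are hypothetical helpers, and treating a `TCN` header or a `Vary` value containing `negotiate` as evidence of negotiation is my assumption, based on the headers seen here.

```python
import urllib.error
import urllib.request
from typing import Mapping


def looks_negotiated(headers: Mapping[str, str]) -> bool:
    """Guess whether content negotiation chose the file behind this response."""
    # HTTP header names are case-insensitive, so normalise them first
    h = {name.lower(): value for name, value in headers.items()}
    vary = [v.strip().lower() for v in h.get("vary", "").split(",")]
    return "tcn" in h or "negotiate" in vary


def fetch_headers(url: str) -> dict:
    """Return the response headers for `url`, even when the server answers with an error."""
    try:
        with urllib.request.urlopen(url) as resp:
            return dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        return dict(err.headers.items())  # a 404 still carries headers worth checking
```

For example, `looks_negotiated(fetch_headers("http://localhost/properties/Tala%20Hill"))`, with the space percent-encoded, should return True while mod_negotiation is interfering.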
One upshot of the problems I encountered in the previous section is that I discovered the Content-Disposition header. It can be used to turn a normal page into a file download: you serve an HTML or text page (which the browser would normally display), and the user is prompted to “Save as…” instead.
See http://www.faqs.org/rfcs/rfc2183.html for details.
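As an illustration, here is a minimal WSGI application that serves an HTML page with `Content-Disposition: attachment`, so the browser offers to save it. It uses only the Python standard library. The app name, page content, port and suggested filename `page.html` are placeholders of my own choosing.

```python
from typing import Callable, Iterable, List, Tuple
from wsgiref.simple_server import make_server

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def download_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Serve an ordinary HTML page, but ask the browser to save it instead of showing it."""
    body = b"<html><body><p>This page is a download.</p></body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # "attachment" triggers the Save As... prompt; filename is only a suggestion
        ("Content-Disposition", 'attachment; filename="page.html"'),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port: int = 8000) -> None:
    """Serve download_app on all interfaces until interrupted."""
    with make_server("", port, download_app) as server:
        server.serve_forever()
```

After calling `serve()`, visiting http://localhost:8000/ should produce a download prompt rather than a rendered page.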
When developing a website, it is common to have two separate servers: development and live. However, problems arise when you are upgrading an existing site to a new, incompatible version, because the public can still reach the live site while it is half-upgraded. I use the trick below to overcome this.
We're going to change our browser to report a different name (technically its “User Agent”) to every website it connects to. Then we'll make sure the server displays a “Down for Maintenance” page for any request that does not claim to come from our specially named browser.
For the following example, I have a plain HTML file called DownTime.html sitting in the site's root directory. I will set my browser to report that it is “RobM” instead of its usual one1).
First we'll set the server to send everyone to a notice page about the maintenance. Once we know the public are seeing this notice, we'll take care of letting ourselves in.
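Note: you need mod_rewrite enabled. Your Apache configuration must also have an “AllowOverride” setting that lets .htaccess files use the directives below: FileInfo for the rewrite directives, and Options for the Options line. If in doubt, try setting “AllowOverride All” for your site (on your development server). This allows .htaccess files to set every directive they possibly can.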
Add the following lines near the top of your site's /.htaccess on the live server:
## Enable the URL rewriting engine
RewriteEngine on
Options +FollowSymlinks

## Send non-developers to a "site down for maintenance" page
RewriteCond %{HTTP_USER_AGENT} !^RobM.*$
RewriteRule .* DownTime.html [L]

## Normal .htaccess content below this line
...
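If the site is a Python (WSGI) application rather than files behind Apache, you can express the same idea as middleware. This is a sketch of the technique, not a replacement for the .htaccess above. `maintenance_gate` is a hypothetical name. Returning 503 Service Unavailable is my own choice; the rewrite version serves DownTime.html with a normal 200. A 503 tells search engines that the outage is temporary.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def maintenance_gate(app: WSGIApp, notice: bytes, prefix: str = "RobM") -> WSGIApp:
    """Wrap `app` so only clients whose User-Agent starts with `prefix` get through."""

    def gated(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        # Same test as RewriteCond %{HTTP_USER_AGENT} !^RobM.*$ for any real header value
        if environ.get("HTTP_USER_AGENT", "").startswith(prefix):
            return app(environ, start_response)
        # Everyone else, including clients that send no User-Agent, gets the notice
        start_response("503 Service Unavailable", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(notice))),
        ])
        return [notice]

    return gated
```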
Notes
- +FollowSymlinks is required in order for mod_rewrite to work in .htaccess files.
- %{…} refers to a server variable. This is akin to $_SERVER in PHP.
- HTTP_USER_AGENT will contain whatever was sent in the “User-Agent: XXXX” line of your browser's request, if it sent one.
- !^…$ is a negated regular expression:
  - ! → “If not matched”
  - ^ → Very beginning of variable (as opposed to matching anywhere in it, as regular expressions usually do)
  - $ → Very end of variable
  - !^RobM.*$ will therefore be satisfied for any user agent which does not begin with “RobM” (case-sensitive)
- RewriteRule replaces the requested URI with another:
  - .* → Match the whole thing (it's a regular expression which matches everything)
  - DownTime.html → the request now becomes /DownTime.html, even if the original request was for /subdir/subdir2/page.ext?param=value&param2=value (the query string is passed along, but a static HTML file simply ignores it)
  - [L] → “Last” rule to consider. Normally mod_rewrite would continue checking for applicable rules right up to the end of the file. If you had, say, a rule to convert “*.html → *.htm”, DownTime.html would become DownTime.htm and ultimately break.

I hear the Opera browser has built-in support for changing the User Agent, so you can just use that. In Firefox you'll want the User Agent Switcher extension. I don't care about Internet Explorer, so if you use that you're on your own2).
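To see exactly what the condition lets through, you can try the anchored pattern in Python. Python's `re` module behaves the same as Apache's regular expressions for patterns this simple. `sees_downtime` is my own helper name. For comparison, `EXACT_RULE` is a hypothetical stricter variant, `^RobM$`, that is not used in the .htaccess.

```python
import re

# The RewriteCond pattern without its leading "!" (which negates the match)
PREFIX_RULE = re.compile(r"^RobM.*$")
# Hypothetical stricter variant, matching a User Agent of exactly "RobM"
EXACT_RULE = re.compile(r"^RobM$")


def sees_downtime(user_agent: str, rule: "re.Pattern[str]" = PREFIX_RULE) -> bool:
    """True when the negated condition holds, i.e. the visitor is sent DownTime.html."""
    return rule.search(user_agent) is None
```

With the prefix rule, “RobM” and “RobM (testing)” both get through. The exact rule would lock out “RobM (testing)” as well. Both patterns are case-sensitive and anchored at the start, so “robm” and “Mozilla/5.0 RobM” see the notice either way.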
So I've now changed my User Agent to “RobM” and set it as current.
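You can also check the rule from a script that sends a chosen User Agent, before or instead of switching the browser. Below is a minimal sketch using Python's standard `urllib.request`. `request_as` and `status_for` are hypothetical helpers, and the URL you pass in is a placeholder for your own live site.

```python
import urllib.request
from typing import Tuple


def request_as(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request that reports `user_agent` instead of Python's default one."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url: str, user_agent: str) -> Tuple[int, int]:
    """Fetch `url` as `user_agent`; return (HTTP status, body length in bytes)."""
    with urllib.request.urlopen(request_as(url, user_agent)) as resp:
        return resp.status, len(resp.read())
```

Compare `status_for(url, "RobM")` with `status_for(url, "SomeoneElse")`. The body lengths should differ, because only the second request is given DownTime.html.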
With some luck, the general public will now see DownTime.html for any request they make, while you see the site as normal. To let the general public back in, simply comment out the RewriteCond and RewriteRule lines. You may also remove the lines that enable mod_rewrite, if you don't need them. You'll probably want to revert your User Agent too, or some sites will give you very basic pages3).
Just a quick aside related to the above. When you move your content into a wiki, everyone's old bookmarks will break, search engine indexes will become out of date, and possibly other nasty things will happen too. I found Apache's RewriteEngine a very neat way to overcome this, without leaving lots of the old file-system structure lying around with minimal redirect pages in it:
RewriteRule ^Events\.html$ menu/eventsi
RewriteRule ^Events_Schedule\.php$ events/schedule
RewriteRule ^Events_Schedule_Past\.php$ events/schedule_past
RewriteRule ^Events_Schedule_Static\.php$ events/schedule_static
RewriteRule ^Events_SocRunnings\.php$ events/general
RewriteRule ^ArtMedia\.html$ menu/art_media
RewriteRule ^ImproManga\.php$ art_media/impromanga
...

## DokuWiki use_rewrite handler
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) wiki/$1 [QSA,L]
Each rule compares its left-hand regular expression against the requested URL and, if it matches, rewrites the request to the right-hand side. None of the page rules has the [L] flag, so processing carries on down the file. By the time the request reaches the bottom, the DokuWiki rewrite directives receive a request for a (virtual) wiki page.
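The chaining described above can be modelled in a few lines of Python. This is a simplified sketch of the idea, not how mod_rewrite is implemented. It uses only three of the rules and swaps Apache's file and directory tests for a hypothetical `is_real` callback. It also ignores the query string, which [QSA] would carry over.

```python
import re
from typing import Callable, List, Tuple

# A few of the page rules from the .htaccess above, as (pattern, substitution) pairs
PAGE_RULES: List[Tuple[str, str]] = [
    (r"^Events_Schedule\.php$", "events/schedule"),
    (r"^Events_SocRunnings\.php$", "events/general"),
    (r"^ArtMedia\.html$", "menu/art_media"),
]


def rewrite(path: str, is_real: Callable[[str], bool]) -> str:
    """Run `path` (relative to the site root, as in .htaccess) through the rules in order."""
    for pattern, substitution in PAGE_RULES:
        # No [L] flag: a match rewrites the path and later rules still see the result
        if re.search(pattern, path):
            path = substitution  # mod_rewrite replaces the whole path, not just the match
    # DokuWiki handler: RewriteCond !-f / !-d, then RewriteRule (.*) wiki/$1 [QSA,L]
    if not is_real(path):
        path = "wiki/" + path
    return path
```

For example, `rewrite("Events_Schedule.php", lambda p: False)` gives "wiki/events/schedule". A real file such as robots.txt passes through untouched when `is_real` reports that it exists.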
While using lighttpd's simple virtual hosting, I found myself wanting a simple way to define per-host configurations, but there apparently wasn't one available. So I wrote a Python script, which /etc/lighttpd/lighttpd.conf runs to acquire the per-vhost configuration entries:
Add the following to /etc/lighttpd/lighttpd.conf:
## Load per-vhost configurations
include_shell "/var/www/lighttpd-vhost-confs.py"
Then create a new file at /var/www/lighttpd-vhost-confs.py, and make it executable, since include_shell runs it directly:
#!/usr/bin/env python3
"""
Generates per-vhost configuration directives from conf files found in the
root of each vhost.

Add the following line to /etc/lighttpd/lighttpd.conf to make use of this script:

    include_shell "/var/www/lighttpd-vhost-confs.py"
"""
import os

# Directory containing this script, with one sub-directory per vhost
basedir = os.path.dirname(os.path.abspath(__file__))

# Immediate sub-directories of basedir, i.e. the vhost names (sorted for stable output)
vhosts = sorted(next(os.walk(basedir))[1])

for vhost in vhosts:
    conf = os.path.join(basedir, vhost, "lighttpd.conf")
    if os.path.exists(conf):
        with open(conf) as f:
            conf_data = f.read()
        # Wrap the vhost's own directives in a host conditional
        print('$HTTP["host"] == "%s" {\n%s\n}' % (vhost, conf_data))
The directory layout this was used in was as follows:
/var/www
/var/www/disk-browser
/var/www/disk-browser/html
/var/www/lighttpd-vhost-confs.py
/var/www/tilltroll.robmeerman.co.uk
/var/www/tilltroll.robmeerman.co.uk/lighttpd.conf
/var/www/tilltroll.robmeerman.co.uk/html
where /var/www/tilltroll.robmeerman.co.uk/lighttpd.conf contains the following:
# deny access completely to these
$HTTP["url"] =~ "/\.ht" {
    url.access-deny = ( "" )
}
$HTTP["url"] =~ "/_ht" {
    url.access-deny = ( "" )
}
$HTTP["url"] =~ "^/(bin|data|inc|conf)/" {
    url.access-deny = ( "" )
}
and the script produces the following output when run:
$HTTP["host"] == "tilltroll.robmeerman.co.uk" {
# deny access completely to these
$HTTP["url"] =~ "/\.ht" {
    url.access-deny = ( "" )
}
$HTTP["url"] =~ "/_ht" {
    url.access-deny = ( "" )
}
$HTTP["url"] =~ "^/(bin|data|inc|conf)/" {
    url.access-deny = ( "" )
}
}
It doesn't seem to be possible to set the umask for files created by the web server neatly from a vhost-specific configuration file. Popular work-arounds seem to be:
- Configure PHP (via php.ini, or via the webserver's config/env) to spit “umask(0002);” at the beginning of each PHP invocation. I don't like this because it only affects PHP: what if you run bespoke CGI scripts?
- Edit /etc/init.d/lighttpd so that it has “umask 0002” somewhere near the top. This is what I did.
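The reason for wanting “umask 0002” is that new files get the requested mode with the umask's bits cleared. A typical request for 0666 becomes 0644 under the common default umask of 0022, but 0664 (group-writable) under 0002. The Python sketch below shows this on a POSIX system. `mode_with_umask` is my own helper name, not something lighttpd or PHP provides. Note that os.umask changes the mask for the whole process, so the helper restores the old value afterwards.

```python
import os
import stat
import tempfile


def mode_with_umask(mask: int, requested: int = 0o666) -> int:
    """Create a file under `mask` and return the permission bits it actually got."""
    old = os.umask(mask)  # os.umask sets the new mask and returns the previous one
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.txt")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # restore the process's original mask
```

`oct(mode_with_umask(0o002))` should give '0o664', compared with '0o644' for a mask of 0o022.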