Since the CS department website is migrating to Drupal, we have to set up a bunch of redirects. Why? Because people may have bookmarked pages or otherwise recorded URLs on our old site that are now obsolete, and we want them to get the new, up-to-date content instead. This can easily be done with a redirect. (See http://www.yolinux.com/TUTORIALS/ApacheRedirect.html for a variety of options.)
A redirect, particularly one in the server configuration that is read once when Apache starts, has a speed advantage over most of the alternatives. No files need to be accessed: Apache just returns the new URL to the browser, and the browser requests that new URL instead.
At first, I thought that the redirect had to go inside a Directory container, like this:
<Directory /home/cs/public_html/Resources/>
Redirect permanent quota.html http://new.wellesley.edu/cs/resources/quota
</Directory>
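but that was an abysmal failure. Redirect’s first argument is a URL path that must begin with a slash, not a file name relative to the directory. Instead, you give the complete URL path at the top level in your Apache configuration file, like this: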
[root@tempest ~]# cd /etc/httpd/conf.d
[root@tempest conf.d]# grep linux wellesley.conf
Redirect permanent /~cs/Resources/linux.html http://new.wellesley.edu/cs/resources/linux
Note that I’m putting these entries in a separate wellesley.conf file that I put in the /etc/httpd/conf.d directory. All the .conf files in that directory are automatically loaded when Apache starts up, so that makes a nice way to keep our customizations separate from the basic /etc/httpd/conf/httpd.conf file.
There are a few things that can’t be overridden in a separate .conf file, and have to be done by editing the httpd.conf file. These are noted in the file in the usual way, with the “Wellesley mod” comment:
[root@tempest conf.d]# cd ../conf
[root@tempest conf]# ls
httpd.conf httpd.conf.orig magic
[root@tempest conf]# diff httpd.conf httpd.conf.orig
366,368c366
< # Wellesley mod: allow UserDir
< #UserDir disabled
< UserDir public_html
---
> UserDir disabled
I’m going to do the same thing (a separate wellesley.conf file) on Puma:
[root@puma ~] cd /etc/httpd/conf.d/
[root@puma conf.d] cp /home/sysadmin/etc/httpd/conf.d/wellesley.conf .
[root@puma conf.d] ls -l w*.conf
-rw-r--r-- 1 root root 352 Jul 12 2006 webalizer.conf
-rw-r--r-- 1 root root 299 Jun 6 10:04 welcome.conf
-rw-rw---- 1 root root 25625 Aug 23 15:07 wellesley.conf
[root@puma conf.d] chmod 644 wellesley.conf
[root@puma conf.d] ls -lZ w*.conf
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 webalizer.conf
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 welcome.conf
-rw-r--r-- root root user_u:object_r:httpd_config_t:s0 wellesley.conf
[root@puma conf.d] chcon --reference=welcome.conf wellesley.conf
[root@puma conf.d] ls -lZ w*.conf
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 webalizer.conf
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 welcome.conf
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 wellesley.conf
[root@puma conf.d] apachectl -t
[Thu Aug 23 15:08:32 2012] [warn] module unique_id_module is already loaded, skipping
[Thu Aug 23 15:08:32 2012] [warn] The Alias directive in /etc/httpd/conf/httpd.conf at line 1257 will probably never match because it overlaps an earlier Alias.
[Thu Aug 23 15:08:32 2012] [warn] The Alias directive in /etc/httpd/conf/httpd.conf at line 1310 will probably never match because it overlaps an earlier Alias.
[Thu Aug 23 15:08:32 2012] [warn] The Alias directive in /etc/httpd/conf/httpd.conf at line 1311 will probably never match because it overlaps an earlier Alias.
[Thu Aug 23 15:08:32 2012] [warn] The Alias directive in /etc/httpd/conf/httpd.conf at line 1312 will probably never match because it overlaps an earlier Alias.
Syntax OK
[root@puma conf.d] apachectl graceful
[root@puma conf.d]
The “apachectl -t” command checks the Apache configuration files for syntax errors. The “apachectl graceful” command restarts Apache gracefully: each worker finishes the request it is currently serving before it is replaced, so no browser gets an “interruption” response.
We should fix those Alias warnings. However, let’s first determine whether they’re caused by the overlap between the wellesley.conf file and the settings I haven’t yet taken out of httpd.conf:
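Since the check and the restart always go together, they could be wrapped up so that the restart only happens when the check passes. This is just a sketch: the safe_reload name and the APACHECTL override variable are made up for illustration and aren't anything we have installed.

```shell
#!/bin/sh
# Reload Apache only if the configuration parses cleanly.
# APACHECTL is a hypothetical override so the function can be tried
# out without touching a real server.
APACHECTL="${APACHECTL:-apachectl}"

safe_reload() {
    # "apachectl -t" exits non-zero when the config has a syntax error
    if "$APACHECTL" -t; then
        "$APACHECTL" graceful
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Running safe_reload as root would then take the place of typing the two commands separately.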
[root@puma conf.d] mv wellesley.conf wellesley.conf-disabled
[root@puma conf.d] apachectl -t
Syntax OK
Yup. OK, so let’s remove all the redundant stuff from httpd.conf and try again. Most of the edits were pretty straightforward, but I removed the ExecCGI option on /var/www/html; I’m not sure why it was ever there. I also restored the entire directory configuration for /var/www/cgi-bin, trying to minimize unnecessary differences with the .orig file. I also removed the VirtualHost configuration. We can always restore these from the /etc/httpd/conf/httpd.conf.with_wellesley_mods file.
[root@puma conf.d] cd ../conf
[root@puma conf] diff httpd.conf httpd.conf.orig
266d265
< ServerName cs.wellesley.edu:80
356,357c355
< # Wellesley mod: don't disable
< #UserDir disable
---
> UserDir disable
364,365c362
< #Wellesley: allow this(sda and sjk)
< UserDir public_html
---
> #UserDir public_html
580d576
<
[root@puma conf]
Let’s check the SELinux context, even though we’re not using SELinux on Puma:
[root@puma conf] ls -lZ httpd.conf httpd.conf.orig
-rw-r--r-- root root user_u:object_r:httpd_config_t:s0 httpd.conf
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 httpd.conf.orig
[root@puma conf] chcon --reference=httpd.conf.orig httpd.conf
[root@puma conf] ls -lZ httpd.conf httpd.conf.orig
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 httpd.conf
-rw-r--r-- root root system_u:object_r:httpd_config_t:s0 httpd.conf.orig
[root@puma conf]
Okay, let’s recheck the syntax:
[root@puma conf] apachectl -t
Syntax OK
[root@puma conf] cd ../conf.d
[root@puma conf.d] mv wellesley.conf-disabled wellesley.conf
[root@puma conf.d] apachectl -t
Syntax OK
[root@puma conf.d] apachectl graceful
Great. Now, the redirects should work for Puma:
[root@puma conf.d] grep Redirect wellesley.conf | head
# Redirects to Drupal site.
Redirect permanent /index.html http://new.wellesley.edu/cs
# Use RedirectMatch with trailing $ so that /~cs/foo doesn't map to new.wellesley.edu/csfoo
RedirectMatch permanent /$ http://new.wellesley.edu/cs
Redirect permanent /index.html http://new.wellesley.edu/cs
Redirect permanent /~cs/Curriculum/curriculum.html http://new.wellesley.edu/cs/curriculum
Redirect permanent /~cs/Curriculum/majorminor.html http://new.wellesley.edu/cs/curriculum/major
Redirect permanent /~cs/Curriculum/intro.html http://new.wellesley.edu/cs/curriculum/introductory
Redirect permanent /~cs/Curriculum/mit.html http://new.wellesley.edu/cs/curriculum/mit
Redirect permanent /~cs/Curriculum/olin.html http://new.wellesley.edu/cs/curriculum/olin
[root@puma conf.d]
Here is a nice way to test that the redirects work:
[root@puma conf.d] GET -Sd http://tempest.wellesley.edu/
GET http://tempest.wellesley.edu/ --> 301 Moved Permanently
GET http://new.wellesley.edu/cs --> 200 OK
[root@puma conf.d] GET -Sd http://tempest.wellesley.edu/~cs/Curriculum/curriculum.html
GET http://tempest.wellesley.edu/~cs/Curriculum/curriculum.html --> 301 Moved Permanently
GET http://new.wellesley.edu/cs/curriculum --> 200 OK
[root@puma conf.d] GET -Sd http://puma.wellesley.edu/~cs/Curriculum/curriculum.html
GET http://puma.wellesley.edu/~cs/Curriculum/curriculum.html --> 301 Moved Permanently
GET http://new.wellesley.edu/cs/curriculum --> 200 OK
[root@puma conf.d]
This shows the sequence of URLs returned from the server and re-fetched by the browser (in this case, the GET command is substituting for the browser).
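If GET (from the Perl LWP tools) isn't installed, something similar could be sketched with curl. This is not what I actually ran: expect_redirect and the FETCH hook are names I made up, and it only checks the first hop (that the old URL answers 301 with the expected new location), not that the new page comes back 200.

```shell
#!/bin/sh
# Print "STATUS LOCATION" for a URL without following the redirect.
# -I sends a HEAD request; -w prints curl's http_code and redirect_url.
fetch_with_curl() {
    curl -s -o /dev/null -I -w '%{http_code} %{redirect_url}' "$1"
}

# FETCH is a hypothetical hook so the check can be tried without a network.
FETCH="${FETCH:-fetch_with_curl}"

# Usage: expect_redirect OLD_URL NEW_URL
expect_redirect() {
    actual=$("$FETCH" "$1")
    if [ "$actual" = "301 $2" ]; then
        echo "ok   $1"
    else
        echo "FAIL $1 (got: $actual)"
        return 1
    fi
}
```

For example, expect_redirect 'http://puma.wellesley.edu/~cs/Curriculum/curriculum.html' 'http://new.wellesley.edu/cs/curriculum' would check one of the entries above.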
Note that, in one case, we used the more powerful RedirectMatch directive, since that allowed us to add the trailing $ to avoid overmatching. A plain Redirect matches every URL whose path starts with the path shown (on a whole path segment), so:
[root@puma conf.d] GET -Sd http://tempest.wellesley.edu/~cs/index.html
GET http://tempest.wellesley.edu/~cs/index.html --> 301 Moved Permanently
GET http://new.wellesley.edu/cs --> 200 OK
[root@puma conf.d] GET -Sd http://tempest.wellesley.edu/~cs/index.htmlfoo
GET http://tempest.wellesley.edu/~cs/index.htmlfoo --> 404 Not Found
[root@puma conf.d] GET -Sd http://tempest.wellesley.edu/~cs/index.html/foo
GET http://tempest.wellesley.edu/~cs/index.html/foo --> 301 Moved Permanently
GET http://new.wellesley.edu/cs/foo --> 404 Not Found
[root@puma conf.d]
Did you see how the “foo” got tacked onto the end of the rewritten URL? That won’t typically be a problem for normal filename URLs, but for a directory URL like ~cs/, we can’t just tack the rest of the URL onto the end of the rewrite. So, for that case, we use RedirectMatch: it only rewrites if the pattern matches, and, thanks to the trailing $, it’ll only match if nothing follows the pattern in the URL. That’s how we can still get to stuff served out of ~cs/public_html.
Now, we should do some cleaning up of public_html, since most of it is now obsolete. I created the following: a new public_html with a handful of the files we still need, plus a pointer to the old stuff (which should probably be made read-only) and a README file:
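To make the plain Redirect behavior concrete, here is a rough model of it in shell. It is only my approximation of what the transcript shows, not Apache's actual code: redirect_target is a made-up name, and the model assumes the old path is written without a trailing slash.

```shell
#!/bin/sh
# Rough model of a plain Redirect: the old path must match the start of
# the request path on a whole path segment, and whatever follows is
# appended to the new URL.
# Usage: redirect_target OLD_PATH NEW_URL REQUEST_PATH
redirect_target() {
    old=$1
    new=$2
    path=$3
    case "$path" in
        "$old")
            # exact match: nothing to append
            printf '%s\n' "$new"
            ;;
        "$old"/*)
            # match on a segment boundary: append the remainder
            rest=${path#"$old"}
            printf '%s\n' "$new$rest"
            ;;
        *)
            # e.g. /~cs/index.htmlfoo: no redirect applies
            return 1
            ;;
    esac
}
```

Feeding it the three request paths from the transcript gives the same three outcomes: a redirect to /cs, no redirect for index.htmlfoo, and a redirect to /cs/foo.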
[root@puma conf.d] pushd ~cs
/home/cs /etc/httpd/conf.d
[root@puma cs] ls -ld pub*
lrwxrwxrwx 1 cs faculty 22 Aug 22 12:29 public_html -> archive/cs_07-08/cs_v3
drwxrwxr-x 4 cs faculty 4096 Aug 23 16:45 public_html.new
lrwxrwxrwx 1 root root 28 Nov 12 2006 public_html.old -> archive/cs_02-03/csweb-2002/
[root@puma cs] ls -lR public_html.new/
public_html.new/:
total 48
drwxrwxr-x 2 cs faculty 4096 Aug 23 13:41 Curriculum
-rw-rw-r-- 1 cs faculty 31065 Aug 23 16:44 facultyInfo.html
-rw-rw-r-- 1 cs faculty 858 Aug 23 16:44 facultyInfo-style.css
lrwxrwxrwx 1 cs faculty 26 Aug 23 13:26 old -> ../archive/cs_07-08/cs_v3/
drwxrwxr-x 3 cs faculty 4096 Aug 23 13:49 People
-rw-rw-r-- 1 cs faculty 698 Aug 23 16:45 README
public_html.new/Curriculum:
total 148
-rw-r--r-- 1 cs faculty 9712 Aug 23 13:33 course-graph-f08.js
-rw-r--r-- 1 cs faculty 12510 Aug 23 13:33 course-graph-f12.js
-rw-r--r-- 1 cs faculty 16727 Aug 23 13:33 dependencies-f08.html
-rw-r--r-- 1 cs faculty 40803 Aug 23 13:39 dependencies-f12.html
-rw-r--r-- 1 cs faculty 40672 Aug 23 13:33 dependencies-f12.html~
lrwxrwxrwx 1 cs faculty 21 Aug 23 13:35 dependencies.html -> dependencies-f12.html
-rw-rw-r-- 1 cs faculty 8889 Aug 23 13:33 graph.js
-rw-rw-r-- 1 cs faculty 3124 Aug 23 13:33 tooltip.js
-rw-rw-r-- 1 cs faculty 1312 Aug 23 13:33 utils.js
public_html.new/People:
total 4
drwxrwxr-x 2 cs faculty 4096 Aug 23 13:49 wellesley-only
public_html.new/People/wellesley-only:
total 8
-rw-rw-r-- 1 cs faculty 5840 Aug 23 13:49 students.html
[root@puma cs]
Now, all we have to do to switch to this is the following:
[root@puma cs] rm public_html ; mv public_html.new public_html
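One caution about that one-liner: with “;” the mv runs even if the rm fails, and if public_html were a real directory rather than a symlink, the mv would put public_html.new inside it. A slightly more defensive sketch (the function name is hypothetical), assuming it is run from ~cs:

```shell
#!/bin/sh
# Swap public_html.new into place, but only if public_html is the
# symlink we expect. Assumes the current directory holds both entries.
switch_public_html() {
    if [ ! -L public_html ]; then
        echo "public_html is not a symlink; leaving everything alone" >&2
        return 1
    fi
    if [ ! -d public_html.new ]; then
        echo "public_html.new is missing" >&2
        return 1
    fi
    # && (not ;) so the mv only happens once the old link is really gone
    rm public_html && mv public_html.new public_html
}
```

Removing the symlink leaves its target under archive/ untouched, so the old site stays reachable through the “old” link.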
All the redirects will now work, and we can get to the old content:
[root@puma cs] GET -Sd http://puma.wellesley.edu/~cs/index.html
GET http://puma.wellesley.edu/~cs/index.html --> 301 Moved Permanently
GET http://new.wellesley.edu/cs --> 200 OK
[root@puma cs] GET -Sd http://puma.wellesley.edu/~cs/Curriculum/curriculum.html
GET http://puma.wellesley.edu/~cs/Curriculum/curriculum.html --> 301 Moved Permanently
GET http://new.wellesley.edu/cs/curriculum --> 200 OK
[root@puma cs] GET -Sd http://puma.wellesley.edu/~cs/old/index.html
GET http://puma.wellesley.edu/~cs/old/index.html --> 200 OK
[root@puma cs]
Okay, I think we’re good to go. I’m sure there will be some small errors, things that should be moved over from “old” or migrated to Drupal, but the bulk is in shape, I think.