Scooping your Drupal Site with HTTrack

You can make a static archive of your Drupal site with HTTrack, a free program you can download from the HTTrack website. The Windows version runs on Windows 95 through Windows XP. (Comparable site-downloading programs exist for Macintosh and Linux.) Download "WinHTTrack: Windows 95/98/NT/2K/XP (also included: command line version)." Once the installation file has downloaded to your desktop, double-click the .exe file to begin the installation.

  1. Open HTTrack.
  2. In the "New project name" box (right pane), give your project a name. HTTrack will use this name for the folder where it stores your archived site.
  3. In the left pane, navigate to the location where you want to store the downloaded files. The path you select then appears in the Base Path box (right pane). You don't have to create a new folder, since HTTrack automatically creates one named after your project and stores the site inside it, using a folder structure that mirrors your website's URLs.
  4. Copy and paste the URL of your course home page into the Web Addresses (URL) box. (The Add URL button is for sites that require a log-in, but that feature is now turned off on your site.)
  5. You don't need to change anything with the "set options" button.
  6. Click "Next >".
  7. If you are already online, you can click Finish on this next screen. This screen also lets you adjust your connection settings, telling HTTrack how to reach the Internet if you aren't already online, but most people will already be connected and can launch the archiving process right away. If you'd rather archive your site later, check the "Save settings only" box to save these settings.
  8. Click Finish.
  9. HTTrack will show you its progress as it downloads every page of your site and stores it on your computer in the folder you defined in the Base Path box (see Step 3). The process is fast (about five minutes over a cable modem connection for a whole semester's Drupal site), though a very large site may take longer. HTTrack checks all links, rewrites internal links so that the site will still work when uploaded elsewhere, and more.
  10. When the process finishes, you can view the "error log" or "browse mirrored website." The process will inevitably produce some errors, which you probably don't need to worry about. You should, however, browse the archived (mirrored) website. Clicking that button opens the home page in your browser, exactly as it will appear when uploaded to a server (such as your career account). The result is quite impressive: a website that looks almost exactly like the original, minus the interactive features.
  11. Your website will be stored in the project folder you named in Step 2, inside the Base Path you chose in Step 3. For example, we used HTTrack to archive an English 420 site from Summer 2005 taught in Florence. HTTrack puts the files in folders that mirror the original URL. We named our project "420 in Florence," so here is how the website was stored on the local hard drive:

    C:\Websites\Drupal Sites\420 in Florence\joe.english.purdue.edu\sp05\blakesley2
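The WinHTTrack download also includes a command-line version. As a rough sketch of what Steps 2 through 4 amount to, the Python below only assembles an httrack command; it does not run it. It assumes httrack is installed and on your PATH. The `-O` option sets the output folder. The helper name `build_httrack_command` and the URL `http://example.edu/course/` are hypothetical placeholders, not part of HTTrack.

```python
from pathlib import PureWindowsPath


def build_httrack_command(url: str, base_path: str, project_name: str) -> list[str]:
    """Assemble (but do not run) an httrack command-line invocation.

    Hypothetical helper: it joins the base path and project name the way
    WinHTTrack combines its "Base path" and "New project name" fields.
    """
    project_dir = PureWindowsPath(base_path) / project_name
    # -O tells httrack where to write the mirror and its log files.
    return ["httrack", url, "-O", str(project_dir)]


cmd = build_httrack_command(
    "http://example.edu/course/",  # hypothetical course home page
    r"C:\Websites\Drupal Sites",
    "420 in Florence",
)
print(cmd)
# To actually run it, assuming httrack is on your PATH:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string keeps the spaces in "Drupal Sites" and "420 in Florence" from splitting the path into separate arguments.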
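Step 9 notes that HTTrack rewrites internal links so the copy still works after it is uploaded elsewhere. HTTrack's real rules are more involved. The Python below is only a simplified illustration of the core idea: turning an absolute link to a page on the same site into a path relative to the page that contains it. It ignores query strings and fragments, which a real mirroring tool must handle. The function name and example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def to_relative_link(page_url: str, link_url: str) -> str:
    """Rewrite an absolute same-site link relative to the page containing it.

    Simplified illustration only: query strings and fragments are dropped.
    """
    page, link = urlparse(page_url), urlparse(link_url)
    if link.netloc != page.netloc:
        return link_url  # link points to another site: leave it absolute
    target = link.path or "/"
    if target.endswith("/"):
        target += "index.html"  # a folder URL is saved as that folder's index page
    page_dir = posixpath.dirname(page.path) or "/"
    return posixpath.relpath(target, page_dir)


print(to_relative_link("http://example.edu/course/node/1.html",
                       "http://example.edu/course/files/syllabus.pdf"))
# ../files/syllabus.pdf
```

Relative links like `../files/syllabus.pdf` keep working wherever the whole folder tree is copied, which is why a mirrored site can be moved to another server.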
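Step 10 says some errors are inevitable. For a quick tally before deciding whether any of them matter, a short script can scan the log. This sketch assumes that HTTrack's log is the plain-text file `hts-log.txt` in your project folder and that error and warning lines contain the markers `Error:` and `Warning:`. Check your own log, since the exact format may differ between versions. The helper `summarize_log` and the sample lines are made up for illustration.

```python
from collections import Counter


def summarize_log(lines):
    """Count lines containing 'Error:' or 'Warning:' (assumed log markers)."""
    counts = Counter()
    errors = []
    for line in lines:
        if "Error:" in line:
            counts["errors"] += 1
            errors.append(line.strip())
        elif "Warning:" in line:
            counts["warnings"] += 1
    return counts, errors


# Made-up sample lines. In practice you might read the real log instead, e.g.:
# with open(r"C:\Websites\Drupal Sites\420 in Florence\hts-log.txt",
#           encoding="utf-8", errors="replace") as f:
#     counts, errors = summarize_log(f)
sample = [
    "Mirror launched",
    'Error: "Not Found" (404) at link example.edu/course/missing.html',
    "Warning: moved page redirected",
]
counts, errors = summarize_log(sample)
print(counts["errors"], counts["warnings"])
# 1 1
```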
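Because HTTrack builds the folder tree from the site's URL, you can predict where a mirrored course will land. The Python sketch below assumes the course home page was `http://joe.english.purdue.edu/sp05/blakesley2/`. That URL is inferred from the folder names in the example path, and the `http://` scheme is a guess. The helper `mirror_folder` is hypothetical. It handles folder-style URLs only and ignores port numbers and query strings.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse


def mirror_folder(base_path: str, project_name: str, url: str) -> PureWindowsPath:
    """Predict the folder HTTrack uses for a folder-style start URL (simplified)."""
    parts = urlparse(url)
    # The host name becomes one folder, then each URL path segment a subfolder.
    segments = [s for s in parts.path.split("/") if s]
    return PureWindowsPath(base_path, project_name, parts.hostname or "", *segments)


folder = mirror_folder(
    r"C:\Websites\Drupal Sites",
    "420 in Florence",
    "http://joe.english.purdue.edu/sp05/blakesley2/",
)
print(folder)
# C:\Websites\Drupal Sites\420 in Florence\joe.english.purdue.edu\sp05\blakesley2
```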