Astronomical Data Analysis Software and Systems VI
ASP Conference Series, Vol. 125, 1997
Gareth Hunt and H. E. Payne, eds.

Automatic Mirroring of the IRAF FTP and WWW Archives
Mike Fitzpatrick and Doug Tody
IRAF Group,1 NOAO,2 PO Box 26732, Tucson, AZ 85726

David L. Terrett
Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire OX11 0QX, United Kingdom

Abstract. Large FTP archives have long used mirrors (copies of the network archive maintained on remote hosts) to decrease the load on a particular server or shorten the network path to provide faster download times. Little has been done, however, to simplify mirroring of WWW (World Wide Web) pages, although many projects and users now rely on Web pages at least as heavily as anonymous FTP services. With the dramatically increasing use of the global Internet in the past year, the network has become overloaded, and network access, especially overseas, is often very slow during peak hours. We present a strategy based on host-independent URLs which allows Web pages to be automatically mirrored to both remote Web hosts and CD-ROMs. Issues affecting a site wishing to mirror a remote archive are discussed.

1. Introduction

The subject of automatic mirroring can be approached in one of two ways: from the standpoint of those wishing to export their archive for mirroring, and of those wishing to be a mirror for an existing archive. Although this paper deals with the specific issues we faced in setting up a mirror of the IRAF archives, the techniques presented are general, and can easily be applied to any similar archive. On both ends there were some expected setup glitches in trying to verify the thousands of links involved, in bringing both systems to a common understanding about requirements in HTTP server and local software, and in establishing a routine procedure for maintaining the mirror. The initial experiment between the NOAO IRAF Group and the UK Mirror at Rutherford Appleton Labs has worked out many of these problems, and has provided us with the ability to establish other mirrors much more easily. In the first five months of operation, the UK IRAF Mirror Site has distributed more than 4300 files to 120 different nodes in more than ten countries, providing a faster, more reliable link for UK and European sites. Negotiations are underway to establish mirrors in other parts of the world where FTP access to the NOAO Tucson archives or UK mirror sites is prohibitively slow.

The host-independent manner in which the WWW pages are written means that they can also be used from a CD-ROM running on a local machine, in effect duplicating the IRAF archive on any machine. We discuss the limitations and special setup required in this case.

1 Image Reduction and Analysis Facility, distributed by the National Optical Astronomy Observatories

2 National Optical Astronomy Observatories, operated by the Association of Universities for Research in Astronomy, Inc. (AURA) under cooperative agreement with the National Science Foundation

© Copyright 1997 Astronomical Society of the Pacific. All rights reserved.

2. Preparing Your Archive for Mirroring

There are only a few steps involved in preparing your archive so it can be easily mirrored elsewhere:

2.1. Host-Independent URLs

The mirroring site will have a Web address different from the original site. If Web pages contain explicit HTTP URLs, then these pages will still refer to the original archive when the pages are mirrored, negating the point of the Web mirror. The simplest solution is to substitute file-relative URLs in all cases except where one really does want a URL to point to a specific network host. For the exporting site this means each link will need to be examined and changed in the following ways:

· Use "file.html" or "subdir/file.html" link references. Keep it simple, with no complex relative paths.
· Since the Web root directory may be different on the mirror node (which is likely serving its own documents), root-relative links such as "/irafhomepage.html" should be avoided.
· We don't want to require that the mirroring site put documents in a particular directory, so the best compromise is to establish a set of common links for both systems so root-relative paths can be used on either host correctly. In our case we established /iraf/ftp and /iraf/web links pointing to the root of the FTP and WWW areas respectively (this also fits in well with the suggested directory structure for an IRAF installation). It so happens that our Web pages are under the FTP directory tree, but this is not a requirement.

2.2. CGI Scripting

There are several things to be done to make most CGI scripts portable:

· One cannot assume that a mirror node will have all of the custom local software that may be used by CGI scripts, or indeed that it is even the same type of machine. For our Web archive we've created a bin, lib, and src subdirectory containing binaries and source for all programs (mail filters, search engines, etc.) used by the various scripts. All binaries are built from these sources, meaning that versions are current for all platforms and are automatically updated in the mirror site when a new version is installed by the originating site.


· In HTML forms or links, references will be made to a particular script or application. Since a mirror may be running on a type of computer different from the original server, these task names are actually csh scripts which call the binary (in the bin subdirectory) appropriate for that platform. In our case, the scripts reference a program called mget as a mail filter; the mget script figures out what type of machine it's running on and passes the arguments to a mget.sparc binary to do the actual work.
· Path names in scripts: CGI scripts are often written as scripts of some type (csh, Perl, etc.) which are invoked using a path such as #!/bin/csh as the first line. Such paths may not be universal, however. The mirroring site is responsible for creating the system links needed to resolve these paths.
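The wrapper idea in the list above can be sketched in a few lines. The wrappers in the archive are csh scripts; the Python version below is only an illustration of the same dispatch step. The names ARCH_SUFFIXES, binary_for, and dispatch are hypothetical, and every suffix other than "sparc" (from the mget.sparc binary mentioned above) is an assumption made for the example.

```python
import os
import platform

# Hypothetical table mapping a machine-type prefix to the suffix of the
# binary kept in the archive's bin subdirectory. Only "sparc" comes from
# the text (mget.sparc); the other entries are invented for illustration.
ARCH_SUFFIXES = {
    "sun4": "sparc",
    "sparc": "sparc",
    "x86_64": "linux",
    "alpha": "alpha",
}


def binary_for(task, machine, bin_dir="bin"):
    """Return the path of the platform-specific binary for `task`."""
    # Prefix matching lets names such as "sun4u" and "sun4m" share an entry.
    for prefix, suffix in ARCH_SUFFIXES.items():
        if machine.startswith(prefix):
            return f"{bin_dir}/{task}.{suffix}"
    raise RuntimeError(f"no {task} binary for machine type {machine!r}")


def dispatch(task, args, bin_dir="bin"):
    """Replace the current process with the binary for this machine."""
    target = binary_for(task, platform.machine(), bin_dir)
    # Arguments are passed through unchanged, as the mget wrapper does.
    os.execv(target, [target] + list(args))
```

A wrapper installed under the task name would call dispatch("mget", sys.argv[1:]) with bin_dir pointing at the archive's bin subdirectory. Keeping the lookup in binary_for separate from the exec step makes it possible to check the table on a machine that has none of the binaries installed.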

2.3. The Final Steps

You may wish to arrange for mirror site usage logs to be propagated back to the original site. This can be done as a weekly cron job that greps for entries containing a certain keyword in the logs ("iraf" in our case) and automatically mails them to a specific maildrop.

If the archive is large, it is best to make a snapshot tape of the full directory tree to be mirrored and mail that to the mirroring institution to populate the initial directory tree. Once the initial system is installed and working, updates should be small and will be handled automatically by the mirror software.

3. Setting up a Mirror Archive

Now that the initial IRAF mirror site has been established, we should have worked out most of the bugs in the scripts and documents on our end, but there are still concerns for new sites wishing to establish a mirror:

3.1. Disk Space Required

The complete IRAF archive now requires approximately 3 GB of storage; this will probably increase by another 1 GB in the next year as more software is released. Potential mirrors should consider the purchase of a new dedicated disk.

3.2. Mirroring Package Used

The RAL mirror site is maintained using a package called MIRROR from Lee McLoughlin of the University of London; other packages are also available. This particular package required Perl 4, which had to be installed specifically to support the package. A cron script is run nightly to update the archive, and a separate script is run once weekly to mail access logs back to Tucson. The archive scripts directory is mirrored separately to a different directory, in part because execute permissions are stripped in the mirroring process and in part so new code may be hand checked, as a security measure.

3.3. HTTP Server Requirements

The UK mirror was already serving Web documents and had a configured HTTP server. New sites, or those using the CD-ROM, may need to configure a server. The only changes required to support the mirror were alias definitions for the IRAF CGI scripts directory. This means editing the httpd/conf/srm.conf file with an Alias and ScriptAlias definition for the scripts directory which points to the iraf Web scripts directory on the mirror, and aliases for the root-relative links. For example,

    Alias        /iraf/web   /iraf/web
    Alias        /iraf/ftp   /iraf/ftp
    Alias        /scripts    /mirror/iraf/web/scripts/
    ScriptAlias  /scripts/   /iraf/web/scripts/

One other problem is that most HTTP servers define a default MIME type as plain text for documents for which the server cannot determine the type from the file name extension. This means that tar files, compressed PostScript files, etc., show up as jumbled text in the browser rather than being identified as binary or starting an external viewer. To work around this, we suggest the following definition in the server's srm.conf file:

    Redirect  /iraf/ftp/  ftp://iraf.noao.edu/

This causes most browsers to create a save pop-up window rather than trying to display the file, which is what is most often desired.

Aside from the initial setup and verification of new scripts, the process is now largely automatic, requiring an estimated one hour per month to maintain the mirror. Only rarely has the nightly update not completed successfully; in each case it has succeeded the following night.

4. CD-ROM Issues

While the host-independent nature of the WWW pages means the archive can be distributed on and browsed from a CD-ROM, there are a few issues of concern for viewing the CD-ROM Web pages as though they were a live Web site:

· An HTTP server must be running in order for CGI scripting to work; otherwise all documents are accessed with a file URL and scripts will not execute.
· Sites using the CD-ROM as a local archive must also remember to set the /iraf/ftp and /iraf/web links discussed above, so links are resolved.
· At present, the archive only contains binaries used by CGI scripts for those mirror platforms we know about. More binaries are needed.

5. Project Status

We welcome inquiries from any sites wishing to set up additional IRAF mirrors, or from sites interested in using the techniques outlined in this pap er to mirror their own archives. Contact iraf@noao.edu for further information.
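As a starting point for such sites, the link examination described in Section 2.1 can be partly automated. The Python sketch below is an illustration only and is not part of the IRAF archive: the class name LinkAudit, the category labels, and the decision to treat any path containing ".." as a complex relative path are assumptions made for this example. It reports explicit host URLs (which may be intentional and only need review), root-relative links outside the agreed /iraf/ftp and /iraf/web links, and ".." paths.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAudit(HTMLParser):
    """Collect link targets that may misbehave once a page is mirrored."""

    def __init__(self, allowed_roots=("/iraf/ftp", "/iraf/web")):
        super().__init__()
        self.allowed_roots = allowed_roots
        self.problems = []  # list of (category, url) pairs

    def handle_starttag(self, tag, attrs):
        # Look at every attribute that can carry a link target.
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                category = self.classify(value)
                if category != "ok":
                    self.problems.append((category, value))

    def classify(self, url):
        parts = urlsplit(url)
        if parts.scheme in ("http", "ftp") or parts.netloc:
            return "host-specific"  # names one server; review by hand
        if parts.scheme:
            return "ok"  # e.g. mailto:, unaffected by mirroring
        if parts.path.startswith("/"):
            # Root-relative paths are fine only under the common links.
            for root in self.allowed_roots:
                if parts.path == root or parts.path.startswith(root + "/"):
                    return "ok"
            return "root-relative"
        if ".." in parts.path.split("/"):
            return "complex-relative"  # assumption: ".." counts as complex
        return "ok"
```

Feeding the text of each page to a fresh LinkAudit with feed() and then reading its problems list gives a worklist of links to examine. Entries marked "host-specific" are not necessarily errors, since some links are meant to point at a particular network host.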