This is...

OAICopy

version 0.5.5
2005-06-27

by Aaron Krowne (akrowne@emory.edu)

Synopsis
========

This program lets you copy an Open Archive repository [1] to a local, static
archive with a single command.  The command, at its simplest, has only two
parameters:

1. The base URL of the remote OAI provider.
2. The path to a directory on the local system where you want to set up a new
   provider (this path should end with either an empty or nonexistent dir).

The locally-created archive is "static" because it is based on a one-time
snapshot of the data, which is stored in individual record XML files, which
the OAI-XMLFile system understands.  This static repository does NOT conform
to the "official" OAI static repository spec [2], because sets are supported
and the provider is in all ways fully-featured and fully functional.  Of
course, you can edit the data in the XML files if you want, but the key point
is that there is no database involved, and the records are not assembled,
transformed, or generated upon request.

What's the point of all this?  Aside from being useful to mirror OAI
repositories, this program anticipates a day when web services will abound
which transform entire collections, producing new collections as output,
represented by ad hoc static OAI repositories.  In fact, we are working
towards this on the MetaCombine [3] and OCKHAM [4] projects.

For example, we are building a clustering web service that takes a (flat) OAI
repository as input and produces an ad hoc OAI repository as output, which
contains a set structure corresponding to a novel organization scheme.
Similarly, we are building a classification web service that can train based
on a set structure and records in an OAI repository, and correspondingly
classify un-labelled/unorganized records from another repository, producing an
ad hoc output repository which organizes *all* of the records into sets.

Having these web services output static, ad hoc repositories makes the output
instantly browsable, usable, and comprehensible.  But we do not expect that
web service providers will make these ad hoc repositories available
indefinitely.  This is where the oaicopy command comes in: it lets you "grab"
these results before they go away, and build local digital library services
based on them.  Further, once the results are grabbed, one can send them
through another web service that takes an OAI collection as input, enhancing
the records even more, and once again capturing the new output collection.

The upshot is that entire collections are abstracted into fungible, portable
objects, represented as Open Archives, and addressed by OAI provider base
URLs.  We expect to eventually build a "piping" system that transparently
manages intermediate steps of chained web service operations on collections,
using this oaicopy program as a back-end "glue" tool.

References:

[1] The Open Archives Initiative,
    <http://www.openarchives.org/>

[2] Van de Sompel, et al., Specification for an OAI Static Repository and an
    OAI Static Repository Gateway
    <http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm>

[3] The MetaCombine project,
    <http://www.metacombine.org/>

[4] The OCKHAM project,
    <http://www.ockham.org/>

Usage
=====

The command is used like:

    oaicopy <baseURL> <path>

Let's make the following assumptions for an example:

- the OAI archive you want to copy is available at the baseURL
  http://aux.planetmath.org/oai/provider-2.0.pl
- your web root is /usr/lib/cgi-bin
- your web root is accessible from the web as http://your-host.com/cgi-bin/
- you've made a /usr/lib/cgi-bin/providers directory

Then you could use the command like:

    oaicopy http://aux.planetmath.org/oai/provider-2.0.pl /usr/lib/cgi-bin/providers/pm_mirror

The command would create a /usr/lib/cgi-bin/providers/pm_mirror/ dir, and
populate it with the data for the repository.

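Since the destination path should end in an empty or nonexistent dir, it can
be handy to check that before starting a copy.  The following is only a
minimal sketch of such a check: the helper name dest_is_usable is
hypothetical and is not part of the oaicopy distribution, and the base URL
and path in the commented usage are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; NOT part of the oaicopy distribution.

# Succeeds (exit status 0) if the destination does not exist yet
# or is an empty directory; fails otherwise.
dest_is_usable() {
    dest=${1%/}                      # drop one trailing slash, if present
    [ -n "$dest" ] || dest=/         # keep "/" itself intact
    [ ! -e "$dest" ] && return 0     # nonexistent: fine
    [ -d "$dest" ] || return 1       # exists, but is not a directory
    [ -z "$(ls -A "$dest")" ]        # directory: usable only if it is empty
}

# Example use, with the values from the usage example:
# dest_is_usable /usr/lib/cgi-bin/providers/pm_mirror &&
#     oaicopy http://aux.planetmath.org/oai/provider-2.0.pl \
#         /usr/lib/cgi-bin/providers/pm_mirror
```

The check is kept separate from the copy itself, so it can be dropped into a
wrapper script without touching oaicopy.
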
You'd be able to access the new repository at the baseURL:

    http://your-host.com/cgi-bin/providers/pm_mirror/oai.pl

The new archive is functionally configured with dummy values, but you can
customize them by editing the config.xml, which is in the same directory as
oai.pl.  Basic things you might want to do are give the archive a meaningful
name, nickname, and admin email address.

Installation
============

-> This program and all of its dependencies are Perl-based.  The following
   are the individual dependencies:

   - Net::OAI::Harvester
   - XML::LibXML
   - LWP::UserAgent
   - XML::SAX - for XML parsing
   - URI
   - Storable

   You should also probably be using at least Perl 5.8.0, since many
   repositories contain UTF8 data and this ensures it is handled properly.

-> To install:

   Make sure you have the dependencies, then run ./install as root.  The
   'oaicopy' command should now work.

Development Roadmap
===================

1.0 - Support calling OAI-XMLFile's configurator for archive conf.
0.9 - Command-line configuration options for archive conf.
0.8 - Copying all of the metadata formats supported by an archive.
0.7 - Rewrite parser based on SAX instead of DOM.  Maybe drop
      Net::OAI::Harvester in favor of HTTP::OAI?
0.5 - First release.  Copying works for oai_dc and sets.

ChangeLog
=========

0.5.5 - Handle OAI 1.1 input repositories.  We still write 2.0 repositories
        as output, which means you can now use oaicopy to instantly upgrade
        a repository.
0.5.2 - Remove trailing slash in output/dest dir specification.  Minor
        documentation cleanups.
0.5.1 - Bug fix, "use" statement in oai.pl template file, handling of sets
        (manifests for set depths > 1).
0.5   - First release.  Copying works for oai_dc and sets.

License
=======

See the "LICENSE" file.

Notes
=====

- The "test" collections that were included with OAI-XMLFile have been removed
  from this distribution.  They are really useful for understanding how
  OAI-XMLFile works, however, so if you want to study them, get the "real"
  release of OAI-XMLFile off http://www.openarchives.org/.

Acknowledgements
================

This work was supported in part by the Mellon Foundation (the MetaCombine
project) and NSF (the OCKHAM project).

Contact
=======

For help or comments please contact me, Aaron Krowne, at akrowne@emory.edu.
Permanent contact is Martin Halbert - mhalber@emory.edu.

The current version of this package should be available at
http://www.metacombine.org/software/.

Good luck!

-Aaron
2005-06-14
Emory University