OAI manual Set up the harvest
Revision as of 21:50, 18 July 2018
To set up the harvest, simply follow the instructions displayed on the screen. How does the tool work? It sends requests to the repository using the standard [OAI-PMH syntax], beginning with the verb Identify, and as soon as it receives the answers it presents the options offered by the repository for you to choose from.
indicate the address of your repository
The first thing the tool asks for is the URL of the OAI-PMH server. This URL (web address) must include the prefix http or https, for example: http://www.gahetna.nl/archievenoverzicht/oai-pmh
The tool then asks you to indicate whether you are using a proxy server. In some network environments, access to the internet is secured via a proxy server. If that is the case, enter the URL (web address) of the proxy server; ask the administrator of your network environment for it. If you don't use a proxy server, for example because you are running the tool at home, you can skip this question by pressing the enter key.
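The two answers above can be checked and applied along the following lines. This is only a minimal sketch in Python and not the harvester's own code: the function names `check_repository_url` and `make_opener` are hypothetical, and it assumes the proxy is an ordinary HTTP proxy.

```python
import urllib.request
from urllib.parse import urlparse


def check_repository_url(url):
    """Return the trimmed URL, or raise if the http/https prefix is missing."""
    url = url.strip()
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("The URL must start with http:// or https://")
    return url


def make_opener(proxy_url=""):
    """Build a urllib opener; an empty answer means no explicit proxy."""
    proxy_url = proxy_url.strip()
    if not proxy_url:
        # No proxy entered: urllib falls back to its defaults,
        # which include any proxy configured in the environment.
        return urllib.request.build_opener()
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


repository = check_repository_url("http://www.gahetna.nl/archievenoverzicht/oai-pmh")
opener = make_opener("")  # equivalent to pressing the enter key
```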
The harvester then begins its dialogue with the repository by sending the request verbs and presenting the corresponding answers: the list of metadata formats, the list of sets, etc.
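For illustration, the opening requests of such a dialogue could be built as shown here. Identify, ListMetadataFormats and ListSets are verbs defined by OAI-PMH; the helper `build_request` is a hypothetical name, and the exact order in which the tool sends its requests after Identify is an assumption.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Leave out arguments that were not filled in.
    params.update({key: value for key, value in arguments.items() if value})
    return base_url + "?" + urlencode(params)


BASE_URL = "http://www.gahetna.nl/archievenoverzicht/oai-pmh"

opening_requests = [
    build_request(BASE_URL, "Identify"),
    build_request(BASE_URL, "ListMetadataFormats"),
    build_request(BASE_URL, "ListSets"),
]

for request_url in opening_requests:
    print(request_url)
```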
select the type of metadata that you want to harvest
The tool lists the types of metadata found in the repository and asks you to select one of them:
In this example, data are provided in three different types of metadata: oai_dc (Dublin Core/XML), oai_ead (a short basic version of an EAD/XML finding aid), and oai_ead_full (the complete full version of an EAD/XML finding aid). Let's choose 3: oai_ead_full.
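A numbered list like this can be derived from the repository's ListMetadataFormats response. The sketch below parses a trimmed sample response; a real response also carries schema and namespace information for each format. The sample XML and the function names are assumptions made for illustration, not the tool's actual code.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, hand-written sample of a ListMetadataFormats response.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_ead</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_ead_full</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def list_metadata_prefixes(response_xml):
    """Return the metadataPrefix values in document order."""
    root = ET.fromstring(response_xml.strip())
    return [element.text for element in root.iterfind(".//oai:metadataPrefix", OAI_NS)]


def numbered_menu(options):
    """Map 1, 2, 3, ... to the options so the user can answer with a number."""
    return {number: option for number, option in enumerate(options, start=1)}


menu = numbered_menu(list_metadata_prefixes(SAMPLE_RESPONSE))
chosen_prefix = menu[3]
```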
select the set that you want to harvest
Then the tool lists the datasets found in the repository for the chosen metadata format and gives each an arbitrary number so that you can choose one. Please note that you can harvest only one dataset at a time, so if you want to harvest everything or more than one dataset, you have to go through this whole process once per dataset.
In this example there are 4 datasets. Let's choose 1: naa1, the dataset containing all of the Nationaal Archief's finding aids in the category 1.x.xx, meaning finding aids of governmental archives from before the year 1795.
select the FROM and TO dates
Then the tool asks whether you want to restrict the harvest to a date range, in this case a start and/or end date for the creation or modification of the finding aids in the chosen dataset. This is not mandatory and is only useful if you want to make a differential harvest, i.e. a harvest of only the data produced during a certain period.
If you don't want to use this functionality, simply skip both options by pressing the enter key.
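In OAI-PMH terms, the choices made so far end up as the `metadataPrefix`, `set`, `from` and `until` arguments of a harvesting request. A minimal sketch, assuming day-level dates; the function name `selective_request` and the date used are illustrative only:

```python
from datetime import date
from urllib.parse import urlencode


def selective_request(base_url, metadata_prefix, set_spec,
                      from_date=None, until_date=None):
    """Build a ListRecords URL; skipped dates are simply left out."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    }
    if from_date is not None:
        params["from"] = from_date.isoformat()    # YYYY-MM-DD
    if until_date is not None:
        params["until"] = until_date.isoformat()  # YYYY-MM-DD
    return base_url + "?" + urlencode(params)


BASE_URL = "http://www.gahetna.nl/archievenoverzicht/oai-pmh"

# Both dates skipped: the whole dataset is requested.
full_harvest = selective_request(BASE_URL, "oai_ead_full", "naa1")

# Differential harvest: only records created or changed since an example date.
differential = selective_request(BASE_URL, "oai_ead_full", "naa1",
                                 from_date=date(2018, 1, 1))
```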
select the harvest method
Then the tool asks whether you want to use the standard ListRecords harvesting method or the special ListIdentifiers/GetRecord harvesting method. The latter is preferable for an unstable repository, because the harvesting process then continues even when errors are encountered (fail safe).
Let's choose 1: ListIdentifiers/GetRecord (fail safe).
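The idea behind the fail-safe method can be sketched as follows: the identifiers are listed first, and each record is then fetched on its own, so one failing record does not stop the rest. This is an illustration of the principle under stated assumptions, not the tool's implementation; `harvest_fail_safe`, `flaky_get_record` and the identifiers are hypothetical, and the stand-in fetch function replaces a real GetRecord request.

```python
def harvest_fail_safe(identifiers, get_record):
    """Fetch every record separately and keep going when one of them fails."""
    harvested = {}
    failed = []
    for identifier in identifiers:
        try:
            harvested[identifier] = get_record(identifier)
        except Exception as error:
            # Note the failure and continue with the next identifier.
            failed.append((identifier, str(error)))
    return harvested, failed


def flaky_get_record(identifier):
    """Stand-in for a GetRecord request; fails for one identifier."""
    if identifier == "id-2":
        raise IOError("repository did not answer")
    return "<record>" + identifier + "</record>"


harvested, failed = harvest_fail_safe(["id-1", "id-2", "id-3"], flaky_get_record)
print(len(harvested), "harvested,", len(failed), "failed")
```

With a single ListRecords request series, by contrast, an error partway through would typically interrupt the harvest of everything that follows.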
select the type of records that you want to save
Next you can choose to save the metadata records either as the full OAI response (the original files inside an OAI "wrapper") or in their original (metadata) format, i.e. just the original files.
In case you want to upload the files directly to the Archives Portal Europe, harvesting them in their original (metadata) format is to be preferred, so let's choose 1: Save only the metadata record (e.g. EAD, EDM or DC files)
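The difference between the two options can be illustrated by unwrapping a record by hand. The sketch below takes a trimmed, hand-written GetRecord response and returns only the content of its `metadata` element. The sample XML and the function name `extract_metadata` are assumptions for illustration; the tool's own output may be serialized differently.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Trimmed, hand-written sample of a GetRecord response wrapping an EAD file.
SAMPLE_GETRECORD = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header><identifier>example-identifier</identifier></header>
      <metadata>
        <ead xmlns="urn:isbn:1-931666-22-9"><eadheader><eadid>example</eadid></eadheader></ead>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def extract_metadata(oai_response_xml):
    """Return the record inside <metadata> as a string, or None if absent."""
    root = ET.fromstring(oai_response_xml.strip())
    metadata = root.find(".//" + OAI + "metadata")
    if metadata is None or len(metadata) == 0:
        # Deleted records, for example, have a header but no metadata.
        return None
    # strip() removes the whitespace that followed the element in the wrapper.
    return ET.tostring(metadata[0], encoding="unicode").strip()


payload = extract_metadata(SAMPLE_GETRECORD)
```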
start the harvest
Then the tool provides a summary of the choices and asks whether you want to proceed:
To start the harvest choose 1: yes
Then the tool starts the actual harvesting by making a quick inventory of the files that are in the requested dataset:
It then immediately proceeds to fetch the files themselves. Afterwards it compares the number of files actually harvested with the earlier inventory and reports how long it took to harvest the dataset (OAI Harvester manual, figure 13).
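The final check amounts to comparing two counts and measuring the elapsed time. A minimal sketch of that bookkeeping, in which the function name `harvest_with_report`, the identifiers and the stand-in fetch function are all hypothetical:

```python
import time


def harvest_with_report(expected_identifiers, fetch):
    """Fetch all files, then compare the result with the initial inventory."""
    started = time.monotonic()
    harvested = [fetch(identifier) for identifier in expected_identifiers]
    elapsed = time.monotonic() - started
    return {
        "expected": len(expected_identifiers),
        "harvested": len(harvested),
        "complete": len(harvested) == len(expected_identifiers),
        "seconds": elapsed,
    }


# Stand-in fetch function instead of a real request to the repository.
report = harvest_with_report(["id-1", "id-2", "id-3"], lambda identifier: identifier.upper())
print(report["harvested"], "of", report["expected"], "files harvested")
```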