Difference between revisions of "Institution Manager manual - Manage your EAD and EAC-CPF files"

Revision as of 20:59, 20 July 2018

To publish information on archival material gathered from different countries and institutions as consistently as possible, a common EAD profile, named apeEAD, and a common EAC-CPF profile, named apeEAC-CPF, have been defined. All information relating to these profiles can be found in the Standards section of this Wiki; the profiles themselves are available at http://www.archivesportaleurope.net/Portal/profiles/apeEAD.xsd and http://www.archivesportaleurope.net/Portal/profiles/apeEAC-CPF.xsd.



Unless they already comply with these profiles, the original local files therefore have to be converted to these specific schemas before being published. The portal hosts different types of EAD files: finding aids, holdings guides and source guides. There is a hierarchical relation between the holdings guide and the finding aids, which is reflected in the search tree of the advanced search page of the portal:

Hierarchy between the holdings guide and the finding aids


Relations are also possible between the EAC-CPF files and the finding aids published in the portal. Internal links displayed in the "Archival materials" facet of the EAC-CPF file display allow users to move easily from one to the other:

Links from an EAC-CPF file to finding aids



Prepare your data

Only XML files can be uploaded to the Archives Portal Europe. These can be database exports or copies of existing EAD/XML files. During the export, a mapping may be needed to a local XML format, to a target schema such as EAD2002, or directly to the EAD profile defined for the Archives Portal Europe (apeEAD). It is wise to collect all files in one place (e.g. one folder). This helps when submitting the data in one go, for instance when using an OAI-PMH repository or an FTP server to upload files, or when uploading several files combined in a zip-file via HTTP.
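Collecting the XML files of one folder into a single zip-file for an HTTP upload could be sketched as follows. This is only an illustration using the Python standard library; the function name and the folder layout are assumptions and not part of the Dashboard.

```python
import zipfile
from pathlib import Path


def bundle_xml_files(folder: str, zip_path: str) -> list[str]:
    """Write every *.xml file found directly in `folder` into one zip archive.

    Returns the names of the files that were added, sorted alphabetically.
    (Hypothetical helper, not an Archives Portal Europe tool.)
    """
    xml_files = sorted(Path(folder).glob("*.xml"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for xml_file in xml_files:
            # Store only the file name so the archive has a flat layout.
            archive.write(xml_file, arcname=xml_file.name)
    return [xml_file.name for xml_file in xml_files]
```

Writing the zip-file outside the export folder keeps the folder itself limited to the source XML files.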

It is highly beneficial for archives to keep the bigger picture in mind: the data ecosystem on the web. Content providers increasingly publish their data through various channels, including their own website(s), third party websites (e.g. international, national and regional portals, thematic portals etc.) and Linked Open Data publication, so they will very likely have to take care of the integrity of their own data on the internet.

The data integrity issue arises because content providers continuously update their source data. The data available through the different channels then also needs to be updated, and changes made in those channels may need to be fed back to the original data source.

Some tips are given below to better take this into consideration:

  • keep track of data exports and create versions if possible.
  • keep track of updates to the original (source) data to make sure the latest version of the data is available on the Archives Portal Europe.
  • when a big change to a (source) data system occurs in your institution, pay attention to the hyperlinks and the Persistent Identifiers (PID) when updating the data on the Archives Portal Europe.
  • Archives Portal Europe may develop Web 2.0 functionality in the near future, in which User Generated Content (UGC) such as feedback and tagging may be included; content providers may implement such functionality as well, so always consider the entire workflow and ecosystem of data circulation and distribution.

Create profiles to automate the processing of the data

To facilitate the work of the Institution Managers, the Dashboard makes it possible to set up specific profiles. These profiles, which are mandatory when using the OAI-PMH functionality, allow the data to be processed automatically once uploaded to the portal. This is particularly useful for regular updates and additions of content. However, it is recommended to first test the portal's data processing functionality manually in order to see the different possibilities and check what is best for your data.

The profiles are used in the Dashboard to indicate which actions are to be applied to the uploaded files. You can create as many profiles as needed; for instance, you could apply different rules to files without images and to files containing links to images. Please note that you can create a "manual" profile that allows you to process the data yourself, step by step, after harvesting/uploading.

When you create a profile, you have to give it a name and specify the type of file (finding aid, holdings guide, source guide or EAC-CPF record) it is associated with; the forms are adapted accordingly. You can then indicate your choices in two different tabs: preferences for the Archives Portal Europe (tab Basic preferences, displayed by default) and preferences for Europeana (second tab, called Europeana preferences).

Preferences for EAD and EAC-CPF files to be published in the Archives Portal Europe

basic preferences tab for publishing content in Archives Portal Europe


The basic preferences indicate the default actions to apply to your files:

  • publish, convert or validate the files, or do nothing. If you choose "publish", the files will also be converted and validated,
  • overwrite or keep the existing file if duplicate,
  • discard the file or add the <eadid/> manually if missing,
  • specify the type of the <dao/> elements; the type displays a corresponding icon to indicate to the user whether the digitised document is a text, an image, a sound, etc.; this indication also serves for Europeana.

Specific options can be provided for the files regarding the rights and the <dao/>-type. Note that these can also be set manually afterwards, file by file, from the content manager, by clicking on options in the column converted.

Overview of the possibilities of the basic preferences tab

  • Default action for uploaded files:
    • Publish to the Archives Portal Europe (default value)
    • Publish to the Archives Portal Europe and Europeana (in this case filling in the next tab is mandatory)
    • Convert to APE format
    • Validate against APE format
    • Nothing (use content manager for actions)
  • Default action for already existing files:
    • Overwrite existing file with new file (default value)
    • Keep existing file, discard uploaded file
    • Keep existing file, ask for identifier in case of duplicates
  • Default action for files without <eadid> element:
    • Remove uploaded file (default value)
    • Specify value for <eadid> manually
  • Default type for <dao> items:
    • Unspecified (default value)
    • TEXT
    • IMAGE
    • SOUND
    • VIDEO
    • 3D

Note: in case you have enabled Take from file (<dao@xlink:role>) if existing, then the choices you make here will only be applied to files in case these don't have values specified yet; in other words: original values from the original files will not be overwritten, but transferred.

  • Default XSL for conversion:
    • DEFAULT

Note: the default choice here is the standard general local EAD to apeEAD conversion stylesheet, which is fine in 95% of the cases. However, your local EAD files may need some extra fine-tuning that is not available in the standard stylesheet. In that case the Archives Portal Europe's technical team can provide a specifically tweaked stylesheet for your institution and make it available here for you.

  • Default rights statement for digital objects:
    • --- (= none, default value)
    • Public Domain Mark
    • Creative Commons CC0 Public Domain Dedication
    • Creative Commons Attribution
    • Creative Commons Attribution, ShareAlike
    • Creative Commons Attribution, No Derivatives
    • Creative Commons Attribution, Non-Commercial
    • Creative Commons Attribution, Non-Commercial, ShareAlike
    • Creative Commons Attribution, Non-Commercial, No Derivatives
    • Copyright Not Evaluated
    • In Copyright
    • In Copyright EU Orphan Work
    • In Copyright Educational Use Permitted
    • No Copyright Non-Commercial Use Only
    • No Copyright Other Known Legal Restrictions

Note: these rights are taken from the current copyright frameworks, Creative Commons and Rights Statements, which are also used by Europeana.

  • Default rights statement for EAD data:
    • same options as above for digital objects
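The behaviour described in the note on Take from file (<dao@xlink:role>) above, where a default type is only applied to <dao> elements that do not yet carry a value, can be sketched as follows. This is a minimal illustration and not the Dashboard's own code; the function name is hypothetical, and the namespaces are assumed to be the EAD 2002 and XLink namespaces.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: EAD 2002 and XLink.
EAD_NS = "urn:isbn:1-931666-22-9"
XLINK_NS = "http://www.w3.org/1999/xlink"
ROLE_ATTR = f"{{{XLINK_NS}}}role"


def apply_default_dao_role(root: ET.Element, default_role: str) -> int:
    """Set xlink:role on every <dao> that has no role yet.

    Existing values are left untouched. Returns the number of <dao>
    elements that received the default. (Hypothetical helper.)
    """
    changed = 0
    for dao in root.iter(f"{{{EAD_NS}}}dao"):
        if not dao.get(ROLE_ATTR):
            dao.set(ROLE_ATTR, default_role)
            changed += 1
    return changed
```

A <dao> that already declares, say, SOUND keeps that value, while a <dao> without a role receives the default chosen in the profile.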

Specific preferences for forwarding content to Europeana

preferences tab for forwarding content from Archives Portal Europe to Europeana


For Europeana, the EAD files have to be converted to another format (EDM), which is completely different from the EAD format, and they will then be published in another portal that has different re-use rules than the Archives Portal Europe. The preferences to indicate are therefore numerous and subdivided into general and specific settings.

Converting an EAD file to the EDM standard means "flattening" the description of the document, so the main options concern the information that you want to carry over from the higher levels of description into each EDM record. You can also choose between two types of conversion: minimal and full. The minimal conversion only takes some basic elements such as the unittitle.
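The idea of "flattening" can be illustrated with a small sketch that walks nested components and copies the titles of the higher levels into each flat record. It is not the portal's EAD to EDM conversion: the element layout is simplified (no namespaces, only <c>, <did> and <unittitle>), the output is a plain dictionary rather than EDM, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten(component: ET.Element, inherited: tuple = ()) -> list:
    """Turn nested <c> components into a flat list of records.

    Each record keeps its own title plus the titles inherited from
    the higher levels of description. (Illustrative only.)
    """
    title = (component.findtext("did/unittitle") or "").strip()
    records = [{"title": title, "context": list(inherited)}]
    for child in component.findall("c"):
        # Children inherit everything their parent inherited, plus the parent's title.
        records.extend(flatten(child, inherited + (title,)))
    return records
```

Every record produced this way can stand on its own, which is why the choice of what to inherit from the higher levels matters.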

Please note that you will be allowed to forward content to Europeana only if you have signed the Europeana Data Exchange Agreement (DEA).


Upload EAD and EAC-CPF files

The Dashboard supports three different protocols for uploading files: HTTP, FTP and OAI-PMH. HTTP and FTP are available by choosing Upload content in the main menu, OAI-PMH by choosing Create automated harvesting function.

A short overview of the pros and cons of each method:

  • HTTP
    • Pros: no local installation needed; data delivery can be done from a local machine
    • Cons: data delivery is always manual
  • FTP
    • Pros: data delivery can be managed/done remotely via a dedicated FTP server (without having data on a local machine); great familiarity with the technology
    • Cons: server has to be deployed locally; data delivery is always manual
  • OAI-PMH
    • Pros: data delivery can be fully automated via a dedicated OAI-PMH server; data can be synchronised with the local database system; data can be offered to other service providers; de facto standard for data exchange within the cultural heritage sector
    • Cons: server has to be deployed locally; set-up is not always simple

Upload via the HTTP protocol

Via the Dashboard option Upload content you access the dialogue screen for HTTP or FTP upload:

using HTTP or FTP to upload files


The default value is the HTTP protocol, which enables you to select one XML file or a zip-file containing more than one XML file. Before actually uploading the file(s), you can choose a profile to let the system apply specific actions after the uploading.

Your file has to be a valid XML file or a zip file containing valid XML files. If the files are not valid, the Dashboard will reject them with a notification. The size limit of the file (one XML file or one zip-file) is 200 MB.
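Before uploading, the two conditions above can be checked locally. The sketch below uses only the Python standard library; it tests whether each file parses as XML (well-formedness) and whether the upload stays under the size limit. It does not validate against the apeEAD schema, the function names are hypothetical, and reading "200 MB" as 200 × 1024 × 1024 bytes is an assumption.

```python
import os
import xml.etree.ElementTree as ET
import zipfile

# Assumption: the 200 MB limit is interpreted as 200 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as XML."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


def check_upload(path: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append(f"{path}: larger than the 200 MB limit")
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries
                if not is_well_formed(archive.read(name)):
                    problems.append(f"{name}: not well-formed XML")
    else:
        with open(path, "rb") as handle:
            if not is_well_formed(handle.read()):
                problems.append(f"{path}: not well-formed XML")
    return problems
```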

If you do not select a profile, the Dashboard detects the type and status of your files immediately and asks you to make some choices. If you have selected a profile, the predefined choices are applied immediately and you can proceed to the Content manager screen to check the results of the uploading and processing.

These are the checks that the Dashboard can make and the errors it can detect:

  • detection/notification of valid and non-valid files; click on the You can continue to content manager button to continue to the next step:
detection/notification of valid and non valid files


Note: at this stage 'non-valid' files are files that are not recognised as XML files and are therefore discarded.

  • detection/notification of the type of the valid files (Finding Aid, Holdings Guide, Source Guide, EAC-CPF record) that can be stored and processed; you can change the type of document via the dropdown list if necessary and then click on the Accept button to continue to the next step:
detection/notification of the type of the valid files


  • detection of files that can be processed without a problem (Successful files) and files that fail apeEAD schema validation (files with errors) and therefore have to be discarded. When you click on the link Click for more information, you get a more specific error message pointing to the exact location in the XML file where the problem occurs, which enables you to correct the file and upload it again later on:
detection/notification of successful files and files with errors


Note: the number of files you can check on your screen like this is 500, so if you upload a zip-file containing more than 500 XML files, you will be offered more than one page to accept.

  • detection of files that are already stored in your Archives Portal Europe account, with the possibility - per file - to either overwrite them or discard them (the options for the dropdown list behind each file are: overwrite and cancel):
detection/notification of files repeated and files with empty ID


  • detection of files that have an empty <eadid/> element (files with empty ID) and therefore lack an identifier, with the possibility to provide one after a check that it is not already in use by another stored file:
providing missing ID for files with empty ID


Note: in this case it is important to add the identifier to the original (source) file too; otherwise this problem will occur again the next time this file is offered for upload/processing.

Note: all these manual actions can be avoided by using a profile in which you have predefined them. Manual processing of your files is a bit tedious, but when you start contributing content to the Archives Portal Europe it is very useful to do this once, just to check the quality of your data.
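Files with a missing or empty <eadid/> can also be spotted locally before uploading. The following is a minimal sketch, not the Dashboard's own check: the function name is hypothetical, and it assumes the element is either in the EAD 2002 namespace or in no namespace at all.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace: EAD 2002.
EAD_NS = "urn:isbn:1-931666-22-9"


def eadid_value(xml_text: str) -> Optional[str]:
    """Return the text of the first <eadid>, or None if it is missing or empty."""
    root = ET.fromstring(xml_text)
    # Look for a namespaced <eadid> first, then for one without a namespace.
    for tag in (f"{{{EAD_NS}}}eadid", "eadid"):
        element = root.find(f".//{tag}")
        if element is not None:
            return (element.text or "").strip() or None
    return None
```

Running this over the source files before an upload shows which of them still need an identifier in the original data.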

Upload via the FTP protocol

Uploading data via the FTP protocol is similar to uploading via HTTP. When choosing the FTP protocol in the Upload content menu, you have to fill in the address of your FTP server, give the username and the password, and connect to the FTP server. The profile to apply can be selected afterwards, when you select the files to be uploaded.

uploading data via the FTP protocol, connecting to an FTP server


Upload via OAI-PMH harvesting

The use of the OAI-PMH protocol is highly recommended. For more information, please refer to the OAI-PMH website. You can also read the Best practice for OAI PMH Data Provider Implementations and Shareable Metadata (a bit old, but the basics of OAI-PMH have not changed).

Once set up, everything can be automated, from the harvest to the publication of data in the Portal and delivery to Europeana.

General recommendations

There are many open source OAI-PMH tools (for more details see: http://www.openarchives.org/pmh/tools/tools.php). When implementing an OAI-PMH repository, it is recommended to test it before submitting data to the Archives Portal Europe. There are several (online) testing tools that can also be used for this purpose (e.g. OAI repository Explorer, see: http://re.cs.uct.ac.za and OAI-PMH Validator & Data extractor Tool, see: http://validator.oaipmh.com).

checking the quality of an OAI-PMH repository via: http://validator.oaipmh.com


Some important points have to be checked to ensure a correct harvesting process and to take advantage of all its possibilities:

  • All verbs and arguments must be implemented in your repository (see the schema below).
general schema of the OAI-PMH protocol syntax


  • The repository must manage deleted records (value set to "persistent") in order to allow differential harvesting, which is the major advantage of using the OAI-PMH protocol. After the first full harvest of your data, you only harvest the information related to new, updated or deleted files, which generally makes the harvesting process faster. If the value is set to "no" or even "transient", you have to re-harvest everything (and process all the files again afterwards), including the files that did not change. As a consequence, your server and the Archives Portal Europe server have to perform a lot of redundant actions, which has a negative effect on the bandwidth of both servers.
  • It is highly recommended to organise your data in sets (even if you can only provide one set for all your files) and, if needed, sub-sets. This makes the harvests easier to manage and avoids overly large chunks that might be harder for the servers to handle and take too long to finish (up to several days). The sets can be based on your own file plan or whatever file organisation you have in your institution.
  • If possible, identifiers should be unique and persistent URIs, so that they do not change over time and the links to your own website are not broken.
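Two of the points above can be sketched in code: reading the deleted record policy from an Identify response and building a differential ListRecords request for one set. This is a minimal illustration with the Python standard library that makes no network call. The function names, the example base URL, the metadataPrefix "ead" and the set name are assumptions; the verbs, arguments and the deletedRecord element come from the OAI-PMH protocol.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def deleted_record_policy(identify_xml: str) -> Optional[str]:
    """Return the deletedRecord value ("no", "transient" or "persistent")
    from an Identify response, or None if the element is absent."""
    root = ET.fromstring(identify_xml)
    return root.findtext(f".//{{{OAI_NS}}}deletedRecord")


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    Passing `from_date` (e.g. the date of the previous harvest) restricts
    the harvest to records created, changed or deleted since then.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"
```

For example, list_records_url("https://example.org/oai", "ead", set_spec="fa", from_date="2018-07-01") builds a request for one set restricted to changes since that date; such a differential harvest only reflects deletions if the repository reports "persistent".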

Harvesting your data from the Dashboard

checking the base url of an OAI-PMH repository in the Dashboard



filling in harvest parameters in the OAI-PMH dialogue screen of the Dashboard


Manage the harvests

the OAI Harvester tool

Manage your EAD Files

Download your files