Multilingual search

From Archives Portal Europe Wiki
Revision as of 22:45, 24 September 2016 by Admin

One of the advantages of the Archives Portal Europe is that you can browse the content of many European archival institutions, and therefore archival material from many European countries in many European languages. But when you search for material on a specific topic using a single search term in one language, you will only get a small part of all the information on that topic that the portal contains. Searching for "swan" (English) will only return results for "swan", not for "cygne" (French), "Schwan" (German) or "zwaan" (Dutch). So how can you search data in more than one language?

The Archives Portal Europe supports multilingual searching in several ways: by suggesting other search terms via "auto completion" and "auto suggestion", by letting you build smart search queries with "wildcards" and "Boolean operators", and by clustering data on "themes" or "topics" via the tag cloud.



Auto completion

When you start typing a search term in the search box, the "auto completion" functionality offers you a list of suggested search terms as you type, each with the number of search results it would lead to. For example: if you start typing the search term "neuropsychology", the portal tells you that there are search results available for terms with a slightly different spelling, among them the Dutch "neuropsychologie".

Auto completion example


While typing, you can stop at any point and choose one of the alternative search terms from the list by clicking on it. Of course you can always ignore this list, finish typing your own search term and run that search by clicking on the "search" button.
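The behaviour described above can be illustrated with a minimal sketch of prefix-based completion. This is not the portal's actual implementation: the function name `suggest`, the `TERM_COUNTS` index and all counts in it are assumptions made up for this example.

```python
from collections import Counter


def suggest(prefix, term_counts, limit=5):
    """Return (term, result count) pairs for indexed terms starting with prefix."""
    prefix = prefix.lower()
    matches = [
        (term, count)
        for term, count in term_counts.items()
        if term.lower().startswith(prefix)
    ]
    # Most frequent terms first; ties are broken alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


# Hypothetical index: term -> number of search results (made-up counts).
TERM_COUNTS = Counter({
    "neuropsychologie": 12,
    "neuropsychology": 7,
    "neurology": 30,
    "swan": 4,
})

print(suggest("neurop", TERM_COUNTS))
```

With this sample index, typing "neurop" would offer both the Dutch and the English spelling, each with its count.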

Auto suggestion

Once you have run a search and your search results are shown, the portal again offers you some alternatives for the search term you chose. These alternative terms, offered via the "auto suggestion" functionality, are shown right above the list of search results, each with the number of search results it will lead to. Among these suggested terms you can also find alternatives for your original search term in other languages.

Auto suggestion example


Wildcards

The functionalities described above suggest alternative search terms and can lead you to information on your topic in other languages, but you are still searching with only one search term at a time and have to run several queries to make use of the suggestions. The Archives Portal Europe also lets you cover several search terms in one query, and one way to do so is to use wildcards in your search term. The idea is to replace one or more characters in your search term with a wildcard in order to capture results for different spellings of the search term.

The most important wildcards are the star (*) and the question mark (?). You can only use wildcards in your search terms if they are preceded by at least two characters.

By using the star in a search term, you tell the system to match whatever characters you have typed, followed by any number of characters of any kind. For example: if you enter a search term like "neurop*", you will get results for all terms starting with "neurop".

Wild card example 1


By using the question mark in a search term, you tell the system that the question mark stands for a single character, again of any kind. For example: if you enter a search term like "craftsm?n", you will not only get results for "craftsman" (singular), but also for "craftsmen" (plural).

Wild card example 2
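To make the two wildcards concrete, here is a minimal sketch that translates a wildcard term into a regular expression and applies the rule that a wildcard needs at least two characters in front of it. The function names `wildcard_to_regex` and `matches` are hypothetical, and the portal's search engine may treat edge cases differently (for example case sensitivity, which is assumed to be ignored here).

```python
import re


def wildcard_to_regex(term):
    """Translate a term containing * and ? into a compiled regular expression."""
    first = min((i for i, ch in enumerate(term) if ch in "*?"), default=None)
    if first is not None and first < 2:
        raise ValueError("a wildcard must be preceded by at least two characters")
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")  # any number of characters of any kind
        elif ch == "?":
            parts.append(".")   # a single character of any kind
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term, word):
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None


print(matches("neurop*", "neuropsychologie"))
print(matches("craftsm?n", "craftsmen"))
```

In this sketch "neurop*" matches any word beginning with "neurop", while "craftsm?n" matches "craftsman" and "craftsmen" but not a word with two letters in place of the question mark.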


Boolean operators

The portal also supports searching with Boolean operators. This can be very helpful when you want results from data in more than one language in one query, and in particular it solves the problem this FAQ page started with. If you enter a query like "swan OR cygne OR Schwan OR zwaan" in the search box, the results list will contain results for the word "swan" in English, French, German and Dutch.

Boolean operator example 1
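If you collect translations of a term yourself, the query string can be assembled mechanically. The sketch below only builds the text you would type into the search box; the helper name `build_or_query` is hypothetical and it does not call the portal in any way.

```python
def build_or_query(terms):
    """Join the non-empty terms with the OR operator, dropping duplicates."""
    unique = []
    for term in terms:
        term = term.strip()
        if term and term not in unique:
            unique.append(term)
    return " OR ".join(unique)


# Translations of one concept, collected by hand for this example.
swan_terms = ["swan", "cygne", "Schwan", "zwaan"]
print(build_or_query(swan_terms))
```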


You can look at the search results in one language by filtering the search results on country, for example The Netherlands:

Boolean operator example 1, filtered on search results from The Netherlands


or France:

Boolean operator example 1, filtered on search results from France


It's also possible to combine wildcards and Boolean operators. For example, a search query like "slaver* OR esclavage" will return search results for "slavery" (English), "slavernij" (Dutch) and "esclavage" (French).
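The combination can be sketched as follows: a word is a hit if it matches any of the alternatives separated by OR, where each alternative may contain wildcards. This is a simplified model under stated assumptions, not the portal's query parser: the function name `query_matches` is hypothetical, matching is assumed to be case-insensitive, and Python's `fnmatch` module also gives square brackets a special meaning that the portal may not share.

```python
from fnmatch import fnmatchcase


def query_matches(query, word):
    """Return True if the word satisfies any alternative of an OR query."""
    alternatives = [part.strip() for part in query.split(" OR ")]
    word = word.lower()
    return any(fnmatchcase(word, alt.lower()) for alt in alternatives if alt)


words = ["slavery", "slavernij", "esclavage", "Sklaverei"]
hits = [w for w in words if query_matches("slaver* OR esclavage", w)]
print(hits)
```

The German "Sklaverei" is not matched by this query, which shows why each language that spells the term differently needs its own alternative in the query.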


Topics - Tag Cloud

The Archives Portal Europe offers a "tag-cloud" on its homepage, giving access to data connected to "topics" or "themes". The topics in this tag-cloud are a selection of all topics available, and a new random selection is shown each time the page is refreshed. The complete overview of all topics can be found on the "Topics" page.


The concept behind this is that data (i.e. EAD/XML finding aids) can contain the names of these topics (in the EAD/XML element <subject/> within the element <controlaccess/>), and these can be connected, either automatically or manually, to the list of topics available in the back-end of the Archives Portal Europe. This list is matched with a controlled vocabulary, UKAT (UK Archival Thesaurus), so it is originally in English. The Archives Portal Europe's back-end, however, allows the Country Managers to translate these terms into their own language, so that their Institution Managers can use the translated terms in their data instead of the English ones. So searching by clicking on the topic "slavery" in the tag-cloud could theoretically lead to search results for this term in all other languages.
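As an illustration of the matching idea, the sketch below reads the <subject/> elements inside <controlaccess/> from a small, made-up EAD fragment and maps them to English topic terms. The sample document, the `TOPIC_TRANSLATIONS` table and the function name `extract_topics` are all assumptions for this example. Real finding aids may declare an XML namespace, in which case the element paths would need namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# Made-up EAD fragment without a namespace, for illustration only.
SAMPLE_EAD = """\
<ead>
  <archdesc>
    <controlaccess>
      <subject>Slavernij</subject>
      <subject>Trade</subject>
    </controlaccess>
  </archdesc>
</ead>
"""

# Hypothetical translation table: local-language term -> English topic term.
TOPIC_TRANSLATIONS = {"slavernij": "slavery", "esclavage": "slavery"}


def extract_topics(ead_xml, translations):
    """Return the topic terms found in <controlaccess>/<subject>, in English where known."""
    root = ET.fromstring(ead_xml)
    topics = []
    for subject in root.iterfind(".//controlaccess/subject"):
        text = (subject.text or "").strip()
        if text:
            key = text.lower()
            # Fall back to the lower-cased original when no translation is known.
            topics.append(translations.get(key, key))
    return topics


print(extract_topics(SAMPLE_EAD, TOPIC_TRANSLATIONS))
```

Here the Dutch "Slavernij" resolves to the English topic "slavery", which is how a click on one topic could reach data tagged in another language.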

Theoretically, because for this to work properly the data have to be prepared for it. Ideally this is done by the archival institutions contributing to the Archives Portal Europe themselves, by "tagging" their data in their collection management systems with the topic terms offered by the Archives Portal Europe. If the EAD/XML exports of these systems contain these topic terms, they will automatically be recognised by the Archives Portal Europe's framework. But even if archival institutions can't offer these topic terms via their collection management systems (yet), they can still make use of this facility: the back-end of the Archives Portal Europe offers them functionality to connect finding aids to the topic terms. This is only possible for full finding aids, however, not for individual descriptive units. So to take full advantage of the "topics/tag-cloud" functionality, archival institutions should ideally prepare their data for this in their collection management systems. This will also be a first step towards Linked Open Data, so in the end the effort to make this work will pay off.