Koha How-To

Elastic Search Configuration

by Andrew Fuerste-Henry on Apr 3, 2020

Before jumping in, I have a few disclaimers:

Elastic is a big tool that can do a lot of things. We’re still in the process of making all of those things work in Koha.
Implementation of Elastic in Koha is ongoing and very active, so expect new features, fixes, and tweaks regularly.
As I write this, we have partners testing Elastic in Koha 19.05 and are looking forward to moving some of our Elastic testers to 19.11. As such, this post focuses on Elastic in those two versions of Koha.

Please also see our post about searching in Koha using Elastic.

The SearchEngine system preference

Which search engine Koha uses is determined by the system preference SearchEngine. There are only two choices here, Zebra and Elastic. Koha defaults to Zebra. Switching to Elastic is not quite as easy as just flipping this switch, though. There’s some setup that needs to happen on your server to get Elastic installed and running. If you flip this switch before that’s been done, you won’t get any search results at all.

Once Elastic has been set up on your server and we’ve flipped this switch for you, Zebra is going to continue to run and keep itself updated. For now, Koha’s still using Zebra in a few places outside of catalog searching, so it’s still around. That means that one could use the system preference to go back to Zebra at any time -- though we’d generally prefer that you didn’t, as we’d like to see and fix any Elastic issues you’re having. If you do switch back to Zebra, Elastic will not continue to keep itself updated while you use Zebra. That means switching back to Elastic requires a little extra work on the server end that we can take care of for you.

The ElasticsearchCrossFields system preference

This system preference was created in response to a change between Elastic versions 5 and 6. Currently, Koha is capable of using either version of Elastic. You can see your Elastic version in the About Koha page of your staff client. All ByWater partner libraries currently using Elastic are using Elastic 6.

If your Koha system is using Elastic 5 or earlier, you need the ElasticsearchCrossFields system preference to be set to "Disable." Otherwise, all searches will fail.

If your Koha system is using Elastic 6 or later, you need the ElasticsearchCrossFields system preference to be set to "Enable." Otherwise, keyword searches will return inaccurate results.

More specifically, on Elastic 6 and later, if ElasticsearchCrossFields is turned off, Koha will look for all of your search terms in all of your search indices, but only return titles where all of your terms were in the same index. So a search for "bram stoker dracula" would not find the novel (as "bram stoker" is in the author and "dracula" is in the title), but might find you the 1992 film titled Bram Stoker's Dracula.
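To make that difference concrete, here is a small Python sketch of the two matching behaviors. It is a toy model only, not how Elastic works internally: the two records, their field contents, and the function names are all made up for illustration.

```python
# Toy model only: this is NOT Elastic's real matching logic.
# The records and field values below are invented for illustration.
RECORDS = {
    "Dracula (novel)": {
        "title": "dracula",
        "author": "bram stoker",
    },
    "Bram Stoker's Dracula (1992 film)": {
        "title": "bram stoker's dracula",
        "author": "francis ford coppola",
    },
}


def tokens(text):
    """Lowercase the text, drop possessive 's, and split on whitespace."""
    return set(text.lower().replace("'s", "").split())


def match_single_field(record, terms):
    """Cross-fields off: all terms must be found in the same field."""
    return any(terms <= tokens(value) for value in record.values())


def match_cross_fields(record, terms):
    """Cross-fields on: each term may be found in any field."""
    all_tokens = set()
    for value in record.values():
        all_tokens |= tokens(value)
    return terms <= all_tokens


def search(query, matcher):
    """Return the labels of the records that the given matcher accepts."""
    terms = tokens(query)
    return [label for label, record in RECORDS.items() if matcher(record, terms)]


if __name__ == "__main__":
    print(search("bram stoker dracula", match_single_field))
    print(search("bram stoker dracula", match_cross_fields))
```

In this sketch the single-field matcher finds only the film, while the cross-fields matcher finds both the film and the novel.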

The Search Engine Configuration page

Once you’ve switched to Elastic, you’ll see a new link in the Catalog section of Administration: Search Engine Configuration. This is where Koha shows you which parts of your MARC records get indexed for searching and then how those search indexes are weighted for use in relevancy ranking of search results. The former replaces the search index documentation from the Koha manual. The latter is something we’ve never had clear access to in Zebra. As of Koha version 19.11, most things on this page require some action on your server when they are changed, so we encourage you to contact us before making any changes here. That said, it’s still a great resource for checking which parts of your records are being indexed and how you might search them directly.

Mappings

The tabs labeled “Bibliographic records” and “Authorities” contain your mappings. On each tab the “Search field” column contains search index names and the “Mapping” column contains MARC fields and subfields. In the screenshot above, we can see that the author index contains the 100$a, 110$a, 111$a, 245$c, and 700$a. So an author search will return records with our search terms in any of those fields. Many MARC fields are included in several different indexes. If you’re uncertain of how to search for values in a particular MARC field, you can come to this page and use ctrl-F to look for the MARC field in question.
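If it helps to think of the mappings as data, the ctrl-F lookup described above can be sketched in Python. This is only an illustration: the "author" row comes from the example above, the "title" row is invented, and the `MAPPINGS` and `indexes_for` names are hypothetical rather than anything Koha provides.

```python
# Hypothetical excerpt of a mapping table. Only the "author" row comes from
# the example in the text; the "title" row is invented for illustration.
MAPPINGS = {
    "author": ["100a", "110a", "111a", "245c", "700a"],
    "title": ["245a", "245b", "245c"],
}


def indexes_for(marc_field):
    """Return the search indexes whose mapping includes the given MARC field."""
    return sorted(
        index for index, fields in MAPPINGS.items() if marc_field in fields
    )


if __name__ == "__main__":
    # With the invented "title" row, 245$c appears in two indexes.
    print(indexes_for("245c"))
    print(indexes_for("100a"))
```

A field that returns an empty list here would be one that is not indexed at all in this sample table.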

Not every MARC field is indexed by default. If you want to index a new MARC field or change which indexes include which MARC fields, we can change these values for you. For example, one of our public library testers separated the 245$n and 245$p into a new index called “title-part” so they could search season and volume numbers separately from the rest of the title. A special library partner requested a new index called “product-type” for the 513$a, where they’d been keeping internal data for local records.

The remaining three columns in these tabs have limited functionality. They describe whether or not these fields can be used to sort results, generate facets, or suggest other searches, respectively. As of Koha 19.11, changing these values will not enable additional functionality and can break existing functionality. As such, changing them is not recommended.

Currently, ByWater is maintaining two default mapping sets. You can see our academic library mapping here and our public library mapping here.

Weighting

The tab labeled “Search fields” shows the weightings applied to generate relevancy rankings in search results. With no weighting values assigned, all fields are considered equally important in your search results. Essentially, a blank here is the same as entering a one. By increasing the weight value, we give that field more importance so that records with our search terms in the weighted fields are pushed higher in our results than records with our search terms in unweighted fields. In our testing, we settled on the following default weights:

title: 32
author: 16
subject: 8
title-series: 4
contents: 2

These weightings are based on the assumption that your patrons are most likely to provide a title or author when searching. If not a title or author, they may give a subject or series title or some words contained in the 5XX fields. Remember you can always check the mappings to see exactly which fields are included in each of these weighted indices. Weights are applied at search time, so they can be adjusted without requiring a reindex. Feel free to play with these settings, but please let us know if you settle on a different weighting so we can make sure to record this.
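As a rough picture of what the weights do, here is a toy Python model in which each index that matches a search contributes its weight to a record's score, and unweighted indexes count as one. Elastic's real relevancy scoring is more involved than a simple sum, so treat this as a sketch of the idea only. The `score` and `rank` helpers and the "note" index name are hypothetical.

```python
# Default weights listed above. Any index not listed is treated as weight 1,
# matching the "a blank is the same as entering a one" rule.
DEFAULT_WEIGHTS = {
    "title": 32,
    "author": 16,
    "subject": 8,
    "title-series": 4,
    "contents": 2,
}


def score(matched_indexes, weights=DEFAULT_WEIGHTS):
    """Toy score: add up the weight of every index where the terms matched."""
    return sum(weights.get(index, 1) for index in matched_indexes)


def rank(hits, weights=DEFAULT_WEIGHTS):
    """Order record labels from highest to lowest toy score.

    `hits` maps a record label to the list of indexes where it matched.
    """
    return sorted(hits, key=lambda label: score(hits[label], weights), reverse=True)


if __name__ == "__main__":
    hits = {
        "matched in a note only": ["note"],  # "note" is a made-up unweighted index
        "matched in subject": ["subject"],
        "matched in title and author": ["title", "author"],
    }
    for label in rank(hits):
        print(score(hits[label]), label)
```

Because the weights are only consulted when results are scored, changing a number in the table changes the ordering on the next search without touching the indexed records.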

Facets

Jumping back to the “Bibliographic records” tab, you’ll find a small table at the very bottom of the page that controls the display of your facets. We can uncheck the box for a given facet to disable it completely or drag the facets up and down the list to change the order. Tell us if you want to change things here so we can make sure those changes are retained should we need to rebuild your Elastic indices.

Elastic changes facets in one more way: it builds your facets from all of the search results it has. That means it does not use the MaxRecordsForFacets system preference at all.
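The practical effect can be sketched in a few lines of Python. This is an illustration under assumptions, not Koha code: the `facet_counts` helper and the `itype` field name are hypothetical, and the capped branch simply assumes that a limit like MaxRecordsForFacets means only the first so many results are examined.

```python
from collections import Counter


def facet_counts(results, facet_field, max_records=None):
    """Count facet values across results.

    With max_records=None every result is counted (the Elastic behavior
    described above). With a number, only the first max_records results are
    examined, which is our assumption about how a record cap would behave.
    """
    sample = results if max_records is None else results[:max_records]
    return Counter(rec[facet_field] for rec in sample if facet_field in rec)


if __name__ == "__main__":
    # Made-up result set: two books followed by a DVD.
    results = [{"itype": "BOOK"}, {"itype": "BOOK"}, {"itype": "DVD"}]
    print(facet_counts(results, "itype"))      # counts from all three results
    print(facet_counts(results, "itype", 2))   # the DVD is never seen
```

With a cap in place, a facet value that only appears late in the result list can be missing entirely; counting every result avoids that.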
