Daniel Siepmann 218d8d7289
FEATURE: Make content fields configurable
Allows integrators to configure which fields should be used to produce
field "content" for indexed pages.

Before only "bodytext" was used. This is now configurable and "header"
was added to defaults.

Resolves: #134
2018-03-15 09:15:26 +01:00


Indexing

Holds settings regarding indexing, e.g. of TYPO3 records, into search services.

Configured as:

plugin {
    tx_searchcore {
        settings {
            indexing {
                identifier {
                    indexer = FullyQualifiedClassname
                    // the settings
                }
            }
        }
    }
}

The identifier is up to you, but it should match a table name for the TcaIndexer to work.

The following settings are available. For each setting, it is documented which indexer consumes it.

rootLineBlacklist

Used by: TcaIndexer, PagesIndexer.

Defines a blacklist of page uids. Records on any of these pages, or on their subpages, are not indexed. This allows you to define areas that should not be indexed. The page attribute "No Search" is also respected; use it to exclude the records of a single page without excluding its subpages.

Contains a comma separated list of page uids. Spaces are trimmed.

Example:

plugin.tx_searchcore.settings.indexing.pages.rootLineBlacklist = 3, 10, 100
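The following Python sketch models how such a blacklist could be evaluated. It is not the extension's PHP implementation; the rootline representation (the uid of the record's page followed by the uids of all parent pages) and both function names are assumptions for illustration.

```python
def parse_blacklist(value: str) -> set[int]:
    """Parse a comma-separated list of page uids, trimming spaces."""
    return {int(part.strip()) for part in value.split(",") if part.strip()}


def is_blacklisted(rootline: list[int], blacklist: set[int]) -> bool:
    """Return True if the record's page or any parent page is blacklisted.

    Hypothetical model: `rootline` holds the uid of the record's page
    followed by the uids of all its parent pages up to the root.
    """
    return any(uid in blacklist for uid in rootline)


blacklist = parse_blacklist("3, 10, 100")
print(is_blacklisted([42, 10, 1], blacklist))  # → True (page 42 is below page 10)
print(is_blacklisted([42, 7, 1], blacklist))   # → False
```

Checking the whole rootline is what makes the blacklist recursive; the "No Search" attribute, by contrast, only concerns a single page.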

additionalWhereClause

Used by: TcaIndexer, PagesIndexer.

Adds additional SQL to the WHERE clause that determines which records of the table are indexable. This way you can exclude specific records, e.g. tt_content records with certain CType values.

Example:

plugin.tx_searchcore.settings.indexing.tt_content.additionalWhereClause = tt_content.CType NOT IN ('gridelements_pi1', 'list', 'div', 'menu')

Attention

Make sure to prefix all fields with the corresponding table name. The database query might contain joins, which leads to SQL errors if an unprefixed field exists in multiple tables.
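To illustrate why the prefix matters, the following Python sketch uses an in-memory SQLite database with two simplified, hypothetical tables that both contain a hidden column, and appends an additional WHERE clause to a joined query. The query the indexer actually builds is not shown in this documentation and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schemas: both tables contain a "hidden" column.
conn.executescript(
    """
    CREATE TABLE pages (uid INTEGER PRIMARY KEY, hidden INTEGER);
    CREATE TABLE tt_content (uid INTEGER PRIMARY KEY, pid INTEGER,
                             CType TEXT, hidden INTEGER);
    INSERT INTO pages VALUES (1, 0);
    INSERT INTO tt_content VALUES (10, 1, 'text', 0), (11, 1, 'list', 0);
    """
)

# Hypothetical, simplified query; not the indexer's real SQL.
BASE_QUERY = (
    "SELECT tt_content.uid FROM tt_content "
    "JOIN pages ON pages.uid = tt_content.pid WHERE "
)


def select_indexable(additional_where: str) -> list[int]:
    """Run the joined query with the additional clause appended."""
    rows = conn.execute(BASE_QUERY + additional_where).fetchall()
    return [uid for (uid,) in rows]


# Unprefixed: "hidden" exists in both tables, so SQLite rejects the query.
try:
    select_indexable("hidden = 0")
except sqlite3.OperationalError as error:
    print(error)  # reports an ambiguous column name

# Prefixed: the column is unambiguous and the filter works.
print(select_indexable("tt_content.hidden = 0 AND tt_content.CType NOT IN ('list')"))
# → [10]
```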

abstractFields

Used by: PagesIndexer.

Note

Will be migrated to dataprocessors in the future.

Defines which fields should be used to provide the auto-generated field "search_abstract". The fields have to exist in the record being indexed, so generated fields like content can be used as well.

Example:

# As last fallback we use the content of the page
plugin.tx_searchcore.settings.indexing.pages.abstractFields := addToList(content)

Default:

abstract, description, bodytext
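The "last fallback" comment in the example suggests that the fields are tried in order and the first non-empty one wins. The following Python sketch models that behavior under this assumption; it is not the PagesIndexer's actual code, and the function name is hypothetical.

```python
def search_abstract(record: dict, abstract_fields: str) -> str:
    """Return the first non-empty configured field (hypothetical model)."""
    for field in (name.strip() for name in abstract_fields.split(",")):
        # Missing fields and empty values are skipped.
        value = str(record.get(field) or "").strip()
        if value:
            return value
    return ""


page = {"abstract": "", "description": "", "content": "Welcome to our site"}
print(search_abstract(page, "abstract, description, bodytext,content"))
# → Welcome to our site
```

Under this model, the order of the list matters: appending content with addToList makes it the last candidate, used only when all other fields are empty.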

contentFields

Used by: PagesIndexer.

Defines which fields should be used to provide the auto-generated field "content".

Example:

plugin.tx_searchcore.settings.indexing.pages.contentFields := addToList(table_caption)

Default:

header, bodytext
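The following Python sketch shows one way the "content" field could be assembled from the configured fields. It assumes the page's content elements are available as a list of records and that non-empty values are joined with a space; both assumptions, and the function name, are hypothetical, and the real PagesIndexer may process values differently (e.g. strip markup).

```python
def page_content(content_elements: list[dict], content_fields: str) -> str:
    """Join the configured fields of all content elements (hypothetical model)."""
    fields = [name.strip() for name in content_fields.split(",") if name.strip()]
    parts = []
    for element in content_elements:
        for field in fields:
            # Missing fields and empty values are skipped.
            value = str(element.get(field) or "").strip()
            if value:
                parts.append(value)
    return " ".join(parts)


elements = [
    {"header": "About us", "bodytext": "We build search.", "table_caption": ""},
    {"header": "Prices", "bodytext": "", "table_caption": "Price list"},
]
print(page_content(elements, "header, bodytext"))
# → About us We build search. Prices
print(page_content(elements, "header, bodytext,table_caption"))
# → About us We build search. Prices Price list
```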

mapping

Used by: connection_elasticsearch connection while indexing.

Defines the mapping for Elasticsearch; have a look at the official docs: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/mapping.html. You can define the mapping for each property / column.

Example:

plugin.tx_searchcore.settings.indexing.tt_content.mapping {
    CType {
        type = keyword
    }
}

The above example defines the CType field of tt_content as type keyword. This makes it possible to build a facet on that field.
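The TypoScript block above is a nested structure that maps field names to Elasticsearch mapping parameters. The following Python sketch shows how its flat, dotted form corresponds to a JSON "properties" object. The helper `nest` is hypothetical, and the exact request body the connection sends (for example, whether it is wrapped in a document type, as Elasticsearch 5.x expects) is an assumption not covered here.

```python
import json


def nest(flat: dict[str, str]) -> dict:
    """Turn dotted TypoScript-like paths into nested dictionaries (hypothetical helper)."""
    tree: dict = {}
    for path, value in flat.items():
        node = tree
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return tree


# Flat form of the mapping example above.
mapping = nest({"CType.type": "keyword"})
body = {"properties": mapping}
print(json.dumps(body))
# → {"properties": {"CType": {"type": "keyword"}}}
```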

index

Used by: connection_elasticsearch connection while indexing.

Defines the index settings for Elasticsearch; have a look at the official docs: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/indices-create-index.html

Example:

plugin.tx_searchcore.settings.indexing.tt_content.index {
    analysis {
        analyzer {
            ngram4 {
                type = custom
                tokenizer = ngram4
                char_filter = html_strip
                filter = lowercase, asciifolding
            }
        }

        tokenizer {
            ngram4 {
                type = ngram
                min_gram = 4
                max_gram = 4
            }
        }
    }
}

char_filter and filter are comma-separated lists of options.
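Elasticsearch expects char_filter and filter inside an analyzer to be arrays. The following Python sketch shows how the comma-separated TypoScript values could be turned into such arrays. It is a hypothetical model of the conversion, not the connection's actual code; nested sections (such as a top-level filter block defining custom token filters) are recursed into rather than split.

```python
import json

# Options given as comma-separated strings in TypoScript (per the text above).
LIST_OPTIONS = {"char_filter", "filter"}


def normalize(settings: dict) -> dict:
    """Recursively split comma-separated list options into lists (hypothetical)."""
    result = {}
    for key, value in settings.items():
        if isinstance(value, dict):
            result[key] = normalize(value)
        elif key in LIST_OPTIONS:
            result[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            result[key] = value
    return result


analyzer = {"ngram4": {"type": "custom", "tokenizer": "ngram4",
                       "char_filter": "html_strip",
                       "filter": "lowercase, asciifolding"}}
print(json.dumps(normalize(analyzer)["ngram4"]["filter"]))
# → ["lowercase", "asciifolding"]
```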

dataProcessing

Used by: All connections while indexing, as it is implemented inside AbstractIndexer.

Configures modifications applied to each document before it is sent to the configured connection. For full documentation, check out dataprocessors.
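Conceptually, data processing is a pipeline: each configured processor receives the document produced by the previous one. The following Python sketch illustrates that idea only; the two processors are hypothetical examples, not the extension's processor classes, which are configured via TypoScript and documented under dataprocessors.

```python
from typing import Callable

Document = dict
Processor = Callable[[Document], Document]


def remove_fields(*fields: str) -> Processor:
    """Hypothetical processor: drop the given fields from a document."""
    def process(document: Document) -> Document:
        return {k: v for k, v in document.items() if k not in fields}
    return process


def copy_field(source: str, target: str) -> Processor:
    """Hypothetical processor: copy one field's value to another field."""
    def process(document: Document) -> Document:
        return {**document, target: document.get(source)}
    return process


def apply_processors(document: Document, processors: list[Processor]) -> Document:
    """Run every processor in order before the document is sent."""
    for processor in processors:
        document = processor(document)
    return document


record = {"uid": 1, "header": "Hello", "l10n_diffsource": "serialized data"}
print(apply_processors(record, [remove_fields("l10n_diffsource"),
                                copy_field("header", "title")]))
# → {'uid': 1, 'header': 'Hello', 'title': 'Hello'}
```

Because each processor returns a new document, the order of the configured processors determines the result.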