Indexing

Adding, updating and deleting documents

Topics: indexing whole documents; performing atomic partial updates of documents; deleting documents; "hard" commits.

Adding documents

Documents are indexed through the /ingest endpoint of the search server installation.

This endpoint accepts POST requests that carry the document(s) to be indexed in the request body. Documents must be encoded in one of these formats (MIME types):

  • text/xml or application/xml

  • text/json or application/json

  • text/csv or application/csv

The endpoint also supports the capabilities of the default Solr update handler.

Ingested documents are automatically committed to the index after a preset period of time, or after a preset maximum number of documents has been ingested. Note that committing documents to the index does not automatically make them visible to searches. For that, a manual commit, or a scheduled or manual optimize call, needs to be performed.
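
As an illustration of the ingestion call described above, the following minimal Python sketch builds a POST request to /ingest with a JSON body. The base URL is hypothetical, and sending the documents as a JSON array is an assumption carried over from the default Solr update handler; the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL of the search server installation; adjust to your setup.
BASE_URL = "http://localhost:8080/search"


def build_ingest_request(docs, base_url=BASE_URL):
    """Build (but do not send) a POST request that indexes `docs` as JSON."""
    # Assumption: documents are sent as a JSON array, as with Solr's update handler.
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ingest_request([{"id": "mydoc", "price": 99}])

# To actually send it (requires a running server):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL, headers and body before anything is sent.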

Performing atomic updates

Partial, atomic updates to a document’s fields are performed with field modifiers. The supported modifiers are:

  • set: Sets or replaces the field value(s) with the specified value(s), or removes the values if 'null' or an empty list is specified as the new value. May be specified as a single value, or as a list for multi-valued fields.

  • removeregex: Removes all occurrences of the specified regex from a multi-valued field. May be specified as a single value, or as a list.

  • remove: Removes (all occurrences of) the specified values from a multi-valued field. May be specified as a single value, or as a list.

  • inc: Increments a numeric value by a specific amount. Must be specified as a single numeric value.

  • add: Adds the specified values to a multi-valued field. May be specified as a single value, or as a list.

Field modifiers are specified as single-attribute "objects", with the modifier name as an attribute, and the parameters as the attribute’s value; this "object" then takes the place of what would normally have been the field’s value.

The following example demonstrates the use of field modifiers for the partial update of a document:

Example of the use of field modifiers

{
 "id":"mydoc",
 "price":{"set":99},
 "popularity":{"inc":20},
 "categories":{"add":["toys","games"]},
 "promo_ids":{"remove":"a123x"},
 "tags":{"remove":["free_to_try","on_sale"]}
}
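
A partial-update document like the one above could be assembled programmatically. The sketch below is only an illustration: the helper name atomic_update is hypothetical, and it assumes that "id" is the unique key field, as in the example. It wraps each field value in a single-attribute object keyed by the modifier name and rejects modifiers that are not in the list of supported ones.

```python
import json

# Modifiers listed in this section.
SUPPORTED_MODIFIERS = {"set", "add", "remove", "removeregex", "inc"}


def atomic_update(doc_id, /, **fields):
    """Return a partial-update document for `doc_id`.

    Each keyword argument maps a field name to a (modifier, value) pair.
    """
    doc = {"id": doc_id}  # assumption: "id" is the unique key field
    for field, (modifier, value) in fields.items():
        if modifier not in SUPPORTED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        # The single-attribute object takes the place of the field's value.
        doc[field] = {modifier: value}
    return doc


update = atomic_update(
    "mydoc",
    price=("set", 99),
    popularity=("inc", 20),
    categories=("add", ["toys", "games"]),
)

# Assumption: the update is sent like any other JSON document, as an array.
payload = json.dumps([update])
```

The resulting payload would presumably be posted to the ingestion endpoint with content type application/json, like a whole document.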

Deleting documents

Documents are deleted through a POST request to the /delete endpoint, with the parameter q containing the query that identifies the document or documents to be deleted.

To delete individual documents, q can have the form id:<document id>.

To delete the entire set of documents, q can have the form *:*.

Selective deletion of groups of documents is typically done through a corresponding grouping field (e.g., sem_grouping:<some top-level category>).
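
The deletion calls described above can be sketched in Python as follows. The base URL is hypothetical, and passing q as a form-encoded request body is an assumption; the text only states that q is a parameter of the POST request, so it may equally be expected in the URL query string. The requests are built but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL of the search server installation; adjust to your setup.
BASE_URL = "http://localhost:8080/search"


def build_delete_request(query, base_url=BASE_URL):
    """Build (but do not send) a POST request that deletes documents matching `query`."""
    # Assumption: the q parameter is passed as a form-encoded body.
    body = urllib.parse.urlencode({"q": query}).encode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/delete",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


delete_one = build_delete_request("id:mydoc")  # a single document
delete_all = build_delete_request("*:*")       # the entire set of documents
```

Because *:* removes every document, it is worth inspecting the built request before sending it.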

Hard commits

Manual invocation of commits (or “hard” commits) can be done through:

  • POST request to the endpoint /commit

  • POST request to the ingestion address with content type application/xml, and body: <commit />

The first of the above options may take longer to complete, because the call returns only after all changes have been flushed to disk and are “visible” in the results of any subsequent search calls.
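
The two ways of triggering a hard commit can be sketched as follows. The base URL is hypothetical, and the sketch assumes that the "ingestion address" is the /ingest endpoint and that /commit accepts an empty POST body. The requests are built but not sent.

```python
import urllib.request

# Hypothetical base URL of the search server installation; adjust to your setup.
BASE_URL = "http://localhost:8080/search"


def build_commit_request(base_url=BASE_URL):
    """Option 1: POST to the /commit endpoint (assumed to take an empty body)."""
    return urllib.request.Request(
        url=f"{base_url}/commit",
        data=b"",
        method="POST",
    )


def build_commit_via_ingest_request(base_url=BASE_URL):
    """Option 2: POST an XML <commit /> command to the ingestion address."""
    return urllib.request.Request(
        url=f"{base_url}/ingest",  # assumption: the ingestion address is /ingest
        data=b"<commit />",
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


commit = build_commit_request()
commit_via_ingest = build_commit_via_ingest_request()

# To actually send one of them (requires a running server):
# with urllib.request.urlopen(commit) as response:
#     print(response.status)
```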