Indexing
Adding, updating and deleting documents
Topics: indexing whole documents; performing atomic partial updates of documents; deleting documents; "hard" commits.
Adding documents
Documents are indexed through the /ingest endpoint of the search server
installation.
This endpoint accepts POST requests that contain the document(s) to be
indexed in the request’s body. Documents must be encoded in one of these
formats (MIME types):
- text/xml or application/xml
- text/json or application/json
- text/csv or application/csv
The capabilities of the default Solr update handler are also available here.
Ingested documents are automatically committed to the index after a preset period of time, or after a preset maximum number of documents has been ingested. Note that committing documents to the index does not automatically make them visible to searches. For that, a manual commit or a scheduled or manual optimize call needs to be performed.
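As a minimal sketch of the request described above, the following Python snippet builds a JSON POST request for /ingest using only the standard library. The base URL, the helper name build_ingest_request and the title field are assumptions made for illustration; they are not part of the product.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of your installation.
BASE_URL = "http://localhost:8080"


def build_ingest_request(docs, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying documents as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "title" is a made-up field; use the fields defined in your schema.
    req = build_ingest_request([{"id": "mydoc", "title": "A sample document"}])
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL, headers and body before anything reaches the server.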
Performing atomic updates
Field modifiers make it possible to perform atomic partial updates of a document’s fields. The supported modifiers are:
| Modifier | Description |
|---|---|
| set | Set or replace the field value(s) with the specified value(s), or remove the values if 'null' or empty list is specified as the new value. May be specified as a single value, or as a list for multivalued fields. |
| removeregex | Removes all occurrences of the specified regex from a multivalued field. May be specified as a single value, or as a list. |
| remove | Removes (all occurrences of) the specified values from a multivalued field. May be specified as a single value, or as a list. |
| inc | Increments a numeric value by a specific amount. Must be specified as a single numeric value. |
| add | Adds the specified values to a multivalued field. May be specified as a single value, or as a list. |
Field modifiers are specified as single-attribute "objects", with the modifier name as an attribute, and the parameters as the attribute’s value; this "object" then takes the place of what would normally have been the field’s value.
The following example demonstrates the use of the field modifiers for the partial update of an object:
Example of the use of field modifiers
{
  "id": "mydoc",
  "price": {"set": 99},
  "popularity": {"inc": 20},
  "categories": {"add": ["toys", "games"]},
  "promo_ids": {"remove": "a123x"},
  "tags": {"remove": ["free_to_try", "on_sale"]}
}
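The same kind of update document can also be assembled programmatically. The sketch below is one possible approach, not a prescribed API: the helper atomic_update is hypothetical, and it deliberately restricts itself to the four modifiers that appear in the example above.

```python
import json

# Modifiers used in the example above; extend this set as needed.
ALLOWED_MODIFIERS = {"set", "inc", "add", "remove"}


def atomic_update(doc_id, **fields):
    """Return an update document where each field maps to {modifier: value}.

    Each keyword argument has the form field=(modifier, value).
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in fields.items():
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        # The single-attribute object takes the place of the field's value.
        doc[field] = {modifier: value}
    return doc


if __name__ == "__main__":
    update = atomic_update(
        "mydoc",
        price=("set", 99),
        popularity=("inc", 20),
        categories=("add", ["toys", "games"]),
    )
    print(json.dumps(update, indent=2))
```

The resulting dictionary can be serialized with json.dumps and sent in the body of a POST request to /ingest, like any other document.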
Deleting documents
Deletion of documents is done through a POST request to the endpoint /delete, with the parameter q containing the query that identifies the document or documents to be deleted.
To delete individual documents, q can have the form id:<document id>.
To delete the entire set of documents, q can have the form *:*.
Selective deletion of groups of documents is typically done through a corresponding grouping field (e.g., sem_grouping:<some top-level category>).
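A deletion request might be built as in the sketch below. It assumes that the q parameter is sent form-encoded in the request body; the text above does not say whether the server expects it there or in the URL query string, so verify this against your installation. The base URL and the helper name build_delete_request are likewise hypothetical.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the address of your installation.
BASE_URL = "http://localhost:8080"


def build_delete_request(query, base_url=BASE_URL):
    """Build (but do not send) a POST request to /delete for the given query.

    Assumption: the q parameter travels as a form-encoded request body.
    """
    body = urllib.parse.urlencode({"q": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/delete",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    single = build_delete_request("id:mydoc")  # one document
    everything = build_delete_request("*:*")   # the entire set of documents
    print(single.full_url, single.data)
    print(everything.full_url, everything.data)
    # To actually send a request: urllib.request.urlopen(single)
```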
Hard commits
Manual invocation of commits (or “hard” commits) can be done through:
- A POST request to the endpoint /commit
- A POST request to the ingestion address with content type application/xml, and body: <commit />
The first of the above options may take longer to complete, because the call only returns after all changes have been flushed to disk and are “visible” in the search results of any subsequent search calls.
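Both ways of triggering a hard commit can be sketched as follows. The base URL and the helper name build_commit_request are assumptions for illustration, and the ingestion address is taken to be the /ingest endpoint described earlier.

```python
import urllib.request

# Hypothetical base URL; replace with the address of your installation.
BASE_URL = "http://localhost:8080"


def build_commit_request(base_url=BASE_URL, via_ingest=False):
    """Build (but do not send) a POST request that triggers a hard commit."""
    base = base_url.rstrip("/")
    if via_ingest:
        # Second option: an XML commit command sent to the ingestion address.
        return urllib.request.Request(
            url=base + "/ingest",
            data=b"<commit />",
            headers={"Content-Type": "application/xml"},
            method="POST",
        )
    # First option: an empty POST to the dedicated /commit endpoint.
    return urllib.request.Request(url=base + "/commit", data=b"", method="POST")


if __name__ == "__main__":
    for req in (build_commit_request(), build_commit_request(via_ingest=True)):
        print(req.get_method(), req.full_url, req.data)
    # To actually send a request: urllib.request.urlopen(req)
```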