Schema
This chapter introduces the predefined schemata bundled with the Semanteer Search Layer: the available static and dynamic fields, their capabilities, and their intended use.
Predefined field types
Topics: basic, text, categorization, and specialized field types in the default Semanteer index schema.
More information about the field types supported by Solr "out of the box" can be found in the section Solr Field Types of the Solr Reference Guide.
Basic field types
| Field type | Description |
|---|---|
|
Binary data. |
|
Contains either true or false. Values of 1, t, or T in the first character are interpreted as true. Any other values in the first character are interpreted as false. |
|
String (UTF-8 encoded string or Unicode). |
|
Represents a point in time with millisecond precision. The format used is a restricted form of the canonical representation of dateTime in the XML Schema specification: YYYY-MM-DDThh:mm:ssZ |
|
Double field (64-bit IEEE floating point). double (precisionStep="0") enables efficient numeric sorting and minimizes index size; tdouble (precisionStep="8") enables efficient range queries. |
|
Floating point field (32-bit IEEE floating point). float (precisionStep="0") enables efficient numeric sorting and minimizes index size; tfloat (precisionStep="8") enables efficient range queries. |
|
Integer field (32-bit signed integer). int (precisionStep="0") enables efficient numeric sorting and minimizes index size; tint (precisionStep="8") enables efficient range queries. |
|
Long field (64-bit signed integer). long (precisionStep="0") enables efficient numeric sorting and minimizes index size; tlong (precisionStep="8") enables efficient range queries. |
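The boolean and date rules described above can be sketched on the client side as follows. This is a minimal illustration, not the Search Layer's own code: the helper names `parse_bool` and `format_date` are hypothetical, and the sketch assumes that the trailing `Z` denotes UTC.

```python
from datetime import datetime, timezone


def parse_bool(raw: str) -> bool:
    """Apply the first-character rule: 1, t, or T means true; anything else is false."""
    return raw[:1] in ("1", "t", "T")


def format_date(moment: datetime) -> str:
    """Render a timezone-aware datetime in the YYYY-MM-DDThh:mm:ssZ form (assumed UTC)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(parse_bool("true"))  # True
print(parse_bool("yes"))   # False
print(format_date(datetime(2024, 3, 1, 12, 30, 0, tzinfo=timezone.utc)))  # 2024-03-01T12:30:00Z
```

Note that only the first character is inspected, so a value such as "Text" would also be read as true.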
Specialized entity field types
| Field type | Description |
|---|---|
|
Used for "bounding boxes" |
|
Field type that "understands" email addresses and maintains them as full tokens. |
|
A latitude, longitude coordinate pair. |
|
A field type that correctly tokenizes MIME-encoded content types |
|
A pair of start- and end- time numbers, used to mark the start and end time of event occurrences. The temporal space they refer to needs to be configured appropriately. |
|
An arbitrary n-dimensional point. |
|
A field type that normalizes telephone numbers. |
|
Field type that "understands" URL addresses and retains them as full tokens. |
|
Universally Unique Identifier (UUID). Passing in a value of "NEW" causes the creation of a new UUID. |
|
Field type used for FacetHierarchyComponent (see Hierarchical facets) specifying a character ":" used as divider. |
|
Field type used for FacetHierarchyComponent (see Hierarchical facets) specifying a character ":" used as divider. |
|
Specialized analysis for categories |
|
Specialized analysis for root categories |
|
Specialized analysis for leaf categories |
|
A location field type able to store lat,lng locations and geometry (WKT strings). See also Spatial Search |
|
Similar to location_rpt but less precise (and thus faster) |
|
Simple version of ACL checking, expects values separated by comma or semicolon |
|
International Article Numbers |
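As a rough illustration of the comma- or semicolon-separated values expected by the simple ACL field type, the following sketch splits such a value into individual entries. The helper name `split_acl` and the group names are hypothetical, and trimming whitespace and dropping empty entries are assumptions; the actual analysis performed by the field type may differ.

```python
import re


def split_acl(raw: str) -> list[str]:
    """Split an ACL value on commas or semicolons, dropping empty entries."""
    # Whitespace trimming is an assumption made for this sketch.
    return [part.strip() for part in re.split(r"[,;]", raw) if part.strip()]


print(split_acl("editors, admins;guests"))  # ['editors', 'admins', 'guests']
```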
Text field types
| Field type | Description |
|---|---|
|
Uses a white space tokenizer to split text into tokens. |
|
Language-specific analysis, including stop-word and synonym treatment, as well as stemming. |
|
Minimal analysis for "precise" matching |
|
A field type with tokenization aggressiveness "in the middle" between text_general and text_precise. |
|
Used for alphabetic sorting. |
|
Used for the "sink" fields from which spelling suggestions are derived. |
|
Used for the "sink" fields from which auto-complete suggestions are derived. |
|
Alternative type definition (for suggestions) that uses shingles |
|
Similar to text_suggest_prefix |
|
Lowercases the entire field value, keeping it as a single token. |
|
Text field type for geographic fields (names of places); it applies different synonyms at query time and no stemming. |
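To make the difference between tokenizing and single-token field types more concrete, the sketch below approximates two of the behaviors described in the table: splitting on white space, and lowercasing the whole value while keeping it as one token. The function names are hypothetical and the code only mimics the described outcome; it does not reproduce the actual Solr analysis chains.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Approximate a white space tokenizer: split on runs of white space."""
    return text.split()


def lowercase_single_token(text: str) -> list[str]:
    """Approximate the lowercasing type: the whole value stays one token."""
    return [text.lower()]


value = "Semanteer Search Layer"
print(whitespace_tokens(value))       # ['Semanteer', 'Search', 'Layer']
print(lowercase_single_token(value))  # ['semanteer search layer']
```

The first form lets a query match individual words; the second only matches the complete value, regardless of case.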
Predefined fields
An overview of the available static and dynamic fields in the default schema.
Main record fields
Base fields that are typically used independently of the type of indexed document.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
✓ |
Must be globally unique within a core. |
|
|
|
✓ |
✓ |
Two-letter language code. |
||
|
|
✓ |
✓ |
The id of the indexing source. |
||
|
|
✓ |
✓ |
The type of indexing source. |
||
|
|
✓ |
✓ |
✓ |
The main type of the document. |
|
|
|
✓ |
✓ |
Optional sub-type of the document. |
||
|
|
✓ |
Opaque field where presentation frontend-related data can be stored. |
|||
|
|
✓ |
✓ |
|||
|
|
✓ |
✓ |
Reference to the identifier of the parent document, where a parent-child relation exists. |
||
|
|
✓ |
✓ |
The main grouping field for documents; contents may vary per installation. |
||
|
|
✓ |
✓ |
|||
|
|
✓ |
✓ |
The document title. |
||
|
|
✓ |
✓ |
Same as sem_title but tokenized for alphabetic sorting. |
||
|
|
✓ |
✓ |
The document sub-title. |
||
|
|
✓ |
✓ |
Same as sem_subtitle but tokenized for alphabetic sorting. |
||
|
|
✓ |
✓ |
A summary of the document. |
||
|
|
✓ |
✓ |
The actual textual content of the document. |
||
|
|
✓ |
✓ |
The document’s URL. |
||
|
|
✓ |
✓ |
✓ |
Comma-separated list of document keywords. |
|
|
|
✓ |
✓ |
✓ |
Comma-separated list of document tags. |
|
|
|
✓ |
✓ |
✓ |
The document’s author(s). |
|
|
|
✓ |
✓ |
Automatically generated field to support faceting on authors. |
||
|
|
✓ |
An opaque field to store frontend-related information about the document author(s) |
|||
|
|
✓ |
✓ |
The document’s creation date. |
||
|
|
✓ |
✓ |
The date at which the document was last modified. |
||
|
|
✓ |
✓ |
The date at which the document was published (to be used if there is a publication workflow in place) |
||
|
|
✓ |
✓ |
Optional field that can be used for boosting or selecting the most popular documents in an index. |
||
|
|
✓ |
✓ |
Field used for sorting |
||
|
|
✓ |
Dynamic field that may be used for faceting |
Location-related fields
Predefined fields related to the "address style" location associated with an indexed document.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
Geographic location - can be any geo shape, from a single point to a multi-polygon. Level of geographic precision can be adjusted in the schema to fit different use cases and performance goals. |
||
|
|
✓ |
✓ |
Geographic point expressed as latitude and longitude coordinates. Faster than the fields that support complex geo objects, ideal for distance sorting. |
||
|
|
✓ |
✓ |
Geographic location - additional field with the same capabilities as sem_location. |
||
|
|
✓ |
Opaque field for frontend-related address information. |
|||
|
|
✓ |
✓ |
Street name and number |
||
|
|
✓ |
✓ |
Town or city. |
||
|
|
✓ |
Automatically generated field to support faceting on city names. |
|||
|
|
✓ |
✓ |
Postal code. |
||
|
|
✓ |
Automatically generated field to support faceting on postal codes. |
|||
|
|
✓ |
✓ |
Organizational or geographical district. |
||
|
|
✓ |
Automatically generated field to support faceting on district names. |
|||
|
|
✓ |
✓ |
Country (can be full name or country code). |
||
|
|
✓ |
Automatically generated field to support faceting on country names (or codes). |
Contact-related fields
Fields related to contact details (person, telephone, email, etc.)
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
Opaque field for frontend-related contact information. |
|||
|
|
✓ |
✓ |
✓ |
Main (landline) telephone number. |
|
|
|
✓ |
✓ |
✓ |
Mobile telephone number. |
|
|
|
✓ |
✓ |
✓ |
Fax number. |
|
|
|
✓ |
✓ |
✓ |
Email address. |
|
|
|
✓ |
✓ |
✓ |
Opaque field that can be used to store contact information for social networking accounts. |
|
|
|
✓ |
✓ |
The name of the contact (may be, but is not necessarily, a person’s name). |
Document-related fields
Fields that capture details of indexed document resources.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
Document image |
||
|
|
✓ |
Document thumbnail image |
|||
|
|
✓ |
✓ |
✓ |
URLs related to the document (e.g., of links inside the document) |
|
|
|
✓ |
✓ |
✓ |
Document section headings |
|
|
|
✓ |
✓ |
Document meta-data description |
||
|
|
✓ |
✓ |
Document meta-data title |
||
|
|
✓ |
✓ |
Host part of the document’s URL (useful when documents may be indexed from multiple hosts). |
||
|
|
✓ |
✓ |
Path part of the document’s URL. |
||
|
|
✓ |
✓ |
Document MIME type. |
||
|
|
✓ |
Automatically generated field that allows faceting by MIME types. |
|||
|
|
✓ |
✓ |
Document content size (in bytes). |
Event-related fields
Fields for capturing details of one-time and recurring events.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
Name of an event location. |
||
|
|
✓ |
Automatically generated field to support faceting on event location names. |
|||
|
|
✓ |
✓ |
Start date / time of an event. |
||
|
|
✓ |
✓ |
End date / time of an event. |
||
|
|
✓ |
✓ |
✓ |
For recurring events, pairs of start and end dates and times. |
|
|
|
✓ |
✓ |
Event organizer’s name (person, company, etc.) |
||
|
|
✓ |
Automatically generated field to support faceting on event organizers. |
Web-related fields
Fields related to typical web portal features, such as blogs, FAQs and online newsletters.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
Name of the blog |
||
|
|
✓ |
Automatically generated field to support faceting on blog names. |
|||
|
|
✓ |
✓ |
Newsletter issue (as simple text) |
||
|
|
✓ |
Automatically generated field to support faceting on newsletter issues. |
|||
|
|
✓ |
✓ |
Question part of an FAQ |
||
|
|
✓ |
✓ |
Answer part of an FAQ |
||
|
|
✓ |
Auxiliary field that can be populated for faceting on whole FAQs. |
|||
|
|
✓ |
✓ |
✓ |
Can hold the text of comments of a particular page, article, blog post, etc. |
|
|
|
✓ |
✓ |
Number of comments related to the current record (see sem_comments) |
||
|
|
✓ |
Opaque field for storing additional information for the comments related to the current record (see sem_comments) |
Shop-related fields
Fields intended for capturing details of products in online shops. See also Categorization fields for fields that can be used to store product category information.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
✓ |
A customer-specific identifier for a product. |
|
|
|
✓ |
✓ |
Automatically generated field that stores the SKU for use in exact matches and faceting. |
||
|
|
✓ |
✓ |
Product description. |
||
|
|
✓ |
✓ |
Product price. |
||
|
|
✓ |
✓ |
Currency in which product price values are expressed. |
||
|
|
✓ |
✓ |
Product price when on special offer. |
||
|
|
✓ |
✓ |
Start date / time of a special offer. |
||
|
|
✓ |
✓ |
End date / time of a special offer. |
||
|
|
✓ |
Opaque string to store additional information for a special offer (e.g., to be used in the frontend). |
|||
|
|
✓ |
✓ |
Product weight. |
||
|
|
✓ |
✓ |
Unit in which the product’s weight is expressed. |
||
|
|
✓ |
✓ |
Textual description of the product’s dimensions. |
||
|
|
✓ |
Automatically generated field for faceting over product size (usable only for a single dimension; for multiple dimensions, additional fields need to be declared). |
|||
|
|
✓ |
✓ |
Whether a product is in stock. |
||
|
|
✓ |
✓ |
✓ |
The product’s color(s). |
|
|
|
✓ |
✓ |
Automatically generated field for faceting over the product’s color(s). |
||
|
|
✓ |
✓ |
✓ |
The product’s brand’s name. |
|
|
|
✓ |
✓ |
Automatically generated field for faceting over the product’s brand’s name. |
||
|
|
✓ |
✓ |
✓ |
The product’s manufacturer. |
|
|
|
✓ |
✓ |
Automatically generated field for faceting over the product’s manufacturer. |
||
|
|
✓ |
✓ |
✓ |
Specialized field to store the opening hours of shops, so that "is it open now" or "is it open at any time during the weekend" can be answered. |
|
|
|
✓ |
✓ |
A product’s EAN. |
||
|
|
✓ |
✓ |
✓ |
The age group(s) for which a product is intended. |
|
|
|
✓ |
✓ |
Automatically generated field for faceting over the age group(s) for which a product is intended. |
||
|
|
✓ |
✓ |
✓ |
The gender(s) for which a product is intended. |
|
|
|
✓ |
✓ |
Automatically generated field for faceting over the gender(s) for which a product is intended. |
||
|
|
✓ |
✓ |
✓ |
The product’s material(s). |
|
|
|
✓ |
✓ |
Automatically generated field for faceting over the product’s material(s). |
Categorization fields
Fields that facilitate the capturing and searching of categorical information, including the automatic management of hierarchical information.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
✓ |
Main document topic |
|
|
|
✓ |
✓ |
✓ |
Field prefix for dynamic fields that represent additional document topics. |
|
|
|
✓ |
✓ |
✓ |
Main document category (non-hierarchical) |
|
|
|
✓ |
✓ |
✓ |
Field prefix for dynamic fields that represent additional document categories (non-hierarchical). |
HIERARCHICAL FIELDS
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
✓ |
Hierarchical category branch(es), using category ids |
|
|
|
✓ |
✓ |
Automatically generated field with all category ids in the branch. |
||
|
|
✓ |
✓ |
Automatically generated field with only the root category ids |
||
|
|
✓ |
✓ |
Automatically generated field with only the bottom (leaf) category ids |
||
|
|
✓ |
✓ |
✓ |
Hierarchical category branch(es), using category names |
|
|
|
✓ |
✓ |
Automatically generated field with all category names in the branch. |
||
|
|
✓ |
✓ |
Automatically generated field with only the root category names |
||
|
|
✓ |
✓ |
Automatically generated field with only the bottom (leaf) category names |
||
|
|
✓ |
✓ |
✓ |
Prefix version of sem_category_ids_branch |
|
|
|
✓ |
✓ |
Prefix version of sem_category_ids |
||
|
|
✓ |
✓ |
Prefix version of sem_category_ids_root |
||
|
|
✓ |
✓ |
Prefix version of sem_category_ids_leaf |
||
|
|
✓ |
✓ |
✓ |
Prefix version of sem_category_names_branch |
|
|
|
✓ |
✓ |
Prefix version of sem_category_names |
||
|
|
✓ |
✓ |
Prefix version of sem_category_names_root |
||
|
|
✓ |
✓ |
Prefix version of sem_category_names_leaf |
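The relationship between a branch field such as sem_category_ids_branch and the fields derived from it (all ids, root only, leaf only) can be sketched as follows. This is an illustration under stated assumptions: the input format "10:20:30", the use of ":" as the divider (borrowed from the hierarchical facet field types), the helper name `expand_branch`, and the result keys are all hypothetical; the actual derivation happens inside the Search Layer.

```python
from typing import Optional


def expand_branch(branch: str, divider: str = ":") -> dict:
    """Derive the 'all', 'root' and 'leaf' values from one category branch."""
    parts = [part for part in branch.split(divider) if part]
    root: Optional[str] = parts[0] if parts else None
    leaf: Optional[str] = parts[-1] if parts else None
    return {"all": parts, "root": root, "leaf": leaf}


print(expand_branch("10:20:30"))
# {'all': ['10', '20', '30'], 'root': '10', 'leaf': '30'}
```

The same idea would apply to branches expressed with category names instead of ids.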
Synthetic fields (automatically generated)
The fields described here are generated automatically and can be used for various search purposes.
R = required, I = indexed, S = stored, M = multi-valued
| Field | Type | R | I | S | M | Description |
|---|---|---|---|---|---|---|
|
|
✓ |
✓ |
Field set automatically to the time of indexing; can be overridden if desirable. |
||
|
|
✓ |
✓ |
✓ |
"Sink" field into which all searchable fields are copied |
|
|
|
✓ |
✓ |
"Sink" field into which all fields from which spelling suggestions can be derived are copied. |
||
|
|
✓ |
✓ |
"Sink" field into which all fields from which auto-complete suggestions can be derived are copied. |
||
|
|
✓ |
✓ |
✓ |
This field can be optionally used as a "sink" field from which the search excerpts (or "snippets") are extracted. Normally the field sem_text_search is used for that. |