Schema

This chapter introduces the predefined schemata bundled with the Semanteer Search Layer and describes the available static and dynamic fields, their capabilities and their intended use.

Predefined field types

Topics: basic, text, categorization and special field types in the default Semanteer index schema.

More information about the field types supported by Solr "out of the box" can be found in the section Solr Field Types of the Solr Reference Guide.

Basic field types

Field type Description

binary

Binary data.

boolean

Contains either true or false. Values of 1, t, or T in the first character are interpreted as true. Any other values in the first character are interpreted as false.

string

String (UTF-8 encoded string or Unicode).

date or tdate

Represents a point in time with millisecond precision. The format used is a restricted form of the canonical representation of dateTime in the XML Schema specification: YYYY-MM-DDThh:mm:ssZ

double or tdouble

Double field (64-bit IEEE floating point). double (precisionStep="0") enables efficient numeric sorting and minimizes index size; tdouble (precisionStep="8") enables efficient range queries.

float or tfloat

Floating point field (32-bit IEEE floating point). float (precisionStep="0") enables efficient numeric sorting and minimizes index size; tfloat (precisionStep="8") enables efficient range queries.

int or tint

Integer field (32-bit signed integer). int (precisionStep="0") enables efficient numeric sorting and minimizes index size; tint (precisionStep="8") enables efficient range queries.

long or tlong

Long field (64-bit signed integer). long (precisionStep="0") enables efficient numeric sorting and minimizes index size; tlong (precisionStep="8") enables efficient range queries.
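The boolean and date rules above can be sketched in a few lines of Python. This is only an illustration of the value formats described in the table; the helper names to_solr_date and parse_boolean are hypothetical and not part of the Semanteer Search Layer. The date helper emits whole seconds and assumes that naive datetimes are already in UTC.

```python
from datetime import datetime, timezone


def to_solr_date(moment: datetime) -> str:
    """Format a datetime as YYYY-MM-DDThh:mm:ssZ (UTC, whole seconds)."""
    if moment.tzinfo is None:
        # Assumption: a datetime without time zone information is already UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def parse_boolean(raw: str) -> bool:
    """Apply the first-character rule: 1, t or T means true, anything else false."""
    return raw[:1] in ("1", "t", "T")
```

For example, parse_boolean("true") and parse_boolean("1") are both true, while parse_boolean("yes") is false because its first character is none of 1, t or T.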

Specialized entity field types

Field type Description

bbox

Used for "bounding boxes"

email

Field type that "understands" email addresses and maintains them as full tokens.

location

A latitude, longitude coordinate pair.

mime

A field type that correctly tokenizes MIME-encoded content types.

occurrence

A pair of start- and end- time numbers, used to mark the start and end time of event occurrences. The temporal space they refer to needs to be configured appropriately.

point

An arbitrary n-dimensional point.

tel_number

A field type that normalizes telephone numbers.

url

Field type that "understands" URL addresses and retains them as full tokens.

uuid

Universally Unique Identifier (UUID). Passing in a value of "NEW" causes the creation of a new UUID.

descendent_path

Field type used for FacetHierarchyComponent (see Hierarchical facets); uses the character ":" as the divider.

ancestor_path

Field type used for FacetHierarchyComponent (see Hierarchical facets); uses the character ":" as the divider.

cat_minimal

Specialized analysis for categories

cat_root

Specialized analysis for root categories

cat_leaf

Specialized analysis for leaf categories

location_rpt

A location field type able to store lat,lng locations and geometry (WKT strings). See also Spatial Search

location_rpt_approximate

Similar to location_rpt but less precise (and thus faster)

group_acls

Simple version of ACL checking, expects values separated by comma or semicolon

ean

International Article Numbers
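As an illustration of the value format that group_acls expects, the following Python sketch splits a raw value on commas and semicolons. The function name split_acl_values is hypothetical, and trimming whitespace and dropping empty entries are assumptions; the actual analysis is performed by the field type in the schema.

```python
import re
from typing import List


def split_acl_values(raw: str) -> List[str]:
    """Split an ACL string on commas or semicolons."""
    parts = re.split(r"[,;]", raw)
    # Assumption: surrounding whitespace and empty entries are not significant.
    return [part.strip() for part in parts if part.strip()]
```

With this sketch, the value "editors, admins;guests" yields the three groups editors, admins and guests.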

Text field types

Field type Description

text_ws

Uses a white space tokenizer to split text into tokens.

text_general

Language-specific analysis, including stop-word and synonym treatment, as well as stemming.

text_precise

Minimal analysis for "precise" matching

text_minimal

A field type with tokenization aggressiveness "in the middle" between text_general and text_precise.

text_alpha

Used for alphabetic sorting.

text_spell

Used for the "sink" fields from which spelling suggestions are derived.

text_suggest

Used for the "sink" fields from which auto-complete suggestions are derived.

text_suggest_prefix

Alternative type definition (for suggestions) that uses shingles

text_suggest_phrase

Similar to text_suggest_prefix

lowercase

Lowercases the entire field value, keeping it as a single token.

text_geo

Text field used for geographic fields (names of places): Different synonyms (at query time) and no stemming.
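To make the difference between a tokenizing type and a single-token type more concrete, the Python sketch below roughly approximates text_ws and lowercase. It is not the Solr analysis chain, only an approximation of the behavior described in the table, and both function names are hypothetical.

```python
from typing import List


def whitespace_tokens(text: str) -> List[str]:
    """Approximate text_ws: split on runs of white space, keep punctuation."""
    return text.split()


def lowercase_single_token(text: str) -> List[str]:
    """Approximate lowercase: the whole value becomes one lowercased token."""
    return [text.lower()]
```

For the value "Wi-Fi Router, 5GHz", the first function returns three tokens (with the comma still attached to "Router,"), while the second returns the single token "wi-fi router, 5ghz".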

Predefined fields

An overview of the available static and dynamic fields in the default schema.

Main record fields

Base fields that are typically used independently of the type of indexed document.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

id

string

Must be globally unique within a core.

sem_lang

string

Two-letter language code.

sem_source_id

string

The id of the indexing source.

sem_source_type

string

The type of indexing source.

sem_record_type

string

The main type of the document.

sem_record_subtype

string

Optional sub-type of the document.

sem_record_info

string

Opaque field where presentation frontend-related data can be stored.

sem_asset_info

string

sem_parent_record

string

Reference to the identifier of the parent document, where a parent-child relation exists.

sem_grouping

string

The main grouping field for documents; contents may vary per installation.

sem_active

boolean

sem_title

text_general

The document title.

sem_title_alpha

text_alpha

Same as sem_title but tokenized for alphabetic sorting.

sem_subtitle

text_general

The document sub-title.

sem_subtitle_alpha

text_alpha

Same as sem_subtitle but tokenized for alphabetic sorting.

sem_abstract

text_general

A summary of the document.

sem_content

text_general

The actual textual content of the document.

sem_url

url

The document’s URL.

sem_keywords

text_general

Comma-separated list of document keywords.

sem_tags

string

Comma-separated list of document tags.

sem_author

text_minimal

The document’s author(s).

sem_author_facet

string

Automatically generated field to support faceting on authors.

sem_author_info

string

An opaque field to store frontend-related information about the document author(s)

sem_creation_date

date

The document’s creation date.

sem_modification_date

date

The date at which the document was last modified.

sem_publication_date

date

The date at which the document was published (to be used if there is a publication workflow in place)

sem_popularity

float

Optional field that can be used for boosting or selecting the most popular documents in an index.

sem_sort_order

int

Field used for sorting

*_mfacet

string

Dynamic field that may be used for faceting
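The following Python sketch shows how a record using some of the main fields might be assembled before it is sent to the index. The function build_record, the language code "en" and the record type "article" are hypothetical example values; which fields are required depends on the installation.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_record(doc_id: str, title: str, content: str,
                 keywords: List[str], created: datetime) -> Dict[str, Any]:
    """Assemble a record dictionary keyed by main record field names."""
    # Assumption: 'created' carries time zone information.
    created_utc = created.astimezone(timezone.utc)
    return {
        "id": doc_id,
        "sem_lang": "en",              # two-letter language code (example value)
        "sem_record_type": "article",  # hypothetical record type
        "sem_title": title,
        "sem_content": content,
        # sem_keywords is described as a comma-separated list.
        "sem_keywords": ", ".join(keywords),
        "sem_creation_date": created_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

The resulting dictionary can be serialized to JSON or XML by whatever indexing client is in use; the transport itself is outside the scope of this sketch.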

Predefined fields related to the "address style" location associated with an indexed document.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_location

location_rpt_approximate

Geographic location - can be any geo shape, from a single point to a multi-polygon. Level of geographic precision can be adjusted in the schema to fit different use cases and performance goals.

sem_location_latlng

location

Geographic point expressed as latitude and longitude coordinates. Faster than the fields that support complex geo objects, ideal for distance sorting.

sem_geo_object

location_rpt_approximate

Geographic location - additional field with the same capabilities as sem_location.

sem_address_info

string

Opaque field for frontend-related address information.

sem_address_street

text_minimal

Street name and number

sem_address_city

text_minimal

Town or city.

sem_address_city_facet

string

Automatically generated field to support faceting on city names.

sem_address_postal_code

text_minimal

Postal code.

sem_address_postal_code_facet

string

Automatically generated field to support faceting on postal codes.

sem_address_district

text_minimal

Organizational or geographical district.

sem_address_district_facet

string

Automatically generated field to support faceting on district names.

sem_address_country

text_minimal

Country (can be full name or country code).

sem_address_country_facet

string

Automatically generated field to support faceting on country names (or codes).
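As a sketch of how a value for sem_location_latlng could be prepared, the Python helper below validates a coordinate pair and renders it in "lat,lng" form. The function name format_latlng is hypothetical, and the comma-separated string form is an assumption based on the lat,lng notation mentioned for the location field types.

```python
def format_latlng(lat: float, lng: float) -> str:
    """Validate a coordinate pair and render it as 'lat,lng'."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude out of range: {lng}")
    return f"{lat},{lng}"
```

Validating on the client side is a design choice: it surfaces swapped or malformed coordinates before they reach the index.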

Fields related to contact details (person, telephone, email, etc.)

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_contact_info

string

Opaque field for frontend-related contact information.

sem_contact_phone

tel_number

Main (landline) telephone number.

sem_contact_mobile

tel_number

Mobile telephone number.

sem_contact_fax

tel_number

Fax number.

sem_contact_email

email

Email address.

sem_contact_social

string

Opaque field that can be used to store contact information for social networking accounts.

sem_contact_name

text_minimal

The name of the contact (may be a person’s name, but not necessarily).

Fields that capture details of indexed document resources.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_image_url

url

Document image

sem_image_thumbnail_url

url

Document thumbnail image

sem_links

url

URLs related to the document (e.g., of links inside the document)

sem_headings

text_general

Document section headings

sem_meta_description

text_general

Document meta-data description

sem_meta_title

text_general

Document meta-data title

sem_host

string

Host part of the document’s URL (useful when documents may be indexed from multiple hosts).

sem_path

string

Path part of the document’s URL.

sem_mime_type

mime

Document MIME type.

sem_mime_type_facet

string

Automatically generated field that allows faceting by MIME types.

sem_content_size

long

Document content size (in bytes).

Fields for capturing details of one-time and recurring events.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_event_location_name

text_minimal

Name of an event location.

sem_event_location_facet

string

Automatically generated field to support faceting on event location names.

sem_event_start

date

Start date / time of an event.

sem_event_end

date

End date / time of an event.

sem_event_occurences

occurrence

For recurring events, pairs of start and end dates and times.

sem_event_organizer

text_minimal

Event organizer’s name (person, company, etc.)

sem_event_organizer_facet

string

Automatically generated field to support faceting on event organizers.

Fields related to typical web portal features, such as blogs, FAQs and online newsletters.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_blog_name

text_general

Name of the blog

sem_blog_facet

string

Automatically generated field to support faceting on blog names.

sem_newsletter_issue

text_general

Newsletter issue (as simple text)

sem_newsletter_facet

string

Automatically generated field to support faceting on newsletter issues.

sem_faq_question

text_general

Question part of an FAQ

sem_faq_answer

text_general

Answer part of an FAQ

sem_faq_facet

string

Auxiliary field that can be populated for faceting on whole FAQs.

sem_comments

text_general

Can hold the text of comments of a particular page, article, blog post, etc.

sem_comments_count

int

Number of comments related to the current record (see sem_comments)

sem_comments_info

string

Opaque field for storing additional information for the comments related to the current record (see sem_comments)

Fields intended for capturing details of products in online shops. See also Categorization fields for fields that can be used to store product category information.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_product_sku

text_minimal

A customer-specific identifier for a product.

sem_product_sku_facet

string

Automatically generated field that stores the SKU for use in exact matches and faceting.

sem_product_description

text_general

Product description.

sem_product_price

float

Product price.

sem_product_price_currency

string

Currency in which product price values are expressed.

sem_product_specialoffer_price

float

Product price when on special offer.

sem_product_specialoffer_start

date

Start date / time of a special offer.

sem_product_specialoffer_end

date

End date / time of a special offer.

sem_product_specialoffer_info

string

Opaque string to store additional information for a special offer (e.g., to be used in the frontend).

sem_product_weight

float

Product weight.

sem_product_weight_unit

string

Unit in which the product’s weight is expressed.

sem_product_size

text_minimal

Textual description of the product’s dimensions.

sem_product_size_facet

string

Automatically generated field for faceting over product size (usable only for a single dimension; for multiple dimensions, additional fields need to be declared).

sem_product_in_stock

boolean

Whether a product is in stock.

sem_product_color

text_general

The product’s color(s).

sem_product_color_facet

string

Automatically generated field for faceting over the product’s color(s).

sem_product_brand

text_general

The product’s brand’s name.

sem_product_brand_facet

string

Automatically generated field for faceting over the product’s brand’s name.

sem_product_manufacturer

text_general

The product’s manufacturer.

sem_product_manufacturer_facet

string

Automatically generated field for faceting over the product’s manufacturer.

sem_opening_hours

occurrence

Specialized field to store the opening hours of shops, so that "is it open now" or "is it open at any time during the weekend" can be answered.

sem_product_ean

ean

A product’s EAN.

sem_product_age_group

text_minimal

The age group(s) for which a product is intended.

sem_product_age_group_facet

string

Automatically generated field for faceting over the age group(s) for which a product is intended.

sem_product_gender

text_minimal

The gender(s) for which a product is intended.

sem_product_gender_facet

string

Automatically generated field for faceting over the gender(s) for which a product is intended.

sem_product_material

text_minimal

The product’s material(s).

sem_product_material_facet

string

Automatically generated field for faceting over the product’s material(s).
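The "is it open now" question that sem_opening_hours is meant to answer comes down to checking whether a point in time falls inside any start and end pair. The Python sketch below illustrates that idea only; the choice of "minutes since Monday 00:00" as the temporal space, the exclusive end boundary, and the names week_minute and is_open are all assumptions, and the real matching is done by the search layer according to its configuration.

```python
from typing import List, Tuple

MINUTES_PER_DAY = 24 * 60


def week_minute(day_index: int, hour: int, minute: int) -> int:
    """Minutes elapsed since Monday 00:00 (day_index 0 = Monday)."""
    return day_index * MINUTES_PER_DAY + hour * 60 + minute


def is_open(occurrences: List[Tuple[int, int]], at: int) -> bool:
    """True if 'at' lies inside any (start, end) pair; end is exclusive."""
    return any(start <= at < end for start, end in occurrences)
```

A shop open on Monday from 09:00 to 18:00 would be represented here by the pair (week_minute(0, 9, 0), week_minute(0, 18, 0)), and a check for Monday 12:30 would pass week_minute(0, 12, 30) as the second argument.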

Categorization fields

Fields that facilitate the capturing and searching of categorical information, including the automatic management of hierarchical information.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_topic

string

Main document topic

sem_topic_*

string

Field prefix for dynamic fields that represent additional document topics.

sem_category_main

string

Main document category (non-hierarchical)

sem_category_main_*

string

Field prefix for dynamic fields that represent additional document categories (non-hierarchical).

HIERARCHICAL FIELDS

Field Type R I S M Description

sem_category_ids_branch

descendent_path

Hierarchical category branch(es), using category ids

sem_category_ids

cat_minimal

Automatically generated field with all category ids in the branch.

sem_category_ids_root

cat_root

Automatically generated field with only the root category ids

sem_category_ids_leaf

cat_leaf

Automatically generated field with only the bottom (leaf) category ids

sem_category_names_branch

descendent_path

Hierarchical category branch(es), using category names

sem_category_names

cat_minimal

Automatically generated field with all category names in the branch.

sem_category_names_root

cat_root

Automatically generated field with only the root category names

sem_category_names_leaf

cat_leaf

Automatically generated field with only the bottom (leaf) category names

sem_category_ids_branch_*

descendent_path

Prefix version of sem_category_ids_branch

sem_category_ids_*

cat_minimal

Prefix version of sem_category_ids

sem_category_ids_root_*

cat_root

Prefix version of sem_category_ids_root

sem_category_ids_leaf_*

cat_leaf

Prefix version of sem_category_ids_leaf

sem_category_names_branch_*

descendent_path

Prefix version of sem_category_names_branch

sem_category_names_*

cat_minimal

Prefix version of sem_category_names

sem_category_names_root_*

cat_root

Prefix version of sem_category_names_root

sem_category_names_leaf_*

cat_leaf

Prefix version of sem_category_names_leaf
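The relationship between a branch value and the fields derived from it can be sketched as follows. This Python example is an illustration under assumptions: it uses ":" as the divider, as stated for the descendent_path type, and it assumes that the generated fields hold all ids in the branch, the first (root) id and the last (leaf) id. The function name expand_branch and the "paths" entry (the prefix paths a path-hierarchy analysis would typically produce) are hypothetical.

```python
from typing import Any, Dict

DIVIDER = ":"


def expand_branch(branch: str) -> Dict[str, Any]:
    """Derive all, root, leaf and prefix-path values from one branch string."""
    parts = [part for part in branch.split(DIVIDER) if part]
    return {
        "all": parts,                            # cf. sem_category_ids
        "root": parts[0] if parts else None,     # cf. sem_category_ids_root
        "leaf": parts[-1] if parts else None,    # cf. sem_category_ids_leaf
        # Prefix paths from the root down to the full branch.
        "paths": [DIVIDER.join(parts[:i]) for i in range(1, len(parts) + 1)],
    }
```

For the branch "10:42:317", the sketch yields the ids 10, 42 and 317, the root 10, the leaf 317 and the prefix paths "10", "10:42" and "10:42:317". The same logic would apply to branches built from category names.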

Synthetic fields (automatically generated)

The fields described here are generated automatically and can be used for various search purposes.

R = required, I = indexed, S = stored, M = multi-valued

Field Type R I S M Description

sem_timestamp

date

Field set automatically to the time of indexing; can be overridden if desirable.

sem_text_search

text_general

"Sink" field into which all searchable fields are copied

sem_text_spell

text_spell

"Sink" field into which all fields from which spelling suggestions can be derived are copied.

sem_text_suggest

text_suggest

"Sink" field into which all fields from which auto-complete suggestions can be derived are copied.

sem_text_excerpt

sem_text_excerpt

This field can be optionally used as a "sink" field from which the search excerpts (or "snippets") are extracted. Normally the field sem_text_search is used for that.