Search Components

Embed search functionality into a site

Composum Pages provides an easy-to-set-up search mechanism with configurable components for search fields and search results, which transparently supports Versioning of Pages.

The Composum search components utilize the JCR fulltext search to search for Composum pages. This includes functionality to ensure that search words spread out over the whole subtree of nodes of a page will still find the page, generation of excerpts containing the search words from the page, and configurable components to display a search field and the search results. If the site is configured to display a certain release (see Content Staging / Versioning of Pages), the search only searches this release. The search query searches for Composum Pages (more specifically, only the parent nodes of nodes with primary type cpp:PageContent are found).

Components

Search Field

composum/pages/components/search/result

There is a search field component embedded in the navigation bar. In the Pages editor its properties can be edited by selecting it and clicking the "edit" icon. Its properties are primarily concerned with its appearance and with the page that embeds the search result component displaying the results.

Property dialog of the search field component

Query Syntax

The search uses JCR 2.0 fulltext queries. The query can contain a number of search terms, phrases enclosed in double quotes, and negated search terms or phrases starting with a minus. Each search term can contain the wildcards * for an arbitrary number of characters and ? for a single character. For example, the search Composum Widget* -Wiki "Search Component" will only find pages that contain the word Composum, a word starting with Widget, and the phrase Search Component, but do not contain the word Wiki.
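The structure of such a query can be illustrated with a small tokenizer. The following is only a sketch of the syntax described above, not the parser used by Composum Pages; the class and method names (SearchQueryParser, ParsedQuery, parse) are hypothetical, and wildcards are passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Composum Pages: splits a user query into
// positive and negated terms or phrases.
final class SearchQueryParser {

    /** Terms and phrases that must match (positive) and must not match (negative). */
    static final class ParsedQuery {
        final List<String> positive = new ArrayList<>();
        final List<String> negative = new ArrayList<>();
    }

    // Optional leading minus, then either a double-quoted phrase
    // or a run of non-whitespace characters.
    private static final Pattern TOKEN =
            Pattern.compile("(-?)(?:\"([^\"]*)\"|(\\S+))");

    static ParsedQuery parse(String query) {
        ParsedQuery result = new ParsedQuery();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            boolean negated = !m.group(1).isEmpty();
            // group(2) is the phrase without its quotes, group(3) a single term
            String text = m.group(2) != null ? m.group(2) : m.group(3);
            if (text.isEmpty()) {
                continue; // ignore empty phrases such as ""
            }
            (negated ? result.negative : result.positive).add(text);
        }
        return result;
    }
}
```

For the example query above, parse yields the positive entries Composum, Widget* and Search Component, and the negative entry Wiki.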

Example in the Prototype

The pages prototype contains at /content/prototype/pages/components/site/home/meta/search an example search result page (see screenshot above), including configurations for the components. The configuration for the search box can be found at the site homepage.

Presentation of the search results

Highlighting of search terms in the results

In the result overview, an excerpt with textual fragments that contain the search words is displayed, and the search terms are highlighted. When the user opens one of the results by clicking on a link, it is desirable that the search words are highlighted on that page as well. If these are out of view due to scrolling or folding, it might be a good idea to scroll the page or unfold the corresponding regions. As a preparation for such a mechanism, the positive search terms and phrases contained in the query are appended to the URL as parameters named search.term. If there are several search terms, this parameter occurs once for each term. These could be used by, e.g., a JavaScript highlighting mechanism.
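How a result link with repeated search.term parameters might be assembled can be sketched as follows. This is an illustration only; the class SearchTermLinks, the method withSearchTerms and the page path in the usage note are assumptions, not Composum API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Composum Pages.
final class SearchTermLinks {

    /** Appends one search.term parameter per positive term or phrase. */
    static String withSearchTerms(String pageUrl, List<String> positiveTerms) {
        StringBuilder url = new StringBuilder(pageUrl);
        // continue an existing query string if the URL already has one
        char separator = pageUrl.contains("?") ? '&' : '?';
        for (String term : positiveTerms) {
            url.append(separator)
               .append("search.term=")
               .append(URLEncoder.encode(term, StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }
}
```

Called with the hypothetical page /content/site/page.html and the terms Composum and Search Component, this produces /content/site/page.html?search.term=Composum&search.term=Search+Component. A client-side script on the target page could then read all search.term values and highlight them.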

Search Result

composum/pages/components/search/result

The search result component can be embedded into a container. Its configuration determines which path serves as the search root, the number of search results presented on one page, the headline and error texts to display, and how each result is displayed. Each result can be rendered either by a special selector, for which all pages would have to be able to display an excerpt of themselves, or by a template that is called for each result. In both cases, the search result is transmitted in the request attribute searchresult, which contains information about the result to display within the component. This attribute can, e.g., be accessed with

<jsp:useBean id="searchresult" type="com.composum.pages.commons.service.SearchService.Result" scope="request" />

An example template is provided as composum/pages/components/search/defaulttemplate.

Property dialog for the search result component

There is a component provided for the display of the search field and one for the display of search results. The configuration can be provided at the resource including the component itself, or inherited. For the search field embedded in the navigation the configuration is put into the home page at jcr:content/search (possibly also inherited).

Example search page /content/prototype/pages/components/site/home/meta/search in the prototype, showing the search field in the navigation, a search field embedded into the page and the search results component, using the provided template composum/pages/components/search/defaulttemplate.

Implementation

Implementation Remarks

A heuristic in ExcerptGeneratorImpl looks for fragments containing the search words in the page content node (n in the query below) and in its descendant nodes (o in the query below).

While Jackrabbit can generate an excerpt, this is not usable for our purposes because it only collects the excerpts from the attributes of the found node, not from the aggregated subnodes. Thus, an excerpt can only be formed by searching those nodes.

The implementation uses the Composum Query mechanism that transparently supports versioned pages. To generate excerpts we search in one joined query both for the cpp:PageContent nodes that match all of the search words simultaneously and for the descendant nodes that match at least one of the search words:

SELECT n.[jcr:path], o.[jcr:path], ... FROM [cpp:PageContent] AS n LEFT OUTER JOIN [nt:base] AS o ON ISDESCENDANTNODE(o, n) WHERE ISDESCENDANTNODE(n, '/content') AND (CONTAINS(n.* , "searchword1 searchword2" ) ) AND (CONTAINS(o.* , 'searchword1 OR searchword2' ) ) ORDER BY n.[jcr:score] DESC
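To make the shape of this statement easier to follow, here is a minimal sketch that assembles a query of the same form from a search root and a list of words. It is not the Composum Query implementation: the class ExcerptQuerySketch and its methods are hypothetical, only the two path columns are selected (the original statement selects further columns), and only single quotes are escaped, so special characters of the fulltext syntax in user input are not handled.

```java
import java.util.List;

// Hypothetical illustration, not the Composum Query mechanism.
final class ExcerptQuerySketch {

    /** Wraps a value in single quotes, doubling embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    static String build(String searchRoot, List<String> words) {
        // the page content node n must contain all words ...
        String allWords = String.join(" ", words);
        // ... while a descendant node o needs to contain only one of them
        String anyWord = String.join(" OR ", words);
        return "SELECT n.[jcr:path], o.[jcr:path]"
                + " FROM [cpp:PageContent] AS n"
                + " LEFT OUTER JOIN [nt:base] AS o ON ISDESCENDANTNODE(o, n)"
                + " WHERE ISDESCENDANTNODE(n, " + quote(searchRoot) + ")"
                + " AND CONTAINS(n.*, " + quote(allWords) + ")"
                + " AND CONTAINS(o.*, " + quote(anyWord) + ")"
                + " ORDER BY n.[jcr:score] DESC";
    }
}
```

Each result row then pairs a matching page content node with one descendant node in which at least one search word occurs, which is the raw material for the excerpt heuristic.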

Setup

To use the search, the Lucene aggregation described under Background below must be configured. This can be done by importing aggregates.json, e.g. using the browser, into /oak:index/lucene/aggregates, assuming the standard Sling Launchpad setup. It configures that descendants of cpp:PageContent are aggregated up to a depth of 9, and that the descendants of nt:frozenNode are also recursively aggregated up to a depth of 9.

Configuration of Lucene aggregation

Background: A search should find all pages that contain all the search words, even if they are distributed over several paragraphs in the page. These paragraphs, however, are represented by different JCR nodes, so a simple fulltext search would not find such pages. The Lucene engine used for the fulltext search can, however, be configured such that the words found in descendant nodes are aggregated into the node. Thus, the repository needs to be configured such that all subnodes of nodes of primary type cpp:PageContent are aggregated into that node. (We search for cpp:PageContent instead of, e.g., cpp:Page since, when using versioning, the page content is versioned and appears as a frozen node in the version storage.) To support search in versioned content, the descendants of nt:frozenNode nodes are also aggregated recursively; unfortunately, it is not possible to distinguish between frozen nodes of cpp:PageContent and frozen nodes of other types.