Full-Text Search in PostgreSQL: A Gentle Introduction
Chapter 2. FTS Operators and Functions

2.5. Ranking

Ranking attempts to measure how relevant documents are to a particular query by inspecting the number of times each search word appears in the document and whether different search terms occur near each other. Note that this information is only available in unstripped vectors: ranking functions return a useful result only for a tsvector that still has position information.

Note that the ranking functions supplied are just examples and do not belong to the FTS core. You can write your own ranking function and/or combine additional factors to fit your specific needs.

The two ranking functions currently available are:

CREATE FUNCTION rank( [ weights float4[], ] vector TSVECTOR, query TSQUERY [, normalization int4 ] ) RETURNS float4

This is the ranking function from the old version of OpenFTS, and offers the ability to weight word instances more heavily depending on how you have classified them. The weights specify how heavily to weight each category of word:

{D-weight, C-weight, B-weight, A-weight}

If no weights are provided, then these defaults are used:

{0.1, 0.2, 0.4, 1.0}

Often weights are used to mark words from special areas of the document, like the title or an initial abstract, and make them more or less important than words in the document body.
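
The ordering of the weights array is easy to get backwards, since it runs from D to A. The following Python sketch only illustrates how a label maps to a position in the array and to its default value. The names weight_for and weighted_hits are hypothetical, and the plain sum is a simplification for illustration, not the formula that rank itself uses.

```python
# Weights array in the order described above: {D-weight, C-weight, B-weight, A-weight}.
DEFAULT_WEIGHTS = [0.1, 0.2, 0.4, 1.0]

# Position of each weight label inside the array (D comes first, A last).
LABEL_INDEX = {"D": 0, "C": 1, "B": 2, "A": 3}


def weight_for(label, weights=DEFAULT_WEIGHTS):
    """Return the weight applied to one word occurrence carrying `label`."""
    return weights[LABEL_INDEX[label]]


def weighted_hits(labels, weights=DEFAULT_WEIGHTS):
    """Sum the weights of the matching occurrences.

    Hypothetical, simplified score used only to show the effect of the
    weights array; it is not the computation performed by rank.
    """
    return sum(weight_for(label, weights) for label in labels)


if __name__ == "__main__":
    # One match in an A-labelled area (say, the title), two in the D-labelled body.
    print(weighted_hits(["A", "D", "D"]))
    # Custom weights that ignore everything except A-labelled words.
    print(weighted_hits(["A", "D", "D"], weights=[0.0, 0.0, 0.0, 1.0]))
```

With the defaults, a single A-labelled occurrence counts ten times as much as a D-labelled one, which is how a title match can outweigh several matches in the body.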

CREATE FUNCTION rank_cd( [ weights float4[], ] vector TSVECTOR, query TSQUERY [, normalization int4 ] ) RETURNS float4

This function computes the cover density ranking for the given document vector and query, as described in Clarke, Cormack, and Tudhope's "Relevance Ranking for One to Three Term Queries", published in Information Processing and Management in 1999.

Both of these ranking functions take an integer normalization option that specifies whether a document's length should impact its rank. This is often desirable, since a hundred-word document with five instances of a search word is probably more relevant than a thousand-word document with five instances. The option accepts the following values, which can be combined using | (for example, 2|4) to take several factors into account:

0 (the default) ignores the document length.
1 divides the rank by 1 + the logarithm of the document length.
2 divides the rank by the document length.
4 divides the rank by the mean harmonic distance between extents.
8 divides the rank by the number of unique words in the document.
16 divides the rank by 1 + the logarithm of the number of unique words in the document.
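
To make the bitmask concrete, here is a minimal Python sketch of how combined flags could be applied to an unnormalized rank. It is an illustration of the list above, not the actual implementation: the function name normalize, its parameters, the use of the natural logarithm, and the order in which the divisors are applied are all assumptions, and every input is assumed to be positive.

```python
import math


def normalize(raw_rank, flags, doc_length, unique_words,
              mean_harmonic_distance=1.0):
    """Apply the divisors selected by the bitmask `flags` to `raw_rank`.

    Hypothetical helper: the logarithm base (natural log) and the order of
    the divisions are assumptions; all inputs are assumed to be positive.
    """
    rank = raw_rank
    if flags & 1:   # 1 + logarithm of the document length
        rank /= 1 + math.log(doc_length)
    if flags & 2:   # the document length itself
        rank /= doc_length
    if flags & 4:   # mean harmonic distance between extents
        rank /= mean_harmonic_distance
    if flags & 8:   # number of unique words in the document
        rank /= unique_words
    if flags & 16:  # 1 + logarithm of the number of unique words
        rank /= 1 + math.log(unique_words)
    return rank


if __name__ == "__main__":
    # Same raw rank, two document lengths, flag 2 (divide by length).
    short_doc = normalize(5.0, 2, doc_length=100, unique_words=60)
    long_doc = normalize(5.0, 2, doc_length=1000, unique_words=400)
    print(short_doc > long_doc)

    # Flags combine with |, so 2|8 divides by both length and unique words.
    print(normalize(5.0, 2 | 8, doc_length=100, unique_words=60))
```

With flag 2, the hundred-word document from the example above ends up ranked higher than the thousand-word document with the same number of matches, while flag 0 would leave both ranks equal.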
