Original document: http://www.sai.msu.su/~megera/postgres/fts/doc/fts-debug.html
Full-Text Search in PostgreSQL: A Gentle Introduction
Chapter 2. FTS Operators and Functions
The function ts_debug makes it easy to test your full-text configuration.
ts_debug( [cfgname | oid ],document TEXT) RETURNS SETOF tsdebug
It displays information about every token in document, as produced by the parser and processed by the dictionaries defined in the configuration specified by cfgname or oid.
The tsdebug type is defined as
    CREATE TYPE tsdebug AS (
        "Alias" text,
        "Description" text,
        "Token" text,
        "Dicts list" text[],
        "Lexized token" text
    );
To demonstrate how ts_debug works, we first create a public.english configuration and an ispell dictionary for the English language. You may skip this step and experiment with the standard english configuration instead.
    CREATE FULLTEXT CONFIGURATION public.english LIKE pg_catalog.english WITH MAP AS DEFAULT;

    CREATE FULLTEXT DICTIONARY en_ispell
    OPTION 'DictFile="/usr/local/share/dicts/ispell/english-utf8.dict",
            AffFile="/usr/local/share/dicts/ispell/english-utf8.aff",
            StopFile="/usr/local/share/dicts/english.stop"'
    LIKE ispell_template;

    ALTER FULLTEXT MAPPING ON public.english FOR lword WITH en_ispell,en_stem;
    =# select * from ts_debug('public.english','The Brightest supernovaes');
     Alias |  Description  |    Token    |              Dicts list               |          Lexized token
    -------+---------------+-------------+---------------------------------------+---------------------------------
     lword | Latin word    | The         | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {}
     blank | Space symbols |             |                                       |
     lword | Latin word    | Brightest   | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {bright}
     blank | Space symbols |             |                                       |
     lword | Latin word    | supernovaes | {public.en_ispell,pg_catalog.en_stem} | pg_catalog.en_stem: {supernova}
    (5 rows)
In this example, the word 'Brightest' was recognized by the parser as a Latin word (alias lword) and passed through the dictionaries public.en_ispell and pg_catalog.en_stem. It was recognized by public.en_ispell, which reduced it to the base form bright. The word supernovaes is unknown to the public.en_ispell dictionary, so it was passed to the next dictionary and, fortunately, was recognized there (in fact, pg_catalog.en_stem is a stemming dictionary and recognizes everything, which is why it is placed at the end of the dictionary stack).
The word The was recognized by the public.en_ispell dictionary as a stop-word (Section 1.3.6) and will not be indexed.
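The fall-through behaviour described above can be illustrated with a small Python model. This is only a sketch of the idea, not PostgreSQL's implementation: the functions toy_ispell, toy_stem and lexize, the tiny word lists, and the suffix rule are all hypothetical stand-ins invented for this example. It assumes a dictionary returns a list of lexemes for a word it recognizes, an empty list for a stop-word, and None for an unknown word.

```python
from typing import Callable, List, Optional, Tuple

# Assumed contract: a list of lexemes means "recognized", an empty list
# means "recognized as a stop-word", and None means "unknown".
Dictionary = Callable[[str], Optional[List[str]]]


def toy_ispell(token: str) -> Optional[List[str]]:
    """Hypothetical stand-in for public.en_ispell with a tiny word list."""
    stop_words = {"the"}
    known_forms = {"brightest": ["bright"]}
    word = token.lower()
    if word in stop_words:
        return []  # stop-word: recognized, but yields no lexemes
    return known_forms.get(word)  # None: unknown, try the next dictionary


def toy_stem(token: str) -> Optional[List[str]]:
    """Hypothetical stand-in for pg_catalog.en_stem: accepts every token."""
    word = token.lower()
    for suffix in ("es", "s"):
        # Crude suffix stripping, only to make the example produce output.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)]]
    return [word]


def lexize(
    token: str, stack: List[Tuple[str, Dictionary]]
) -> Tuple[Optional[str], Optional[List[str]]]:
    """Return (dictionary name, lexemes) from the first dictionary that recognizes token."""
    for name, dictionary in stack:
        lexemes = dictionary(token)
        if lexemes is not None:
            return name, lexemes
    return None, None  # no dictionary in the stack recognized the token


STACK = [("public.en_ispell", toy_ispell), ("pg_catalog.en_stem", toy_stem)]

for token in ("The", "Brightest", "supernovaes"):
    name, lexemes = lexize(token, STACK)
    print(token + ": " + str(name) + ": {" + ",".join(lexemes or []) + "}")
# The: public.en_ispell: {}
# Brightest: public.en_ispell: {bright}
# supernovaes: pg_catalog.en_stem: {supernova}
```

Because the stemmer stand-in accepts every token, any dictionary placed after it in the stack would never be consulted, which mirrors the reason given above for putting the stemming dictionary last.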
You can always explicitly specify which columns you want to see:
    =# select "Alias", "Token", "Lexized token" from ts_debug('public.english','The Brightest supernovaes');
     Alias |    Token    |          Lexized token
    -------+-------------+---------------------------------
     lword | The         | public.en_ispell: {}
     blank |             |
     lword | Brightest   | public.en_ispell: {bright}
     blank |             |
     lword | supernovaes | pg_catalog.en_stem: {supernova}
    (5 rows)