Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/README.gendict
Дата изменения: Mon Jul 14 17:28:54 2003
Дата индексирования: Sat Dec 22 09:12:40 2007
Кодировка:

Поисковые слова: ic 4603
Gendict - generate dictionary templates for contrib/tsearch2 module.

This utility aims to help people creating dictionary for contrib/tsearch v2
module. Particularly, it has built-in support for snowball stemmers.

Programming API to tsearch2 dictionaries is described in tsearch v2
documentation.


Prerequisities:

* PostgreSQL 7.3 and above.

* You need tsearch2 module sources already compiled

* Rights to install contrib modules

Usage:

run config.sh without parameters to see options and arguments

Usage:
./config.sh -n DICTNAME ( [ -s [ -p PREFIX ] ] | [ -c CFILES ] [ -h HFILES ] [ -i ] ) [ -v ] [ -d DIR ] [ -C COMMENT ]
-v - be verbose
-d DIR - name of directory in PGSQL_SRC/contrib (default dict_DICTNAME)
-C COMMENT - dictionary comment
Generate Snowball stemmer:
./config.sh -n DICTNAME -s [ -p PREFIX ] [ -v ] [ -d DIR ] [ -C COMMENT ]
-s - generate Snowball wrapper
-p - prefix of Snowball's function, (default DICTNAME)
Generate template dictionary:
./config.sh -n DICTNAME [ -c CFILES ] [ -h HFILES ] [ -i ] [ -v ] [ -d DIR ] [ -C COMMENT ]
-c CFILES - source files, must be placed in contrib/tsearch2/gendict directory.
These files will be used in Makefile.
-h HFILES - header files, must be placed in contrib/tsearch2/gendict directory.
These files will be used in Makefile and subinclude.h
-i - dictionary has init method


Example 1:

Create Portuguese stemmer

0. cd PGSQL_SRC/contrib/tsearch2/gendict

1. Obtain stem.{c,h} files for Portuguese

wget http://snowball.tartarus.org/portuguese/stem.c
wget http://snowball.tartarus.org/portuguese/stem.h

2. Create template files for Portuguese

./config.sh -n pt -s -p portuguese -v -C'Snowball stemmer for Portuguese'

Note, that argument for -p option should be *the same* as name of stemming
function in stem.c (without _stem)

A bunch of files will be generated and placed in PGSQL_SRC/contrib/dict_pt
directory.

3. Compile and install dictionary

cd PGSQL_SRC/contrib/dict_pt
make
make install

4. Test it

Sample portuguese words with the stemmed forms are available
from http://snowball.tartarus.org/portuguese/stemmer.html

createdb testdict
psql testdict < /usr/local/pgsql/share/contrib/tsearch2.sql
psql testdict < /usr/local/pgsql/share/contrib/dict_pt.sql
psql -d testdict -c "select lexize('pt','bobagem');"
lexize
---------
{bobag}
(1 row)

Here is what I have in pg_ts_dict table

psql -d testdict -c "select * from pg_ts_dict where dict_name='pt';"
dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
-----------+-----------+-----------------+-------------+---------------------------------
pt | 7177806 | | 7159330 | Snowball stemmer for Portuguese
(1 row)


Note, that you have already installed dictionary and corresponding
entry in tsearch configuration and you may modify it using
plain SQL commands, for example, specify stop words.

Example 2:

a) Simple template dictionary with init method

./config.sh -n wow -v -i -C WOW

b) Create simple template dict (without init method):
./config.sh -n wow -v -C WOW

The same as above, but dictionary will have not init method

Dictionaries obtained in a) and b) are fully working and ready
for use:
a) lowercase input word and remove it if it is a stop word
b) recognizes any word

c) Simple template dictionary with source files (with init method):

./config.sh -n wow -v -i -c a.c -h a.h -C WOW

Source files ( a.c ) must be placed in contrib/tsearch2/gendict directory.
These files will be used in Makefile.

Header files ( a.h ), must be placed in contrib/tsearch2/gendict directory.
These files will be used in Makefile and subinclude.h

d) Simple template dictionary with source files (without init method):

./config.sh -n wow -v -c a.c -h a.h -C WOW

The same as above, but dictionary will have not init method

After that you have sources in PGSQL_SRC/contrib/dict_wow and
you may edit them to create actual dictionary.

Please, check Tsearch2 home page (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/)
for additional information about "Gendict tutorial" and dictionaries.