A Crash Course in OpenFTS
---------------------------------

MOTIVATION:

This document is intended for newcomers who want to quickly install
OpenFTS, test it, and play around with it to get a feel for it. Assuming
you already have all the prerequisites installed, the whole process
should take about 2 minutes.

OpenFTS is based on fairly complex algorithms from information retrieval
and database theory, and it is designed to be flexible. The OpenFTS Primer
describes installation, running, and the API; use it as a guide when
writing your own search applications.

After completing the tests, you're welcome to read README.INSIDE for
comments on the example scripts.

PREREQUISITES:

PostgreSQL 7.4 + contrib/tsearch2 module (7.3.X also works)
OpenFTS v.0.35 - available from http://openfts.sourceforge.net; currently
it can be downloaded from CVS only.
Perl modules: DBI, DBD::Pg, Time::HiRes - available from CPAN (http://www.cpan.org)
DBI - http://search.cpan.org/search?dist=DBI
DBD::Pg - http://search.cpan.org/search?dist=DBD-Pg
Time::HiRes - http://search.cpan.org/search?dist=Time-HiRes

A test collection of documents is available for download from
http://openfts.sourceforge.net/test-collections/apod-en.tar.gz
Put the archive in a directory of your choice and unpack it there:
cd /path/to/test-collection/
tar xzvf apod-en.tar.gz
You should now have the test documents in the /path/to/test-collection/apod directory.
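A quick way to check that the archive unpacked correctly is to count the files it contains; this is the same file list that is later piped into index.pl. The helper name count_docs is hypothetical (it is not part of OpenFTS), and the count should be close to the 1757 APOD articles, assuming the archive holds only article files.

```sh
# count_docs DIR (hypothetical helper, not part of OpenFTS):
# print the number of regular files under DIR, searching recursively.
# It uses the same "find ... -type f" listing that is fed to index.pl.
count_docs() {
  # wc -l pads its output with spaces on some systems; strip them
  find "$1" -type f | wc -l | tr -d ' '
}
```

Usage: count_docs /path/to/test-collection/apod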

APOD stands for Astronomy Picture of the Day
(http://antwrp.gsfc.nasa.gov/apod/). Its authors have kindly
granted permission to use the texts for testing and non-commercial
purposes within the framework of the OpenFTS project.

The APOD collection consists of 1757 articles (about 7 MB) and is ideally
suited for OpenFTS. Indexing took about 29 seconds on my IBM ThinkPad T21
notebook (Linux 2.4.17, 256 MB RAM, 20 GB IDE HD). The total number of
lexemes is 131,310, while the number of unique lexemes is only 8,806
(using Porter's stemmer).

A demo is available at http://xware.astronet.ru/db/apod.html

Make sure you have sufficient privileges to create a database.
Now you may note the time!

RUNNING:

1. createdb openfts

Create the test database.

2. psql openfts < /path/to/share/contrib/tsearch2.sql

Load the tsearch2 functions. If PostgreSQL is installed in the
/usr/local/pgsql directory, the SQL files are usually in the
/usr/local/pgsql/share/contrib directory.

4. ./init.pl openfts drop

Drop the previous OpenFTS instance, if any.

5. ./init.pl openfts

Create the OpenFTS instance (its tables) in the database.

6. find /path/to/test-collection/apod -type f | ./index.pl openfts

Index the APOD collection.
The resulting database occupies about 21 MB on my notebook.

7. ./search.pl -p openfts supernovae stars

The output should look like a line of document identifiers
separated by semicolons:

Found documents:118
573;1241;1419;828;879;1629;553;795;740;1533;....
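If you want to feed the results into another tool, the output can be split with standard utilities. The sketch below assumes search.pl prints exactly the format shown above (a "Found documents:N" line followed by one semicolon-separated line of identifiers); the function names fts_found and fts_doc_ids are hypothetical.

```sh
# fts_found (hypothetical): read search.pl output on stdin and print
# the number that follows "Found documents:".
fts_found() {
  sed -n 's/^Found documents:\([0-9][0-9]*\).*/\1/p'
}

# fts_doc_ids (hypothetical): read search.pl output on stdin and print
# one document identifier per line.
fts_doc_ids() {
  # drop the header line, split on ";" and keep only purely numeric
  # fields (this also discards empty fields from a trailing ";")
  grep -v '^Found documents:' | tr ';' '\n' | grep -E '^[0-9]+$'
}
```

For example, ./search.pl -p openfts supernovae stars | fts_doc_ids | head -n 5 should list the first five identifiers, one per line.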

8. ./search.pl -p openfts -h5 supernovae stars

Show text fragments of the first 5 matched files with highlighted
query terms.
(It's possible to specify an offset and a limit in the form
-h offset-limit, e.g., -h 5-10.)

9. Benchmarking.

./search.pl -p openfts -b 100 supernovae stars
Found documents:118, total time (100 runs):4.19, average time: 0.042 sec

(Keep in mind that these numbers are from my notebook; your mileage may vary.)
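The average reported above is the total time divided by the number of runs: 4.19 s / 100 = 0.0419 s, which matches the 0.042 sec printed, rounded to three decimals. The hypothetical bench_avg helper below recomputes it from a benchmark line, assuming the line has exactly the format shown in the example.

```sh
# bench_avg (hypothetical): read a "search.pl -b" result line on stdin
# and print total time / number of runs with four decimals.
bench_avg() {
  # extract "<runs> <total>" from "total time (<runs> runs):<total>"
  sed -n 's/.*total time (\([0-9][0-9]*\) runs):\([0-9.][0-9.]*\).*/\1 \2/p' |
    awk '{ printf "%.4f\n", $2 / $1 }'
}
```

For example, ./search.pl -p openfts -b 100 supernovae stars | bench_avg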

--------------------------------------------------------------------

PS.

1) A list of the unique lexemes indexed, with their frequencies, can be
obtained using the following command:

psql -d openfts -qt -c "select * from stat('select fts_index from txt')\
 order by ndoc desc, nentry desc, word"

Number of unique lexemes (stat() returns one row per distinct lexeme):

psql -d openfts -qt -c "select count(*) from stat('select fts_index from txt')"
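Each row returned by stat() describes one distinct lexeme, with ndoc (the number of documents containing it) and nentry (its total number of occurrences), so count(*) counts distinct lexemes. The total number of lexeme occurrences is the sum of nentry; in SQL that should be select sum(nentry) from stat('select fts_index from txt'). The hypothetical lexeme_totals helper below computes both numbers in one pass from the output of the listing query, assuming psql's default aligned, tuples-only (-t) format with "|" between the word, ndoc and nentry columns.

```sh
# lexeme_totals (hypothetical): read "word | ndoc | nentry" rows on stdin
# and print "<distinct lexemes> <total occurrences>".
lexeme_totals() {
  # rows with exactly three "|"-separated fields are data rows; blank
  # lines are ignored. $3 (nentry) converts to a number despite padding.
  awk -F'|' '
    NF == 3 { unique++; total += $3 }
    END     { printf "%d %d\n", unique, total }
  '
}
```

Usage: psql -d openfts -qt -c "select * from stat('select fts_index from txt')" | lexeme_totals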

2) We use Porter's stemming algorithm in this example, but I
highly recommend the Snowball stemmer (http://snowball.tartarus.org).
You'll need to install our Perl interface to Snowball, which can be
downloaded from http://openfts.sourceforge.net/contributions.shtml
Snowball is a high-quality stemmer and is available for many
languages.


--------------------------------------------------------------------
Sat Aug 2 23:08:10 MSD 2003
Comments to Oleg Bartunov