Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.sai.msu.su/~megera/postgres/talks/RCDL2007.oleg.pdf
Дата изменения: Wed Jul 25 20:03:16 2007
Дата индексирования: Sat Dec 22 07:08:44 2007
Кодировка:

Поисковые слова: http www.astronet.ru
§
© .. .. . .. oleg@sai.msu.su teodor@sigaev.ru
, , . , , . , PostgreSQL, , , . . . . , -, . , , - . , , . - , - . , . , , .., , . , , , - , . , , "" , ,

1
, Web, , , , , . , , . , , . , [1]. XML [2] XML . , ""
9 « : , » - RCDL'2007, - , , 2007.


, , . , , , , , , . EAN13, UPC, ISBN (), ISMN (), ISSN, PostgreSQL contrib/isn. , . , , . [3], , [22]. . , SQL. SQL (SQL99, SQL2003) - , XML. , , , . 1.1 , , , . , , , , , , , , . (access methods, AM) . , , (template) , , ,

. , , , , , . , , , - PostgreSQL [4, 5]. ( ) PostgreSQL . PostgreSQL . : PostgreSQL , . PostgreSQL , . 1.2 PostgreSQL PostgreSQL - , API, . PostgreSQL ( ), . PostgreSQL contrib. PostgreSQL - , , , PostgreSQL, , BSD. , . . ( ), ltree: 1. cd contrib/ltree 2. make && make install && make installcheck


3. psql DBNAME < /path/to/pgsql/share/contrib/ltree.sq l DBNAME , , .

GIN GiST . 2.1 contrib/intarray , . , . - , . , . , . , , - , . , - , " " . - overlap, contains, contained . - 2,15,23 select * from documents where sections @> '{2,15,23}'; @> - contains, . 8.2 GIN overlap, contains, contained. 2.2 contrib/ltree , GiST B-Tree. , . , , , (parent_id,child_id), , , , ., , Joe Celko "Trees and Hierarchies in SQL for Smarties". ltree , ltree, , 'Top.Countries.Europe.Russia', , [12].

2
, , . . () , . "Bloom Filters" [6]. RD-Tree [7], GiST [8], . GiST , . , , (false hits). c , , , . PostgreSQL (lossy) . , ( ). , . GIN [9,10] , , , . - , . , , GiST . , GIN , GiST , . PostgreSQL . GIN GiST [20].


ltree . ltree. : a) 'Top.Science':
ltreetest=# select path from test where path <@ 'Top.Science'; path -----------------------------------Top.Science Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology b) 'Astronomy': ltreetest=# select path from test where path ~ '*.Astronomy.*'; path ----------------------------------------Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Collections.Pictures.Astronomy Top.Collections.Pictures.Astronomy.Stars Top.Collections.Pictures.Astronomy.Gala.. Top.Collections.Pictures.Astronomy.Astr..

, . , , . , , , (), , , , . . , . XML, PostgreSQL . hstore , . :
CREATE TABLE documents ( id integer PRIMARY KEY, author text, title text, pubdate timestamp, misc hstore );

2.3 btree_gist - contrib/btree_gist , PostgreSQL. . , . 2.4 - - contrib/hstore ( => ), Perl. hstore , , . , , , , NULL. , . "" . , , . ,

- ISBN , misc:
select misc->'isbn' as isbn from documents; isbn -----------------'5-229-000000-0'

2.5 - contrib/tsearch2 - PostgreSQL. online , . , , . Tsearch2 [13] , API . ispell ( ) [14] tsearch2 . tsearch2, SQL, . ,


, . : . UTF-8. , . . GiST GIN. GiST online , GIN . "" , PostgreSQL , .. , . PostgreSQL (8.3) [15]. 2.6 - contrib/pg_trgm , , - . pg_trgm , GIN GiST. -, . . : : =# select word, similarity(word, 'supyrnova') AS sml from apod_words where word % 'supyrnova' order by sml desc, word; word | sml -----------+---------supernova | 0.538462

, , , , , . ( PostgreSQL ), PostgreSQL BSD. SPEC [16] , PostgreSQL , . PostgreSQL , , , . . , , " ", " ", , , " " [17].

4.
4.1 PostgreSQL , [3], . , - PostgreSQL, [18]. [19] PostgreSQL, , B+-tree Rtree, , , , . 4.2 GiST (Generalized Search Tree) , , . [8] , GiST (Generalized Search Tree, ),

3



R-tree ( ). , AM , , . , AM, . , B+-tree , . GiST . GiST PostgreSQL , (multi-key) , (concurrency and recovery), , . GiST. GiST [21]. 4.3 GIN (Generalized Inverted Index) GiST, (GIN) - (, ), - , . , , , . - , , - . , , , . GIN - contains, overlap contained. , GIN , . GIN GiST B-tree , . , N , N , . GIN B-tree, , , B-tree . GIN , .

4.4 GiST GIN - (AXLEN), (MAXNUM) MAXVAL. , 10,100,1000,1000 . : Linux 2.6.19.1, Slackware, x86, Intel(R) Pentium(R) 4 CPU 2.40GHz, 2Gb RAM. PostgreSQL 8.2.4 : shared_buffers = 256MB work_mem = 8MB maintenance_work_mem = 64MB effective_cache_size = 256MB (ms). (. [11]): MAXLEN=500, MAXNUM=100000, MAXVAL=100000; : GIN - 146765.140 GiST - 12721.421 : #Updates GIN GiST 10 160.569 72.390 100 543.285 83.026 1000 3702.067 284.039 10000 32571.510 2969.296 : : explain analyze select * from tt where a @> '{2}'; GiST: Bitmap Heap Scan on tt (cost=29.27..405.87 rows=100 width=979) (actual time=68 .714..6311.757 rows=678 loops=1) Filter: (a @> '{2}'::integer[]) -> Bitmap Index Scan on gist_idx (cost=0.00..29.24 rows=100 width=0) (actual time=43.430..43.430 rows=10774 loops=1) Index Cond: (a @> '{2}'::integer[]) Total runtime: 6312.807 ms GIN: Bitmap Heap Scan on tt (cost=150.50..527.11 rows=100 width=982) (actual time=0


.610..11.277 rows=678 loops=1) Recheck Cond: (a @> '{2}'::integer[]) -> Bitmap Index Scan on gin_idx (cost=0.00..150.48 rows=100 width=0) (actual time=0.433..0.433 rows=678 loops=1) Index Cond: (a @> '{2}'::integer[]) Total runtime: 11.892 ms : 1. GiST GIN, 10 2. GIN 3. GIN GiST . , GiST , , . explain analyze , GiST 10774 , - 678 10774 , . GIN .


1] .. , " ", http://citforum.ru/database/articles/premanifest/ 2] Extensible Markup Language (XML) 1.1. W3C Recommendation, REC-xml11-20060816./W3C, 2006. http://www.w3.org/TR/2006/REC-xml1120060816/ 3] .. , "- : ?", http://www.citforum.ru/database/articles/ordbms10/ 4.shtml 4] PostgreSQL. http://www.postgresql.org 5] .., " PostgreSQL?" ,2005. http://www.sai.msu.su/~megera/postgres/talks/what _is_postgresql.html 6] Bloom Filter. http://en.wikipedia.org/wiki/Bloom_filter 7] J.M. Hellerstein and A. Pfeifer, "The RD-tree: an Index Structure for Sets", Technical Report No. 1252, 1994 8] Joseph M. Hellerstein, Jeffrey F. Naughton and Avi Pfeffer, "Generalized Search Trees for Database Systems", 1996, Proceedings of the 21th International Conference on Very Large Data Bases. http://db.cs.berkeley.edu/papers/vldb95-gist.pdf 9] GIN Readme. http://www.sai.msu.su/~megera/wiki/Gin

10] GIN Presentation. http://www.sigaev.ru/gin/Gin.pdf 11] GIN GiST . http://www.sai.msu.su/~megera/wiki/GinTestUpdat e 12] Ltree. http://www.sai.msu.su/~megera/postgres/gist/ltree/ 13] Tsearch2. http://www.sai.msu.su/~megera/postgres/gist/tsearc h/V2 14] Snowball stemmer. http://snowball.tartarus.org 15] .., .., " PostgreSQL". http://www.sai.msu.su/~megera/postgres/talks/fts_p gsql_intro.html 16] SPECjAppServer2004 Result, SPEC (2007-07-07). http://www.spec.org/jAppServer2004/results/res200 7q3/jAppServer2004-20070703-00073.html 17] , http://moikrug.ru/, , http://www.rabota.ru/ "", http://www.astronet.ru, " ", http://geo.web.ru, " ", http://nature.web.ru. 18] Lawrence A. Rowe, Michael Stonebraker, "The POSTGRES Data Model", Proceedings of the 13th International Conference on Very Large Data Bases, 1987. http://www.vldb.org/conf/1987/P083.PDF 19] Michael Stonebraker, "Inclusion of new types in relational database systems", In Proceedings of the 4th IEEE International Conference on Data Engineering, pp. 262-269, Washington, D.C., 1986 20] O. Bartunov, "Full-Text Search in PostgreSQL", presentation at PGDay conference, 2007, Prato, Italy. http://www.sai.msu.su/~megera/postgres/talks/ftspgday-2007.pdf 21] .., .., " PostgreSQL GiST", 2005. http://www.sai.msu.su/~megera/postgres/talks/gist_ tutorial.html 22] Michael Stonebraker and Greg Kemnitz, "The Postgres Next Generation Database Management System", 1991, Commun. ACM, v.34, 10, pp. 7892.

Specialized data types for digital libraries
O.S. Bartunov, T.G.Sigaev Complex modern informational systems require specialized data types, optimized for fast access and tasks of informational retrieval. Rapid changes in patterns of access to information require extensibility of database engine to allow experts in the data domain to develop custom data type, optimized for data domain. We describe several data types, developed for the opensource ORDBMS PostgreSQL, which facilitate


operations with sets, hierarchical data, semistructured data and full-text search. Also, we describe PostgreSQL infrastructure for developing extensions.


§§ ( 05-07-90225-a, 06-07-89182-).