Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.sai.msu.su/~megera/postgres/talks/pgcon-2011.pdf
Дата изменения: Sun May 22 07:56:52 2011
Дата индексирования: Tue Feb 5 20:43:00 2013
Кодировка:

Поисковые слова: релятивистское движение
SP-GiST ­ a new indexing framework for PostgreSQL
Space-partitioning trees in PostgreSQL
Oleg Bartunov, Teodor Sigaev Moscow University

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


PostgreSQL extensibility



,,The world's most advanced open source database" from www.postgresql.org

It is imperative that a user be able to construct new access methods to provide efficient access to instances of nontraditional base types
Michael Stonebraker, Jeff Anton, Michael Hirohama. Extendability in POSTGRES , IEEE Data Eng. Bull. 10 (2) pp.16-23, 1987



User data types are ,,first class citizens" Adding new extensions on-line without restarting database



Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


PostgreSQL extensibility



B-tree ­ limited set of comparison operators (<,>,=,<=,=>)
­

All buil-in data types Ltree, hstore, pg_trgm, full text search, intarray, PostGIS Many other extensions ...... Hstore, pg_trgm, full text search, intarray Many other extensions



GiST ­ Generalized Search Tree used in many extensions
­ ­



GIN ­ Generalized Inverted Index
­ ­



Why do we talk about new indexing framework ?
PGCon-2011, Ottawa, May 17-20, 2011

Oleg Bartunov, Teodor Sigaev


PostgreSQL extensibility



There are many interesting data structures not available
­

K-D-tree, Quadtree and many variants


CAD, GIS, multimedia Phone routing, ip routing, substring search

­

Tries, suffix tree and many variants




Common features:
­

Decompose space into disjoint partitions


Quadtree ­ 4 quadrants Suffix tree ­ 26 regions (for english alphabet)

­

Unbalanced trees
PGCon-2011, Ottawa, May 17-20, 2011

Oleg Bartunov, Teodor Sigaev


Quadtree



Centroid based

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Suffix tree



Search time depends on query length only !

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST



GiST is inspired by R-tree and doesn't supports unbalanced trees So, we need new indexing framework for Spatial Partitioning trees:
­



Provide internal methods, which are common for whole class of space partitioning trees Provide API for implementation specific features of data type

­

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST



Big Problem ­ Space Partitioning trees are in-memory structures and not suitable for page-oriented storage Several approaches:
1. Adapt structure for disk storage ­ difficult and not generalized 2. Introduce non-page oriented storage in Postgres - No way ! 3. Add node clustering to utilize page space on disk and preserve locality (path nodes stored close)



Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST tuples
Inner Tuple Prefix (optional) Node: predicate, ItemPointer Node 2 ...

LeafTuple Predicate Heap pointer Pointer next leaf on the same page

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST (suffix tree)
Inner 1 http://www. s g

Inner 2 igaev.ru/ i ...

leaf ndex.html
Oleg Bartunov, Teodor Sigaev

leaf oogle.com/
PGCon-2011, Ottawa, May 17-20, 2011

...


SP-GiST (quadtree)
Inner 1 (x=0, y=0) Centroids Inner 2 (x=1, y=10) Q1 Q2 3 4 Q1 Q2 3 4

leaf (111,222)
Oleg Bartunov, Teodor Sigaev

leaf (x=12, y=-45)
PGCon-2011, Ottawa, May 17-20, 2011

...


SP-GiST ­ tuples and pages
Inner Page Inner Tuple 1 Inner Tuple 2

Leaf Page 1 Leaf tuple Leaf tuple Leaf tuple

Leaf Page 2 Leaf tuple Leaf tuple Leaf tuple

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST - interface
C onfi gFn() C hoos eFn() - retu rns 3 oid s of data types: pref ix, predi cate s of node and le af tu ple -a o cce p ne o Ma t Ad d Sp l ts c f ac ch n nod it i o t o e n nte ion de to ner nt of inne r node , ret urns : inne r tup le tupl e (pr efix s plit)

S plit Fn()

- make s inn er t uple from leaf p age

I nner Consi stent Fn() - acce pts c onte nt of inne r node and quer y, re turn s nod es to follo w L eafC onsis tentF n() - test leaf tup le fo r que ry

N otes : all func tions accep ts le vel and f ull i ndexed valu e

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST ChooseFn:Split

Insert: www.gogo.com Inner Tuple www.google.com/ a i

New inner Tuple 1 www. g o g o

New inner Tuple 2 gle.com/ Leaf tuple
Oleg Bartunov, Teodor Sigaev PGCon-2011, Ottawa, May 17-20, 2011

a

i


SP-GiST ­ insert algorithm
Sta rt wi th fir st tu ple o n ro ot l oop: if (p age i s le if ( enou in else ca cu end if els e swi tch ca a f) the n g h spac e) s ert l l spli tFn( ) an d resu me in sert from r rent p lace

b y choo seFn s e Matc hNod e ­ go b ag cas e AddN ode ­ add cas e Spli t ­ spli resu plac

y a n t m e

point in o de an inner e inse

er an d loop d ins ert tupl e and rt fr om cur rent

end if
Oleg Bartunov, Teodor Sigaev PGCon-2011, Ottawa, May 17-20, 2011


Quadtree implementation



Prefix and leaf predicate are points, node predicate is short number SplitFn() - just form a centroid and 4 nodes (quadrants) ChooseFn() - choose a quadrant (no AddNode, no split tuple) InnerConsistentFn() - choose quadrant(s) LeafConsistentFn ­ simple equality 179 lines of code











Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Quadtree



Table geo (points) : 2045446 points from US geonames Size: 293363712

knn=# explain (analyze on, buffers on) select point from geo where point ~= '(34.34898,-92.82934)'; Abco (Arkansas,County of Hot Spring) Seq Scan on geo (cost=0.00..36626.31 rows=10228 width=16) (actual time=0.027..286.088 rows=1 loops=1) Filter: (point ~= '(34.34898,-92.82934)'::point) Buffers: shared hit=11057 Total runtime: 286.118 ms (4 rows) Time: 2 8 6 . 6 5 9 m s

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Quadtree



Table geo (points) : 2045446 points from US geonames GiST
create index pt_gist_idx on geo using gist(point); INDEX 36672.283 ms 153,124,864



knn=# CREATE Time: Size:


SP-GiST

knn=# CREATE Time: Size:

create index pt_spgist_idx on geo using spgist(point); INDEX 12805.530 ms ~ 3 t i m e s f a s t e r ! 153,788,416 ~ the same size

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Quadtree



GiST

knn=# explain (analyze on, buffers on) select point from geo where point ~= '(34.34898,-92.82934)'; Bitmap Heap Scan on geo (cost=456.26..11872.18 rows=10227 width=16) (actual time=0.188..0.188 rows=1 loops=1) Recheck Cond: (point ~= '(34.34898,-92.82934)'::point) Buffers: shared hit=12 -> Bitmap Index Scan on pt_gist_idx (cost=0.00..453.70 rows=10227 width=0) (actual time=0 . 1 7 9 . . 0 . 1 7 9 rows=1 loops=1) Index Cond: (point ~= '(34.34898,-92.82934)'::point) Buffers: shared hit=11 Total runtime: 0.235 ms

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Quadtree



SP-GiST

knn=# explain (analyze on, buffers on) select point from geo where point ~= '(34.34898,-92.82934)'; Bitmap Heap Scan on geo (cost=576.50..11992.42 rows=10227 width=16) (actual time=0.041..0.041 rows=1 loops=1) Recheck Cond: (point ~= '(34.34898,-92.82934)'::point) Buffers: shared hit=6 -> Bitmap Index Scan on pt_spgist_idx (cost=0.00..573.94 rows=10227 width=0) (actual time=0 . 0 3 3 . . 0 . 0 3 3 rows=1 loops=1) Index Cond: (point ~= '(34.34898,-92.82934)'::point) Buffers: shared hit=5 Total runtime: 0.083 ms ~ 6 t i m e s f a s t e r !

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Quadtree



Page space utilization

knn=# select spgstat('pt_spgist_idx'); spgstat -----------------------------totalPages: 18772 + innerPages: 803 + leafPages: 17969 + emptyPages: 32 + usedSpace: 64340.80 kbytes+ freeSpace: 85321.91 kbytes+ FillRatio: 42.99% + leafTuples: 2045446 + innerTuples: 5982 (1 row)

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Quadtree



Conclusions
­

Index creation is fast (3 times faster than GiST) even in prototype. Current page utilization is ~ 40% ! Index size can be improved using better clustering technique Search is very fast (~ 3 times faster than GiST) for ~= operation. Need to implement other operations.

­

­

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Suffix tree implementation



Prefix and leaf predicate are texts, node predicate is char (byte) Interface functions are quite complex because of prefix support Interface functions takes into account current level in tree 329 lines of code (not so much!)







Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Suffix tree



4 mln urls from uk domain (10-20 url from each server) Btree (Size=396,730,368), create index ~ 19 sec



test=# explain (analyze on, buffers on) select * from t1 where t = 'http://0-2000webhosting.co.uk/super-submit.htm'; QUERY PLAN -------------------------------------------------------------------------------Index Scan using t1_bt_idx on t1 (cost=0.00..10.20 rows=1 width=72) (actual time=0 . 0 9 5 . . 0 . 0 9 6 rows=1 loops=1) Index Cond: (t = 'http://0-2000webhosting.co.uk/super-submit.htm'::text) Buffers: shared hit=6 Total runtime: 0.126 ms

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Suffix tree



4 mln urls from uk domain (10-20 url from each server) SP-GiST (Size=1,797,554,176), create index ~ 28 sec



test=# explain (analyze on, buffers on) select * from t1 where t = 'http://0-2000webhosting.co.uk/super-submit.htm'; QUERY PLAN ---------------------------------------------------------------Bitmap Heap Scan on t1 (cost=13.03..17.05 rows=1 width=72) (actual time=0.030..0.030 rows=1 loops=1) Recheck Cond: (t = 'http://0-2000webhosting.co.uk/super-submi Buffers: shared hit=4 -> Bitmap Index Scan on t1_spg_idx (cost=0.00..13.03 rows=1 (actual time=0 . 0 2 1 . . 0 . 0 2 1 rows=1 loops=1) Index Cond: (t = 'http://0-2000webhosting.co.uk/super-s Buffers: shared hit=3 Total runtime: 0.075 ms ~ 4 t i m e s f a s t e r !

---------------t.htm'::text) width=0)

ubmit.htm'::text

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Suffix tree



Page space utilization

test=# select spgstat('t1_spg_idx'); spgstat -------------------------------totalPages: 219427 + innerPages: 4965 + leafPages: 214462 + emptyPages: 0 + usedSpace: 228026.99 kbytes + freeSpace: 1521389.05 kbytes+ fillRatio: 13.03% + leafTuples: 4000000 + innerTuples: 44144 (1 row)

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


Suffix tree



Conclusions
­ ­

Index creation is slower than Btree ( 28 sec vs 19 sec) Current page utilization is ~ 13% ! Index size ~4 times bigger than Btree, can be Ѕ of Btree index if 100% utilization. Search is very fast (~ 4 times faster than Btree) for = operation. Need to implement other operations.

­

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST TODO
Improve page utilization (Clustering) Concurrency WAL Vacuum Spggettuple() Amcanorder Add operations K-d-tree? Btree emulation? Something else? KNN ? (amcanorderbyop)



















Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011


SP-GiST links



SP-GiST publications
­

http://www.cs.purdue.edu/spgist/



Downloads
http://www.sigaev.ru/misc/spgist-0.37.tgz

Oleg Bartunov, Teodor Sigaev

PGCon-2011, Ottawa, May 17-20, 2011