Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.sai.msu.su/~megera/postgres/talks/Gin-toronto-2006.pdf
Дата изменения: Sun Oct 21 18:54:04 2012
Дата индексирования: Tue Feb 5 21:13:18 2013
Кодировка:

Поисковые слова: galaxy pair
Generalized Inverted Index An inverted index is an index structure storing a set of (key, posting list) pairs, where 'posting list' is a set of documents in which the key occurs.


Generalized means that the index does not know which operation it accelerates. It works with custom strategies, defined for specific data types. GIN is similar to GiST and differs from B-Tree indices, which have predefined, comparison-based operations.


Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006


GIN Structure
Entry page, level 0 (leaf) aaa Pointer to posting tree: B-Tree over ItemPointer to heap

Entry page, level N: keywords abc bar foo

abc Posting list: sorted array of ItemPointer to heap

Entry page, level 0 baa bar

Posting page, level N: ItemPointer 14:17 218:1 1021:6

Right link

Posting page, level 0 (leaf) 1:33 2:7 14:17 Right bound 14:17

Posting page, level 0 (leaf) 123:1 158:18 Right bound 218:1

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006


GIN features



Concurrency
­

Lehman and Yao's high-concurrency B-tree management algorithm Recovery The scheme is similar to GiST



WAL
­



User-defined opclasses
­

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006


GIN Interface

Four interface functions (pseudocode):


Datum* extractValue(Datum inputValue, uint32* nentries) int compareEntry(Datum a, Datum b) Datum* extractQuery(Datum query, uint32* nentries, StrategyNumber n) bool consistent(bool check[], StrategyNumber n, Datum query)
PostgreSQL Summit, Toronto, July 8-9, 2006





Oleg Bartunov, Teodor Sigaev


GIN Interface: extractValue

Datum* extractValue(Datum inputValue, uint32* nentries) Returns an array of Datum of entries of the value to be indexed. nentries should contain the number of returned entries. Tsearch2 example: inputValue is tsvector, output is array of text type, containing lexemes.
Oleg Bartunov, Teodor Sigaev PostgreSQL Summit, Toronto, July 8-9, 2006


GIN Interface: compareEntry

int compareEntry(Datum a, Datum b) Compares two entries (not the indexing values), returns <0, 0, >0 Tsearch2 example: built-in bttextcmp(), used for built-in B-Tree index over texts.

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006


GIN Interface: extractQuery Datum* extractQuery(Datum query, uint32* nentries, StrategyNumber n) Returns an array of Datum of entries of the query to be executed. n is the strategy number of the operation. Depending on n, query can be different type. Tsearch2 example: query is tsquery, output is array of text type, containing lexemes.
Oleg Bartunov, Teodor Sigaev PostgreSQL Summit, Toronto, July 8-9, 2006


GIN Interface: consistent bool consistent(bool check[], StrategyNumber n, Datum query) Each element of the check array is true if the indexed value has a corresponding entry in the query: if (check[i] = TRUE) then the i-th entry of the query is present in the indexed value. The function should return true if the indexed value matches by StrategyNumber and the query.
Oleg Bartunov, Teodor Sigaev PostgreSQL Summit, Toronto, July 8-9, 2006


GIN Interface: consistent

Tsquery: fat & ( cat | dog ) Posting lists (logically) of ItemPointers 1:7 1:15 15:7 33:11 33:11 34:1
Oleg Bartunov, Teodor Sigaev

bool check[] T,F,F

Consistent function T&(F|F) = F T&(F|T) = T F&(T|F) = F T&(T|T) = T F&(T|T) = F

1:15 33:11 34:1

T,F,T F,T,F T,T,T F,T,T

PostgreSQL Summit, Toronto, July 8-9, 2006


GIN: create index flow
Tuple to index Value (tsvector): cat:2 fat:1 ItemPointer: 17:9

extractValue()

maintenance_work_mem cache: Sorted array of entries cat foo fat Arrays of ItemPointers 12:3, 14:5, 15:1, 17:9 1:1, 2:5, 15:3 2:3, 17:9
PostgreSQL Summit, Toronto, July 8-9, 2006

HDD

Oleg Bartunov, Teodor Sigaev


Gin opclasses



Built-in support for any one-dimensional array


&& - overlap @ - contains ~ - contained



Tsearch2 Intarray ­ enhanced support for int4[]

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006


GIN tips



GUC variable: gin_fuzzy_search_limit - soft upper limit on the returned results for very frequent words Create is much faster than inserts



Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006


GIN limitations



No support for multicolumn indices GIN doesn't uses scan->kill_prior_tuple & scan->ignore_killed_tuples GIN searches entries only by equality matching GIN doesn't support full scans of index GIN doesn't index NULL values





Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006


???



Two kinds of NULL


(NULL = NULL) is NULL ('{NULL}'::int[]='{NULL}') is TRUE '{{1,2},{3,4}}' @ '{2,3}' - ?



Multidimensional arrays: &&, @, ~ ?




Recent fillfactor patch ­ nested B-Tree

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006