Äîêóìåíò âçÿò èç êýøà ïîèñêîâîé ìàøèíû. Àäðåñ
îðèãèíàëüíîãî äîêóìåíòà
: http://www.mrao.cam.ac.uk/~bn204/alma/alma-query-db-q-explain.html
Äàòà èçìåíåíèÿ: Mon Apr 4 13:47:51 2016 Äàòà èíäåêñèðîâàíèÿ: Sun Apr 10 09:33:22 2016 Êîäèðîâêà: IBM-866 Ïîèñêîâûå ñëîâà: corona |
See also
This is a moderately complicated query that demonstrates most of the features of the query language that a casual user is likely to want to use:
Here is the query:
SELECT TScan.timestamp, TScan.cont, TSource.sn, intent
FROM (SELECT archive_uid,
extract(aa.xml,
'//row[1]//scanIntent/text()').getStringVal() as intent,
extract(aa.xml,
'//row[1]//startTime/text()').getStringVal() as stime,
extract(aa.xml,
'//row[1]//endTime/text()').getStringVal() as etime,
extract(aa.xml,
'//ContainerEntity[1]/@entityId').getStringVal() as cont,
timestamp
FROM ALMA.xml_scantable_entities aa) Tscan,
(SELECT timestamp,
archive_uid,
extract(ab.xml,
'/SourceTable//sourceName/text()').getStringVal() as sn,
extract(ab.xml,
'//ContainerEntity[1]/@entityId').getStringVal() as cont
FROM ALMA.xml_sourcetable_entities ab) Tsource
WHERE TSource.timestamp > '20-jun-10' AND
TScan.timestamp > '20-jun-10' AND
Tscan.intent like '%%CALIBRATE_DELAY%%' AND
Tscan.intent like '%%CALIBRATE_PHASE%%' AND
(etime-stime) > 600e9 AND
Tscan.cont=Tsource.cont;
The first row of the query:
SELECT TScan.timestamp, TScan.cont, TSource.sn, intent
specifies which information we actually want to see from the tables
being analysed (documentation). Note
that when more than one table is being considered in the query has the
same column name, it is necessary to specify the name of the table as
well as the column name. This is done with the
TableName.ColumnName
syntax which is seen above for the timestamp,
cont and sn columns.
The next part of the query is the FROM
clause which specifies
which tables we are considering in this query. In this case these
source tables are not tables that already exist in the ALMA data model
but rather they are themselves generated by a sub-query. The first
of these sub-queries has the following form:
(SELECT archive_uid,
extract(aa.xml,
'//row[1]//scanIntent/text()').getStringVal() as intent,
extract(aa.xml,
'//row[1]//startTime/text()').getStringVal() as stime,
extract(aa.xml,
'//row[1]//endTime/text()').getStringVal() as etime,
extract(aa.xml,
'//ContainerEntity[1]/@entityId').getStringVal() as cont,
timestamp
FROM ALMA.xml_scantable_entities aa) Tscan,
The body of the query is in parenthesis which signifies that it is a sub-query. Following the parenthesis is the symbol “Tscan” – this means we have created a name for this subquery with this value. Within the parenthesis the sub-query has the same structure as the top-level query:
SELECT
statement specifies which information (“columns”) we
want to retrieveFROM
statement specifies from which table to extract
informationFROM
statement
we create another name for it, in this case we create a name aa
for the table ALMA.xml_scantable_entities
What makes this sub-query complicated is of course the extract
expressions. These expressions are used parse semi-structured
information contained in XML fields. The first argument is the column
(aa.xml
which in this case is ALMA.xml_scantable_entities.xml
)
to parse and the second is the XPath expression (explained later)
which specifies which information to extract. After the extract
statement there is a statement “as” and a name – this form gives
names to the columns created using the extract statements.
There are two sub-queries in this top-level query:
The last part of the query is the WHERE
section:
WHERE TSource.timestamp > '20-jun-10' AND
TScan.timestamp > '20-jun-10' AND
Tscan.intent like '%%CALIBRATE_DELAY%%' AND
Tscan.intent like '%%CALIBRATE_PHASE%%' AND
(etime-stime) > 600e9 AND
Tscan.cont=Tsource.cont;
This section has two purposes:
CALIBRATE_DELAY
and CALIBRATE_PHASE
ContainerElement
property in the XML fieldsIn the present case this is a very simple concept: the information we
require is in two tables (ALMA.xml_scantable_entities
and
ALMA.xml_sourcetable_entities
) and we want to join rows in these
two tables that belong to the same observation.
We implement this join by:
ContainerElement
elementWHERE
statement Tscan.cont=Tsource.cont
which
creates the joinBecause of incorrect design of the ALMA archive, even very simple queries require the use of XPath language – basically the underlying reason is that even well structured data are stored in semi-structured XML and not in the tables of the database.
The XPath query language is defined here: http://www.w3.org/TR/xpath/. It is a language for extracting a (re-ordered) sub-set of an XML document. The word document is used in the technical sense – in the case of ALMA the document is each XML field.
An XPath expression is applied to an XML field in the database using
the extract
function (doc)
in Oracle.
To write practical queries the following steps need to be done:
extract
function to retrieve the
information and for essentially a pseudo-column of the tableHere is an example of a dump from the “SCANTABLE” table:
"ARCHIVE_UID","TIMESTAMP","XML","SCHEMAUID","OWNER","DELETED","READPERMISSIONS","WRITEPERMISSIONS","HIDDEN","DIRTY","VIRTUAL"
"uid://X02/X11b/X1",05-DEC-09 23.36.53.435000000,<?xml version=""1.0"" encoding=""ISO-8859-1""?> <ScanTable xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns=""http://Alma/XASDM/ScanTable"" > <Entity entityId=""uid://X02/X11b/X1"" entityIdEncrypted=""na"" entityTypeName=""ScanTable"" schemaVersion=""1"" documentVersion=""1""/><ContainerEntity entityId=""uid://X02/X101/X1"" entityIdEncrypted=""na"" entityTypeName=""ASDM"" schemaVersion=""1"" documentVersion=""1""/><row> <scanNumber> 1 </scanNumber> <startTime> 4766772926114000000 </startTime> <endTime> 4766772973994000000 </endTime> <numIntent> 1 </numIntent> <numSubScan> 2 </numSubScan> <scanIntent> 1 1 CALIBRATE_POINTING</scanIntent><calDataType> 1 1 TOTAL_POWER</calDataType><calibrationOnLine> 1 1 true </calibrationOnLine> <calibrationFunction> 1 1 UNSPECIFIED</calibrationFunction><calibrationSet> 1 1 NONE</calibrationSet><calPattern> 1 1 CROSS_SCAN</calPattern><numField> 1 </numField> <fieldName> 1 1 ""Jupiter"" </fieldName> <sourceName> Jupiter </sourceName> <flagRow> false </flagRow> <execBlockId> ExecBlock_0 </execBlockId> </row> </ScanTable>,"uid://A002/X14/X5e","user",0,"","",0,0,0
And here is just the XML reformatted to be easier to view on-screen:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ScanTable xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns="http://Alma/XASDM/ScanTable" >
<Entity entityId="uid://X02/X11b/X1" entityIdEncrypted="na"
entityTypeName="ScanTable" schemaVersion="1"
documentVersion="1"/>
<ContainerEntity entityId="uid://X02/X101/X1"
entityIdEncrypted="na" entityTypeName="ASDM"
schemaVersion="1"
documentVersion="1"/>
<row><scanNumber> 1 </scanNumber>
<startTime>4766772926114000000 </startTime>
<endTime>4766772973994000000 </endTime>
<numIntent> 1 </numIntent>
<numSubScan>2 </numSubScan>
<scanIntent> 1 1 CALIBRATE_POINTING</scanIntent>
<calDataType> 1 1 TOTAL_POWER</calDataType>
<calibrationOnLine> 1 1 true </calibrationOnLine>
<calibrationFunction> 1 1 UNSPECIFIED</calibrationFunction>
<calibrationSet> 1 1 NONE</calibrationSet>
<calPattern> 1 1 CROSS_SCAN</calPattern>
<numField> 1 </numField> <fieldName> 1 1 "Jupiter"</fieldName>
<sourceName> Jupiter </sourceName>
<flagRow>false </flagRow>
<execBlockId> ExecBlock_0 </execBlockId>
</row>
</ScanTable>
It is easy to identify from this print-out the important information:
The XPath expressions given below illustrate most typical use:
//row[1]//scanIntent/text()
means get the text of the
scanIntent
element of the first row
element in the field//ContainerEntity[1]/@entityId
means get the value of attribute
entityId
of the first ContainerEntity[1]
in the field