Original document: http://www.adass.org/adass/proceedings/adass03/P1-13/
VizieR catalogues are divided into two categories, standard and large, where large catalogues are defined, somewhat arbitrarily, as those exceeding a given number of rows. Catalogues with up to a few million records are managed by a standard relational DBMS, while each of the larger catalogues has a dedicated query program which retrieves the records lying in a circular or rectangular region around a position in the sky. Some details about the methods used to store the large catalogues and their performance, in terms of speed and disk usage, are given in Derriere et al. (2000); the current list of these large catalogues is given in Fig. 1. It should be noted that both "standard" and "large" catalogues share the same metadata descriptions: the VizieR interface simply translates the user's requests either into SQL queries or into a customized set of parameters interpreted by the dedicated query program.
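To make the notion of a circular-region query concrete, the following is a minimal sketch of selecting records within a given radius of a sky position. It is a brute-force filter written for illustration only, not the method used by the dedicated query programs (their compact storage is described in Derriere et al. 2000). The function names and the `ra`/`dec` row keys are assumptions.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation, in degrees, between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: numerically stable for the small separations typical of cone searches.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(rows, ra0, dec0, radius_deg):
    """Return the rows lying within radius_deg of (ra0, dec0).

    Each row is assumed to be a dict with 'ra' and 'dec' keys in degrees
    (a hypothetical layout chosen for this sketch).
    """
    return [row for row in rows
            if angular_separation_deg(row["ra"], row["dec"], ra0, dec0) <= radius_deg]

# Hypothetical records, for illustration only.
sample_rows = [
    {"id": 1, "ra": 10.00, "dec": 20.0},
    {"id": 2, "ra": 10.05, "dec": 20.0},
    {"id": 3, "ra": 12.00, "dec": 20.0},
]
selected = cone_search(sample_rows, ra0=10.0, dec0=20.0, radius_deg=0.1)
```

A linear scan like this would be far too slow for catalogues of the size discussed here; it only shows which records a circular query is expected to return.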
As the Sun server is becoming overloaded, we decided to move the set of large catalogues to a Linux cluster (the CoCat cluster). Computing power or storage capacity can then be increased at very low cost, and the cluster also offers a flexible basis for future developments.
A wide range of free and commercial clustering tools is available. We started with a new free clustering package, CLIC (Cluster LInux pour le Calcul), which makes use of the MPI (Message Passing Interface) library and is based on the Mandrake Linux 9.0 distribution. The CoCat cluster consists of one master node and five slave nodes (Fig. 2).
Tools like MPI are designed to run parallelized CPU-intensive tasks on a cluster, whereas CoCat has to dispatch a large number of daily queries and return their results. Because the large catalogues are stored in a compact form, it was possible as a first step to replicate the data (about 200 GB) on each node. As the catalogues grow in both number and size, it will soon be necessary to distribute the data over several nodes, and it will become mandatory to describe which part of which catalogue can be accessed on which engine: this role is assigned to the Dispatcher, which runs on the master node and is illustrated in Fig. 3.
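The bookkeeping role described for the Dispatcher can be sketched as a placement table that records which node holds which part of which catalogue. This is only an illustration under stated assumptions: splitting catalogues into declination bands, the class and method names, and the catalogue and node labels are all hypothetical and are not taken from the CoCat implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Dispatcher:
    # catalogue name -> list of (dec_min, dec_max, node) partitions.
    # Partitioning by declination band is an assumption made for this sketch.
    placement: dict = field(default_factory=dict)

    def register(self, catalogue, dec_min, dec_max, node):
        """Record that `node` holds the band [dec_min, dec_max] of `catalogue`."""
        self.placement.setdefault(catalogue, []).append((dec_min, dec_max, node))

    def nodes_for(self, catalogue, dec_lo, dec_hi):
        """Return the nodes holding any part of `catalogue` that overlaps [dec_lo, dec_hi]."""
        found = []
        for dec_min, dec_max, node in self.placement.get(catalogue, []):
            overlaps = dec_min <= dec_hi and dec_lo <= dec_max
            if overlaps and node not in found:
                found.append(node)
        return found

# Hypothetical layout: cat_A is split over two nodes, cat_B sits on a third.
dispatcher = Dispatcher()
dispatcher.register("cat_A", -90.0, 0.0, "node1")
dispatcher.register("cat_A", 0.0, 90.0, "node2")
dispatcher.register("cat_B", -90.0, 90.0, "node3")

targets = dispatcher.nodes_for("cat_A", 10.0, 12.0)
```

A query whose region straddles a band boundary is routed to every node holding an overlapping band, so the partial results would have to be merged before being returned.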
We are currently testing new configurations for a more efficient Dispatcher, in which each node is treated as an independent resource and the Dispatcher assigns tasks according to its knowledge of the current load on each node. This method seems to work well in the current situation, where all catalogues are present on each node, but in the near future we will have to take some important decisions about:
Derriere, S., Ochsenbein, F., & Egret, D. 2000, in ASP Conf. Ser., Vol. 216, Astronomical Data Analysis Software and Systems IX, ed. N. Manset, C. Veillet, & D. Crabtree (San Francisco: ASP), 235
Ochsenbein, F., Bauer, P., & Marcout, J. 2000, A&AS, 143, 23