GT4/Condor/WSS integration notes
This page contains notes on the GT4/Condor integration. Tested with Scientific Linux 3.0.5.
RPMs
Install the RPMs
echo rpm http://lhc.sinp.msu.ru/dist LCG-2_6_0 lcg_sl3 lcg_sl3.updates > /etc/apt/sources.list.d/lcg26.list
echo rpm http://grid-deployment.web.cern.ch/grid-deployment/gis apt/LCG_CA/en/i386 lcg > /etc/apt/sources.list.d/lcg-ca.list
apt-get update
apt-get install gcc rh-postgresql-jdbc rh-postgresql-server ca_Russia ca_RDIG
Java
There are two alternative ways to install Java.
Use Sun Java RPMS
Install the RPMS:
apt-get install j2sdk ant
Create the required links:
cd /usr/java/j2sdk1.4.2_08/man/man1
gzip -9 jar.1
cd /etc/alternatives
ln -fs /usr/java/j2sdk1.4.2_08/man/man1/jar.1.gz
ln -fs /usr/java/j2sdk1.4.2_08/bin/jar
ln -sf /usr/java/j2sdk1.4.2_08/bin/javac
ln -sf /usr/java/j2sdk1.4.2_08/bin/java
Use the jpackage.org RPMS
Add these sources to /etc/apt/sources.list.d/jpackage.list:
rpm http://sunsite.informatik.rwth-aachen.de/ftp/pub/Linux/jpackage 1.6/redhat-el-3.0 devel free
rpm http://sunsite.informatik.rwth-aachen.de/ftp/pub/Linux/jpackage 1.6/generic devel free non-free
Install java:
apt-get update
apt-get install java-1.5.0-sun ant
Users and groups
Users and groups for the GT4 toolkit
groupadd -g 110 globus
useradd -u 110 -g 110 -s /bin/bash -c "Globus toolkit user" globus
Test user to be mapped
useradd -G gridmapped testuser
SSH Host-based authentication
Create the /etc/ssh/ssh_known_hosts file. You can use a script like this:
rm -f /etc/ssh/ssh_known_hosts
for host in lcg05 lcg07 lcg09 ; do
  echo $host,`host $host | cut -d ' ' --output-delimiter=, -f 1,4` ssh-dss `ssh-keyscan -t dsa $host 2>/dev/null | cut -d ' ' -f 3` >> /etc/ssh/ssh_known_hosts
  echo $host,`host $host | cut -d ' ' --output-delimiter=, -f 1,4` ssh-rsa `ssh-keyscan -t rsa $host 2>/dev/null | cut -d ' ' -f 3` >> /etc/ssh/ssh_known_hosts
done
Create the /etc/ssh/shosts.equiv file.
cat > /etc/ssh/shosts.equiv <<EOF
lcg05.sinp.msu.ru
lcg07.sinp.msu.ru
lcg09.sinp.msu.ru
EOF
Enable HostbasedAuthentication for the sshd. Edit the file /etc/ssh/sshd_config and add "HostbasedAuthentication yes". You may use this script with the ssh from SL 3.0.5:
cd /etc/ssh
patch -l -p0 <<EOF
--- sshd_config.orig	2005-09-13 11:10:01.000000000 +0400
+++ sshd_config	2005-09-13 11:10:08.000000000 +0400
@@ -48,7 +48,7 @@
 # For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
 #RhostsRSAAuthentication no
 # similar for protocol version 2
-#HostbasedAuthentication no
+HostbasedAuthentication yes
 # Change to yes if you don't trust ~/.ssh/known_hosts for
 # RhostsRSAAuthentication and HostbasedAuthentication
 #IgnoreUserKnownHosts no
EOF
Make HostbasedAuthentication the default for the ssh clients. Edit /etc/ssh/ssh_config and add
HostbasedAuthentication yes
EnableSSHKeysign yes
to the "Host *" section. You may use this script with the ssh from SL 3.0.5:
cd /etc/ssh
patch -l -p0 <<EOF
--- ssh_config.orig	2005-09-13 11:18:31.000000000 +0400
+++ ssh_config	2005-09-13 11:21:52.000000000 +0400
@@ -36,3 +36,5 @@
 #	EscapeChar ~
 Host *
 	ForwardX11 yes
+	HostbasedAuthentication yes
+	EnableSSHKeysign yes
EOF
Restart the sshd:
service sshd restart
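To verify that host-based authentication actually works, you can force it from one of the hosts. This is just a quick check, assuming the host names used above; it should log you in without a password or key prompt:

# should print the remote host name without asking for a password
ssh -o PreferredAuthentications=hostbased lcg05.sinp.msu.ru hostname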
Torque (PBS)
First configure ssh hostbased authentication, see the appropriate section of this document.
Server node
Install Torque on the server node.
tar xvfz torque-1.2.0p6.tar.gz
cd torque-1.2.0p6
./configure --disable-mom --disable-gui --set-server-home=/usr/local/spool/PBS --enable-syslog --with-scp
make
make install
./torque.setup root
qterm -t quick
Configure the server_name:
cd /usr/local/spool/PBS
echo lcg09.sinp.msu.ru > server_name
Add the list of the nodes to the nodes file:
cd /usr/local/spool/PBS
cat > server_priv/nodes <<EOF
lcg05.sinp.msu.ru np=2
EOF
Configure the nodes and then start the server and scheduler:
pbs_server
pbs_sched
Worker nodes
Install Torque on the worker nodes.
tar xvfz torque-1.2.0p6.tar.gz
cd torque-1.2.0p6
./configure --disable-server --disable-gui --set-server-home=/usr/local/spool/PBS --enable-syslog --with-scp
make
make install
Configure the pbs_mom:
cd /usr/local/spool/PBS
echo lcg09.sinp.msu.ru > server_name
cat > mom_priv/config <<EOF
\$clienthost 213.131.5.9
\$logevent 255
\$restricted 213.131.5.9
EOF
Start the pbs_mom:
pbs_mom
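Optionally, once the server and the pbs_mom daemons are all running, you can sanity-check the batch system. This is not part of the original recipe, just a quick test using standard Torque commands:

# list the configured nodes and their state
pbsnodes -a
# submit a trivial job as an unprivileged user and watch the queue
su - testuser -c 'echo "sleep 30" | qsub'
qstat -a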
Shared-filesystem configuration
This is an optional configuration of PBS with shared home directories. Install and configure the NIS (YP) server:
apt-get install ypserv
echo NISDOMAIN=gt4farm >> /etc/sysconfig/network
chkconfig --level 345 ypserv on
/etc/init.d/ypserv start
make -C /var/yp
Configure and start the NFS server:
echo "/home lcg*.sinp.msu.ru(rw,no_root_squash,sync)" >> /etc/exports chkconfig --level 345 nfs on /etc/init.d/nfs start
The server is now ready; next, configure the client nodes:
apt-get install ypbind
echo NISDOMAIN=gt4farm >> /etc/sysconfig/network
chkconfig --level 345 ypbind on
/etc/init.d/ypbind start
echo "lcg09.sinp.msu.ru:/home /home nfs defaults 0 0" >> /etc/fstab
mount /home
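To check the shared setup on a node (optional verification):

# NIS accounts should be visible
ypcat passwd | head
# /home should be mounted from the server
df -h /home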
GT4 Installation and configuration
Directories
mkdir /usr/local/globus-4.0.1
chown globus:globus /usr/local/globus-4.0.1
GT4 installation
Become the globus user. If you use Sun Java, do
export JAVA_HOME=/usr/java/j2sdk1.4.2_08
export JAVAC_PATH=/usr/java/j2sdk1.4.2_08/bin/javac
Extract the installation tarball, cd into the installer directory, and set up the environment:
tar xfj gt4.0.1-all-source-installer.tar.bz2
cd gt4.0.1-all-source-installer
export GLOBUS_LOCATION=/usr/local/globus-4.0.1
With PBS do
export PBS_HOME=/usr/local/spool/PBS
./configure --prefix=$GLOBUS_LOCATION --enable-wsgram-pbs
make 2>&1 | tee build.log
make install
Without PBS do
./configure --prefix=$GLOBUS_LOCATION
make 2>&1 | tee build.log
make install
Certificates
Obtain the certificates for the host and place them into /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem. Create a copy of the credentials for the GT4 container and set the correct permissions (run the following in /etc/grid-security):
chmod 400 hostkey.pem
chmod 644 hostcert.pem
cp hostcert.pem containercert.pem
cp hostkey.pem containerkey.pem
chown globus:globus container*.pem
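If you still need to request the host certificate in the first place, a request can usually be generated with grid-cert-request; treat this as a sketch, since the exact procedure depends on your CA:

# creates a key and a certificate request for the host;
# send the request to your CA and install the signed certificate as hostcert.pem
grid-cert-request -host lcg09.sinp.msu.ru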
Environment
Create two shell profile scripts; uncomment the Java lines if you used the Sun RPMs.
/etc/profile.d/globus.sh:
#export JAVA_HOME=/usr/java/j2sdk1.4.2_08
#export JAVAC_PATH=/usr/java/j2sdk1.4.2_08/bin/javac
export PBS_HOME=/usr/local/spool/PBS
export GLOBUS_LOCATION=/usr/local/globus-4.0.1
. $GLOBUS_LOCATION/etc/globus-user-env.sh
/etc/profile.d/globus.csh:
#setenv JAVA_HOME /usr/java/j2sdk1.4.2_08
#setenv JAVAC_PATH /usr/java/j2sdk1.4.2_08/bin/javac
setenv PBS_HOME /usr/local/spool/PBS
setenv GLOBUS_LOCATION /usr/local/globus-4.0.1
source $GLOBUS_LOCATION/etc/globus-user-env.csh
Configure GridFTP
Create entries in /etc/services:
gridftp 2811/tcp
gridftp 2811/udp
Create config /etc/grid-security/gridftp.conf:
port 2811
allow_anonymous 0
inetd 1
Create xinetd service config /etc/xinetd.d/gridftp:
service gridftp
{
    instances       = 100
    socket_type     = stream
    wait            = no
    user            = root
    env             += GLOBUS_LOCATION=/usr/local/globus-4.0.1
    env             += LD_LIBRARY_PATH=/usr/local/globus-4.0.1/lib
    server          = /usr/local/globus-4.0.1/sbin/globus-gridftp-server
    server_args     = -i
    log_on_success  += DURATION
    nice            = 10
    disable         = no
}
Reload the xinetd service:
service xinetd reload
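A simple smoke test of the GridFTP server, once a user has a certificate and a grid-mapfile entry (see the grid-mapfile section below); the file names here are arbitrary:

# as the mapped grid user
grid-proxy-init
echo "gridftp test" > /tmp/gridftp-test.txt
globus-url-copy file:///tmp/gridftp-test.txt gsiftp://lcg09.sinp.msu.ru/tmp/gridftp-test-copy.txt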
Configure PostgreSQL
Start postgresql and turn it on for autostart:
chkconfig --level 345 rhdb on
service rhdb start
Edit /var/lib/pgsql/data/pg_hba.conf, add lines:
host all all 127.0.0.1 255.255.255.255 password
host all all 213.131.5.7 255.255.255.255 password
Edit /var/lib/pgsql/data/postgresql.conf, uncomment the line:
tcpip_socket = true
Restart the postgresql service:
service rhdb restart
Create the globus postgres user and fill the RFT database; use the password athdavRi when prompted:
su - postgres -c "createuser -A -D -P -E globus"
su - postgres -c "createdb -O globus rftDatabase"
su - postgres -c "psql -U globus -h localhost -d rftDatabase -f $GLOBUS_LOCATION/share/globus_wsrf_rft/rft_schema.sql"
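You can check that the schema was loaded (optional; the exact table names depend on the RFT release):

# list the tables created by rft_schema.sql
psql -U globus -h localhost -d rftDatabase -c '\dt'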
Configure RFT
Edit the file $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml and change the postgres database password to the one used in the PostgreSQL configuration step.
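To find the right spot in the file quickly (plain grep, nothing GT4-specific; the parameter names may differ slightly between 4.0.x releases):

# show the database-related parameters in the RFT JNDI configuration
grep -n -i -A1 -e password -e userName -e connectionString $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml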
Configure GRAM
Create the group for the gridmapped users:
groupadd -g 31337 gridmapped
Edit /etc/group and add all grid-mapped users to this group. Configure sudo: run visudo and add these lines at the end:
# Globus GRAM entries
globus ALL=(%gridmapped) NOPASSWD: /usr/local/globus-4.0.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.1/libexec/globus-job-manager-script.pl *
globus ALL=(%gridmapped) NOPASSWD: /usr/local/globus-4.0.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.1/libexec/globus-gram-local-proxy-tool *
Setup the PBS jobmanager to use ssh, if configured with PBS support:
cd $GLOBUS_LOCATION/setup/globus
./setup-globus-job-manager-pbs --remote-shell=ssh
Don't forget to configure host-based authentication for SSH as described in the Torque/PBS section.
grid-mapfile entries
Add new entries to the grid-mapfile:
grid-mapfile-add-entry -dn "/C=RU/O=DataGrid/OU=sinp.msu.ru/CN=Lev Shamardin" -ln testuser
Start the container
During debugging it is recommended to start the container like this:
touch /var/log/globus-container.log
chown globus:globus /var/log/globus-container.log
/usr/local/globus-4.0.1/bin/globus-start-container > /var/log/globus-container.log 2>&1 &
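Once the container is up, a minimal WS-GRAM smoke test can be run as a grid user with a valid proxy (assumes the testuser mapping from the grid-mapfile section above):

# obtain a proxy and run a trivial job through the local ManagedJobFactoryService
grid-proxy-init
globusrun-ws -submit -s -c /bin/hostname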
Condor installation and configuration
Install the Condor rpm:
rpm -ivh condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm
Create the condor user and reconfigure condor with the new user:
useradd -u 120 -g users -c "Condor user" condor
cd /opt/condor-6.7.10
./condor_configure --install-dir=/opt/condor-6.7.10 --owner=condor --type=submit,execute,manager --verbose
Edit /opt/condor-6.7.10/etc/condor_config if required.
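The settings most likely to need attention are the pool host and the hosts allowed to write to the pool; a minimal sketch (macro names as used in the Condor 6.x configuration, values are just our test hosts):

## central manager of the pool
CONDOR_HOST = lcg09.sinp.msu.ru
## daemons to run on this host
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
## hosts allowed to join and write to the pool
HOSTALLOW_WRITE = *.sinp.msu.ru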
Create the condor shell profile scripts.
/etc/profile.d/condor.sh:
export CONDOR_CONFIG=/opt/condor-6.7.10/etc/condor_config
if [ `id -u` = 0 ] ; then
    export PATH="$PATH:/opt/condor-6.7.10/bin:/opt/condor-6.7.10/sbin"
else
    export PATH="$PATH:/opt/condor-6.7.10/bin"
fi
/etc/profile.d/condor.csh
setenv CONDOR_CONFIG "/opt/condor-6.7.10/etc/condor_config"
if ( `id -u` == 0 ) then
    set path = ( $path /opt/condor-6.7.10/bin /opt/condor-6.7.10/sbin )
else
    set path = ( $path /opt/condor-6.7.10/bin )
endif
Create the scratch directories: on each execution host, run the following in the user's home directory:
mkdir -p $HOME/.globus/scratch
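After the configuration, Condor can be started via its master daemon and checked with the standard tools (a sketch; with the profile scripts above these are in the PATH for root):

# start all daemons listed in DAEMON_LIST
/opt/condor-6.7.10/sbin/condor_master
# after a minute or so the pool and the (empty) queue should show up
condor_status
condor_q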
LCMAPS
Create the pool user accounts and populate the gridmapdir:
wget http://www-unix.mcs.anl.gov/~tfreeman/local/pooled/admin/addpoolusers.sh
patch -p0 -l <<EOF
--- addpoolusers.sh.orig	2005-09-22 22:37:00.000000000 +0400
+++ addpoolusers.sh	2005-09-22 22:37:29.000000000 +0400
@@ -8,11 +8,11 @@
 # Andrew McNab <mcnab@hep.man.ac.uk> March 2001
 #
-startUID=9000          # start UID of first user
-  endUID=9010          # UID of last user - no more than startUID+999
+startUID=2000          # start UID of first user
+  endUID=2010          # UID of last user - no more than startUID+999
 
    group=users          # group to assign all pool users to
 
-  prefix=gpool         # prefix, eg gpool000, gpool001, ...
-homedirs=/home/gpool    # where to make the home directories
+  prefix=mapped         # prefix, eg gpool000, gpool001, ...
+homedirs=/home/mapped    # where to make the home directories
 
 ########## You dont need to edit anything below this line  #########
 ########## but you should make sure you understand it before #########
EOF
mkdir /home/mapped
mkdir -p /etc/grid-security/gridmapdir
sh addpoolusers.sh
Install the LCMAPS build environment:
apt-get install cvs automake autoconf libtool bison flex openldap-devel
Install the latest LCMAPS version from the CVS to /opt/lcmaps:
cd /var/tmp
mkdir egee && cd egee
export CVSROOT=":pserver:anonymous@jra1mw.cvs.cern.ch:/cvs/jra1mw"
cvs co org.glite org.glite.security
cvs co -r glite-security-lcmaps-1_3_1-multiple-accounts org.glite.security.lcmaps \
    org.glite.security.lcmaps-interface org.glite.security.lcmaps-plugins-basic \
    org.glite.security.lcmaps-plugins-voms
for i in lcmaps lcmaps-interface lcmaps-plugins-basic lcmaps-plugins-voms ; do \
    cp org.glite.security/project/*.m4 org.glite.security.$i/project; \
    cp org.glite/project/*m4 org.glite.security.$i/project; \
done
export LSTAGEDIR=/opt/lcmaps
for i in lcmaps lcmaps-interface lcmaps-plugins-basic lcmaps-plugins-voms ; do
    (cd org.glite.security.$i; make distclean; ./bootstrap; ./configure --prefix=$LSTAGEDIR --without-gsi-mode; make install)
done
Workspace Service
(Does not work yet)
Download and deploy the Workspace service. Before deploying, edit $WORKSPACE_HOME/service/java/source/deploy-jndi-config.xml and set the path to the LCMAPS conf file to /etc/grid-security/lcmaps-wss.conf. You must deploy the service as the globus user with the globus container not running:
wget http://www-unix.mcs.anl.gov/workspace/workspaceService_tech_preview_4_1.tgz
tar xvzf workspaceService_tech_preview_4_1.tgz
cd workspaceService
export WORKSPACE_HOME=`pwd`
wget http://www.mcs.anl.gov/workspace/glite-security-util-java.jar
mv glite-security-util-java.jar $WORKSPACE_HOME/service/java/source/lib
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
cd $WORKSPACE_HOME
vi $WORKSPACE_HOME/service/java/source/deploy-jndi-config.xml
ant deploy
Add this line to the $GLOBUS_LOCATION/container-log4j.properties:
log4j.category.org.globus.workspace=INFO
LCMAPS backend
First build and install the LCMAPS libraries. After that, cd to $WORKSPACE_HOME/local/lcmaps/source and edit the Makefile, changing the paths to the LCMAPS includes and libraries. In our case:
patch -p0 -l <<EOF
--- Makefile.orig	2005-09-23 12:58:50.000000000 +0400
+++ Makefile	2005-09-23 14:27:11.000000000 +0400
@@ -1,7 +1,7 @@
 MYCFLAGS = -g -Wall -O1
 EXEC = lcmaps_poolindex
 
-MYINCS = -I/opt/glite/include/glite/security/lcmaps_without_gsi/
-MYLIBDIRS = -L/opt/glite/lib
+MYINCS = -I/opt/lcmaps/include/glite/security/lcmaps_without_gsi
+MYLIBDIRS = -L/opt/lcmaps/lib
EOF
Build and install the lcmaps_poolindex binary:
make install
As root, add the path to the LCMAPS libraries (/opt/lcmaps/lib) to /etc/ld.so.conf and regenerate the ld cache:
echo "/opt/lcmaps/lib" >> /etc/ld.so.conf ldconfig
Install the lcmaps config file with the correct permissions (do it as root):
install -o globus -g globus -m 0644 $WORKSPACE_HOME/local/lcmaps/source/lcmaps_poolindex.conf /etc/grid-security/lcmaps-wss.conf
Edit /etc/grid-security/lcmaps-wss.conf; you should change at least LCMAPS_LOG_FILE and LCMAPS_DB_FILE. We recommend placing the LCMAPS_DB_FILE at /etc/grid-security/lcmaps/lcmaps.db.without_gsi; you have to create this file and its directory, of course:
install -o root -g root -m 0755 -d /etc/grid-security/lcmaps
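We do not reproduce the whole lcmaps-wss.conf here; assuming it uses simple key=value settings as in the shipped lcmaps_poolindex.conf, the two entries to adjust would look roughly like this (a hypothetical excerpt, check the shipped file for the exact syntax):

# hypothetical excerpt of /etc/grid-security/lcmaps-wss.conf
LCMAPS_LOG_FILE=/var/log/lcmaps-wss.log
LCMAPS_DB_FILE=/etc/grid-security/lcmaps/lcmaps.db.without_gsi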
Example lcmaps.db.without_gsi (must be owned by root):
# LCMAPS policy file/plugin definition
#
# default path
path = /opt/lcmaps/lib/modules

# Plugin definitions:
good            = "lcmaps_dummy_good.mod"
bad             = "lcmaps_dummy_bad.mod"
localaccount    = "lcmaps_localaccount.mod -gridmapfile /etc/grid-security/grid-mapfile"
vomslocalgroup  = "lcmaps_voms_localgroup_without_gsi.mod -groupmapfile /etc/grid-security/groupmapfile -mapmin 0"
vomspoolaccount = "lcmaps_voms_poolaccount_without_gsi.mod -gridmapfile /etc/grid-security/grid-mapfile -gridmapdir /etc/grid-security/gridmapdir -do_not_use_secondary_gids"
posixenf        = "lcmaps_posix_enf.mod -maxuid 1 -maxpgid 1 -maxsgid 32 "
poolaccount     = "lcmaps_poolaccount.mod -gridmapfile /etc/grid-security/grid-mapfile -gridmapdir /etc/grid-security/gridmapdir/ -override_inconsistency"

# Policies:
das_voms:
localaccount -> good | poolaccount
poolaccount -> good
It seems that the lcmaps_poolindex executable actually has to be setuid root; at least we did not manage to make it work from the globus account without the setuid bit. So set the right permissions on the lcmaps_poolindex binary:
chown root:root $GLOBUS_LOCATION/bin/lcmaps_poolindex
chmod 04711 $GLOBUS_LOCATION/bin/lcmaps_poolindex
Workspace authorization
Edit (or create) the file $GLOBUS_LOCATION/etc/workspace_service/dn-authz, add your users to this file.
We are not going to rebuild the GRAM without gridmapfile support, since this is a test setup. So just edit these files and change the <authz value="gridmap"/> to the appropriate value:
- $GLOBUS_LOCATION/etc/globus_delegation_service/factory-security-config.xml
- $GLOBUS_LOCATION/etc/globus_wsrf_rft/factory-security-config.xml
- $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml
<authz value="gram:org.globus.workspace.QueryPDP"/>
Known issues
- Security issue: quote from the GT4 admin guide: "WSRF-based components ignore the signing policy file and will honor all valid certificates issued by trusted CAs".
- The Condor GT4 adapter assumes that the directory $HOME/.globus/scratch exists for the user under whose permissions the job is executed, but it never tries to create it. It has to be created manually.
- GT4 seems to be unstable with Torque: jobs can hang in the "unsubmitted" state (see this thread for more details). There seems to be a solution to this problem, but it is not well proven to work (see this message).
Condor with GT4 job submission
Since I have no access to the Condor sources, all of the following is just my own deduction and speculation.
The Condor-C GT4 GAHP helper assumes that:
- A GridFTP server is running on both the Condor-C submission host and the GT4 gatekeeper host.
- The user's certificate on the Condor-C submission host is mapped to the submitting user.
- The user's certificate on the GT4 host may be mapped to any user.
These assumptions make it impossible to submit a job to GT4 from a Condor instance running on the same host if the submitting user is mapped to a different user in the grid-mapfile. However, there appear to be some workarounds, which will be described in the "Dirty tricks" section.
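For reference, a job goes from Condor to GT4 through a grid-universe submit file along these lines; this is only a sketch, the exact grid_resource syntax depends on the Condor 6.7.x release, and the contact string is just our test host:

# test-gt4.sub -- submit with: condor_submit test-gt4.sub
universe      = grid
grid_resource = gt4 https://lcg09.sinp.msu.ru:8443/wsrf/services/ManagedJobFactoryService PBS
executable    = /bin/hostname
output        = test-gt4.out
error         = test-gt4.err
log           = test-gt4.log
queue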