Документ взят из кэша поисковой машины. Адрес оригинального документа : http://classic.chem.msu.su/gran/gamess/table_qdpt2.pdf
Дата изменения: Thu Nov 2 13:41:46 2006
Дата индексирования: Mon Oct 1 19:54:27 2012
Кодировка:
Table-driven implementation of the multi-reference perturbation theories at second order
Alexander A. Granovsky

Laboratory of Chemical Cybernetics, M.V. Lomonosov Moscow State University, Moscow, Russia
November 1, 2006


Canonical single-reference MP
MP2:
E
mp 2

1 = 4


ijab

|< ij || ab >|2 ijab

Parameters: N, Nocc (i,j...), Nvirt(a,b...) Integral transformation - N5 step Only minor overhead due to PT power series summation itself (N4 step)

MP3 and above:
Integral transformation Intermediate quantities individual terms of the As in the case of MP2, N4 for MP3) - N5 step (amplitudes entering into numerators of the PT series) calculations - N6 and above PT summation itself has better scaling (e.g.,


Multi-reference (MR) MBPT Multi-reference theories
Additional parameters:
Nact(p,q,r,s,...), Ndet (Ncsf), NHeff

More complex expressions both for energy correction itself and for computational costs
Third and higher orders of various formulations of the multi-reference (MR) MBPT
Calculation of various intermediates is the most computationally-demanding stage

Non-contracted and partially contracted MR-MBPT theories at second order
Most of the computational efforts are typically due to summation of the individual terms of the PT series themselves, especially in the case of large active spaces


MCQDPT2 example


Previous presentation goals
Remove inefficient divide operations from inner loops Construct cache-friendly algorithm


Cache-friendly code sample
S=


B

C B C

B


ijab

(ia | jb)[2(ia | jb) - ( ja | ib)] a + b - i - j + E B

Loop over i
Loop over j
Loop over a
· Calculate tb = (ia | jb)[2(ia | jb) - ( ja | ib)] · Loop over B
­ Calculate = a - ­ Special sum over b: ­ Accumulate

i - j + EB
W =W +
b



tb b +

· End loop over B

End loop over a

End loop over j

End loop over i


Present work goals
Further reduce computational costs Reduce algorithmic complexity


Formal analysis of the computational costs
Each particular type of terms results in computational costs of a generic form:

N

k occ

N

l virt

N

m+n act

N

csf

N

Heff

where k, l, m, n are some integers 0 For the particular example above, k=2, l=2, m=n=0


Reminder - initial code version
S=


B

C B C

B


ijab

(ia | jb)[2(ia | jb) - ( ja | ib)] a + b - i - j + E B

Loop over B
Loop over i
Loop over j
· Loop over a
­ Sum over b:

T =T +


b

(ia | jb)[2(ia | jb) - ( ja | ib)] a + b - i - j + EB

· End loop over a

End loop over j

End loop over i Accumulate S

End loop over B


Code modification
S (E
S=
B

)=
ijab


B

CB CB S (E

(ia | jb)[2(ia | jb) - ( ja | ib)] a + b - i - j + E B
B

)

Loop over B Loop over i Loop over j Loop over a · Sum over b:

T =T +


b

(ia | jb)[2(ia | jb) - ( ja | ib)] a + b - i - j + EB

End loop over a End loop over j End loop over i, accumulate End loop over B

S (E

B

)
(
B

Loop over B: accumulate S: S = S + C B C B S E

)


E E
B

B

expression


= EB - E
+

EB = E

core

n (B )
i
active orbitals

i

E =


B

C

2
B

E

B

Conclusion: E B vary over known limited range of energies defined only by the active orbitals energy differences and number of electrons in active space.


Next steps, key ideas
Let us approximate S (E driven interpolation
B

)

using table-

Introduce intermediate equally-spaced helper grid of , =1..Ngrid Calculate S ( ) , =1..Ngrid using the definition given above and previously described efficient approach Fill in interpolation tables Calculate contributions to S using interpolated values of S (EB ) S (EB , interpolation tables)


Resulting code
Loop over i
Loop over j
Loop over a
· Calculate tb = (ia | jb)[2(ia | jb) - ( ja | ib)] · Loop over
­ Calculate = a ­ Special sum over b: ­ Accumulate

- i - j +


S (

)

W =W +


b

tb b +

· End loop over

End loop over a

End loop over j

End loop over I Fill in interpolation tables Loop over B: accumulate S:

S = S + C B C

B

Interp (E

B

)


Formal analysis of the computational costs, new code
Original code:

costs = N
New code:

2 occ

N

2 virt

N

csf

N

Heff

costs = N

2 occ

N

2 virt

N

grid

+CN

csf

N

Heff


Formal analysis of the computational costs, generic case
Main result:

N

k occ

N

l virt

N

m+n act

N

csf

N

Heff

is replaced by:

N

k occ

N

l virt

N

m+n act

N

grid

+CN

n act

N

csf

N

Heff

n = 0 for zero-body, 2 for one-body, 4 for two-body, and 6 for three-body terms.


Conclusions
Ngrid does not depend on the number of CSF and is defined only by the desired precision and by the structure of the active space
Computational costs dependence on the number of CSF is now efficiently decoupled from the dependence on the number of orbitals Improved algorithmic complexity Computed energies are smooth functions of external parameters No more need to store transformed integrals, only interpolation tables need to be computed Much faster calculations for large systems!


Main problem
Q: How to interpolate S (EB ) which can have multiple singularities? A: Actually, we always use ISA (Intruder State Avoidance) or some other energy denominators shift technique to avoid singularities. In the case of ISA, denominators are transformed as follows: so that S (
E
B

) is infinitely smooth function

a a b b+ b


Singularity removal by use of the ISA technique


Practical example


Practical experience
Seven-point polynomial interpolation seems to be optimal Ngrid of ca. 200В400 (E of ca. 0.05 a.u.) seems to be enough to get cumulative absolute errors less than 10-8 a.u., which is for large problems in any case a way smaller than the round-off errors and errors introduced by the use of non-completely converged CASSCF orbitals.


Sample calculation

Retinal molecule cc-pVTZ basis set, 1465 cartesian/1298 spherical basis functions CAS(12/12), 226512 CSF NHeff=15


PC GAMESS, standard vs. tabledriven approach
January 2006 October 2006, pilot code (production code will be much faster) Intel Xeon Woodcrest 2.67 GHz No CSF selection, all 226512 are used 8650 minutes of CPU time for PT (5293 minutes with CSF selection) E=-989.7676871132

Intel Xeon Dempsey 3.2 GHz CSF selection, 24709 CSF selected 95546 minutes of CPU time for PT E=-989.7676871075


Note: Any implementation of the code either completely or partially based on the ideas given in this presentation is strongly prohibited for any programs which are not distributed in source form according to the terms of GNU GPL version 2.0 or above.


Thank you for your attention!