Документ взят из кэша поисковой машины. Адрес оригинального документа : http://uneex.lorien.cs.msu.su/static/springer/fulltext.pdf
Дата изменения: Mon Sep 26 12:35:54 2011
Дата индексирования: Tue Oct 2 05:57:37 2012
Кодировка:
Поисковые слова: integral

A Parallel GNFS Algorithm with the Biorthogonal Blo ck Lanczos Metho d for Integer Factorization
Laurence T. Yang
1

1,2

, Li Xu, Man Lin, and John Quinn

Department of Computer Science and Engineering Jiangsu Polytechnic University Changzhou, Jiangsu Province, 213164, P.R. China 2 Department of Computer Science St. Francis Xavier University Antigonish, Nova Scotia, B2G 2W5, Canada {lyang, x2002uwf, mlin, jquinn}@stfx.ca

Abstract. Currently, RSA is a very p opular, widely used and secure public key cryptosystem, but the security of the RSA cryptosystem is based on the difficulty of factoring large integers. The General Numb er Field Sieve (GNFS) algorithm is the b est known method for factoring large integers over 110 digits. Our previous work on the parallel GNFS algorithm, which integrated the Montgomery's block Lanczos algorithm to solve the large and sparse linear systems over GF(2), has one ma jor disadvantage, namely the input has to b e symmetric (we have to symmetrize the input for nonsymmetric case and this will shrink the rank). In this pap er, we successfully implement the parallel General Numb er Field Sieve (GNFS) algorithm and integrate with a new algorithm called the biorthogonal block Lanczos algorithm for solving large and sparse linear systems over GF(2). This new algorithm is based on the biothorgonal technique, can find more solutions or dep endencies than Montgomery's block Lanczos method with less iterations. The detailed exp erimental results on a SUN cluster will b e presented as well.

1

Introduction

Currently, Rivest-Shamir-Adleman (RSA) algorithm [16] is the most popular algorithm in public-key cryptosystem. It also has been widely used in the realworld applications such as: internet explorer, email systems, online banking, critical electronic transactions and many other places. The security of this algorithm mainly relies on the difficulty of factoring large integers. So far, many integer factorization algorithms have been developed such as: Trial division [18], Pollard's p-1 algorithm [14], Lenstra Elliptic Curve Factorization (ECM) [9], Quadratic Sieve (QS) [15] and General Number Field Sieve (GNFS) [1,2,3,11] algorithm. Although GNFS algorithm is the fastest integer factoring algorithm over 110 digits so far, it still takes a long time to factor large integers. In order to reduce
L.T. Yang et al. (Eds.): ATC 2006, LNCS 4158, pp. 428438, 2006. c Springer-Verlag Berlin Heidelb erg 2006

A Parallel GNFS Algorithm with the Biorthogonal Block Lanczos Method

429

the execution time, one natural solution is to distribute jobs to parallel computers. The GNFS algorithm contains several time consuming steps. The most time consuming one is the sieving part which is used to generate enough relations. This step is very suitable for parallelization because the relation generations are independent. Another possible step is, in GNFS, the Montgomery's blo ck Lanczos algorithm [12]. It is used to solve large and sparse linear systems over GF(2) generated by the GNFS algorithm. This blo ck Lanczos algorithm has two drawbacks: 1. The input of this algorithm is restricted to a symmetric matrix. For the nonsymmetric inputs, we have to make them symmetric first by multiplying the co efficient matrix A with AT . However over GF(2), the rank of the pro duct AT A is, in general, much less than that of A. Thus, when applied to find elements of the nullspace of A, the Montgomery's blo ck Lanczos metho d may find many spurious vectors. 2. It will break down for some case. The biorthogonal blo ck Lanczos [5] has overcome the first drawback and can find more solutions or dependencies than Montgomery's block Lanczos metho d with less iterations. In this paper we have successfully implemented the biorthogonal blo ck Lanczos algorithm and integrated together with the GNFS algorithm. The rest of this paper is organized as follows: First we briefly describe the original GNFS algorithm in section 2. Then we present two blo ck Lanczos algorithms, namely the Montgomery's block Lanczos algorithm [12] and the biorthogonal block Lanczos algorithm [5] in section 3 and 4 respectively. Section 5 shows the detailed implementation and corresponding parallel performance results.

2

The GNFS Algorithm

The General Number Field Sieve (GNFS) algorithm [2,3,11] is derived from the number fields sieve (NFS) algorithm, developed by Lenstra et al [10]. It is the fastest known algorithm for integer factorization. The idea of GNFS is from the congruence of squares algorithm [8]. Suppose we want to factor an integer n where n has two prime factors p and q . Let's assume we have two integers s and r, such that s2 and r2 are perfect squares and satisfy the constraint s2 r2 (mod n). Since n = pq , the following conditions must hold [2]: pq|(s2 -r2 ) pq|(s-r)(s+r) p|(s-r)(s+r) and q|(s-r)(s+r). We know that, if c|ab and gcd(b,c) = 1, then c|a. So p, q, p|(s-r) or p|(s+r) and q|(s-r) or q|(s+r). Based on this, it we can find factors of n by computing the greatest common and gcd(n,(s-r)) with the possibility of 2/3 (see [2]). Therefore, the essence of GNFS algorithm is based on the n by computing the gcd(n, s+r) and gcd(n, s-r). There are r and s must satisfy can be proved that divisor gcd(n,(s+r)) idea of the factoring six ma jor steps [11]:

1. Selecting parameters: Cho ose an integer mZ and a polynomial f which satisfy f(m) 0 (mod n).

430

L.T. Yang et al. Table 1. The comp osite numb er n and the results after integer factorization

Name tst10030 F739 tst15045 Briggs51 tst20061 tst25076

Number 727563736353655223147641208603 = 743774339337499·978204944528897 680564733841876926926749214863536422914 = 5704689200685129054721·59649589127497217 799356282580692644127991443712991753990450969 = 32823111293257851893153·24353458617583497303673 556158012756522140970101270050308458769458529626977 = 1236405128000120870775846228354119184397·449818591141 1241445153765162090376032461564730757085137334450817128010073 = 1127192007137697372923951166979·1101360855918052649813406915187 3675041894739039405533259197211548846143110109152323761665377505538520830273 = 69119855780815625390997974542224894323·53169119831396634916152282437374262651

2. Defining three factor bases: rational factor base R, algebraic factor base A and quadratic character base Q. 3. Sieving: Generate enough pairs (a,b) (relations) to build a linear dependence. 4. Pro cessing relations: Filter out useful pairs (a,b) that were found from sieving. 5. Building up a large and sparse linear system over GF(2) and solve it. 6. Squaring ro ot, use the results from the previous step to generate two perfect squares, then factor n.

3

Montgomery's Block Lanczos Algorithm

In 1995, Montgomery proposed an algorithm for solving linear systems over GF(2) named Montgomery's block Lanczos algorithm [12]. This blo ck Lanczos algorithm is a variant of standard Lancozs metho d [6,7]. Both Lanczos algorithms are used to solve linear systems. In the standard Lanczos algorithm, suppose we have a symmetric matrix ARnвn . Based on the notations used in [12], the algorithm can be described as follows: w0 = b,
i-1

wi = Awi-1 -
j =0

T wj A2 wi-1 . T wj Awj

(1)

The iteration will stop when wi =0. {w0 , w1 , ... wi-1 } are basis of span{b, Ab, A2 b, ...} with the properties: 0 i < m, 0 i < j < m,
T wi Awi = 0,

(2)

T T wi Awj = wj Awi = 0.

(3)

A Parallel GNFS Algorithm with the Biorthogonal Block Lanczos Method

431

The solution x can be computed as follows:
m- 1

x=
j =0

T wj b wj . T wj Awj

(4)

Furthermore the iteration of wi can be simplified as follows: wi = Awi-1 - (Awi-1 )T (Awi-1 ) (Awi-2 )T (Awi-1 ) wi-1 - wi-2 . T (Aw T wi-1 wi-2 (Awi-2 ) i-1 ) (5)

The time complexity of the Standard Lanczos algorithm is O(dn2 )+O(n2 ), d is the average nonzero entries per row or column. The Montgomery's blo ck Lanczos algorithm is an extension of the Standard Lanczos algorithm by applying it over GF(2). There are some go o d properties on GF(2), for example, we can apply matrix to N vectors at a time (N is the length of computer word) and we can also apply bitwise operations. Instead of using vectors for iteration, we using subspace instead. First we generate subspace: Wi is A - inv ertible, (6)
-1

T Wj AWi = {0}, {i = j }, AW W , W = W0 + W1 + ... + Wm

.

Then we define x to be:
m- 1

x=
j =0

Wj (WjT AWj )-1 WjT b,

(7)

where W is a basis of W . The iteration in the standard Lanczos algorithm will be changed to:

Wi = Vi Si ,
i T Vi+1 = AWi Si + Vi - j =0

Wj Ci+1,j

(i 0), (8)

Wi = Wi , in which
T Ci+1,j = (WjT AWj )-1 WjT A(AWi Si + Vi ).

(9)

This iteration will stop when Vi T AVi =0 where i = m. The iteration can also be simplified as follows:
T Vi+1 = AVi Si Si + Vi D i+1

+ Vi-1 Ei+1 + Vi-2 Fi+1 .

432

L.T. Yang et al.
i+1

where D D
i+1

,Ei+1 ,Fi+1 can be computed:

T = IN - Wiinv (ViT A2 Vi Si Si + ViT AVi ),

v T Ei+1 = -Wiin1 ViT AVi Si Si , - v v Fi+1 = -Wiin2 (IN - ViT 1 AVi-1 Wiin1 )(ViT 1 A2 Vi-1 S - - - - T i-1 Si-1 T + ViT 1 AVi-1 )Si Si . -

Si is an N в Ni pro jection matrix (N is the length of computer word and Ni < N ). We can reduce the iteration time from O(n2 ) to O(n2 /N) (n is the matrix's row or column size) using the blo ck Lanczos algorithm.

4

Biorthogonal Block Lanczos Algorithm

The Biorthogonal block Lanczos algorithm is from standard biorthogonal scalar Lancozs algorithm. The idea of standard biorthogonal scalar Lancozs algorithm is proposed by Lanczos [7]. Like the symmetric case we described in section 3, first we cho ose two vector u0 and 0 from Kn form two basis {u0 ,...um-1 } and { 0 ,... m-1 }, and the following conditions must be held [5]: 0 i < m uT Ai = 0, i T 0 i < j < m ui Aj = uT Ai = 0. j Then the solution will be:
m- 1

(10)

x=
i=0

uT b i i . u Ai
T i

(11)

With the condition (10), now we are ready to give the new iterations:
i

u

i+1

:= A ui -
T k=0 i

(AT ui )T Ak uk , uT Ak k (AT uk )T Ai k . uT Ak k (12)

i+1

:= Ai -
k=0

Prop osition 1. If u0 ,....,ui and 0 ,...., i are defined in (12), then uT A2 k = i uT A2 i = 0 for al l 0k i+1

:= AT ui - := Ai -

(AT ui )T Ai (AT ui )T Ai-1 ui - u T A ui uT-1 Ai-1 i i

i-1

, .

(13) (14)

i+1

(AT ui )T Ai (AT ui-1 )T Ai i - T A ui uT-1 Ai-1 i i

i-1

A Parallel GNFS Algorithm with the Biorthogonal Block Lanczos Method

433

Instead of using scalars in real fields, now we extend this standard biorthogonal scalar Lancozs algorithm to our problem over GF(2). The input of this algorithm can be either symmetric or nonsymmetric. Montgomery's blo ck Lanczos algorithm only takes a symmetric matrix as the input. For the nonsymmetric matrix, some preconditioning must be performanced first, such as AT A. Generally speaking, The rank of AT A is much less than the rank of A, Thus, when applied to find elements of the nullspace of A, the Montgomery's blo ck Lanczos method may find many spurious vectors, then lose some solutions accordingly. The pro cedure of biothogonal block Lanczos algorithm are: first we cho ose u0 , v0 KnвN randomly and uniformly. (here u and v are blo ck vector). Then we compute u1 , u1 ,...um-1 and v1 , v1 ,...vm-1 . Two matrices i and i are also computed by the columns of ui and vi . The following conditions must be hold through the whole algorithm: 1. 2. 3. 4. K(AT , u0 ) = m-1 i . i=0 K(A,v0 ) = m-1 i . i=0 for all 0i
Assuming all the conditions are held, we can give the biorthogonal Blo ck Lanczos iterations. Let i and i be two pro jection matrices and define by i =ui si and i =vi ti . The iterations for ui+1 , vi+1 are:
i

u

i+1

:= AT i sT + ui - i
k=0

T T k (k Ak )-1 k AT (AT i sT + ui ), i i

(15)

vi+1 := Ai tT + vi - i
k=0

T T k (k Ak )-1 k A(Ai tT + vi ). i

(16)

We also want to simply the iterations like what we did in standard biorthogonal scalar Lanczos algorithm. In every iteration, we pick out as many columns as possible from the previous iteration results ui , vi then pro ject them into the Krylov basis [5]. We also have: when 0k T k A2 i T = sT sk k A2 k

= s (A = sT (u k = sT u k =su
T k T k

T k

T

i TT k sk )

A

i k T T j (j Aj )-1 j AT (AT k sT + uk ))T A k

k+1

- uk +
j =0

i

T k+1 T k+1

Ai - sT uT A kk Ai .

i

(17)

So T A2 i can be simplify to sT uT+1 A i , and similarly, we can simplify k kk (AT )2 i to tT vT+1 AT i . This tells us that for some j

434

L.T. Yang et al.
k+1

of uk+1 have been chosen and pro jected into ished which means uT+1 A i = 0. k

,... j , this term will be van-

5

Implementation Details

As we mentioned before, the most time consuming part in GNFS is the sieving part. This part has already been parallelized in our previous work [19,20]. Another part in GNFS program can be improved is to solve the large and sparse linear systems over GF(2) by biorthogonal block Lanczos algorithm, instead of using the Montgomery's blo ck Lanczos algorithm which only takes symmetric input. With the new algorithm, we would not lose any solutions. Our parallel code is built on the sequential source GNFS code from Monico [11]. 5.1 Parallel Sieving

The sieving step in sequential GNFS is very suitable for parallel. The purpose of sieving is to find enough (a,b) pairs. The way we do sieving is: fixing b, let a change from -N to N (N is a integer, usually larger than 500 ), then we check each (a,b) pair whether it smo oth over factor bases. The sieving for next b can not start until the previous sieving is finished. After we got enough relations from the sieving step, we start building a linear system over GF(2). This linear system's co efficient could be either symmetric or nonsymmetric, both can be solved by the biorthogonal blo ck Lanczos algorithm. In parallel, we use several pro cessors do sieving simultaneously, each slave no de takes a small range of b, then send results back to master no de. The detailed parallel sieving implementation can be found in [19,20]. 5.2 Hardware and Programming Environment

The whole implementation uses two software packages, the sequential GNFS co de from C. Monico [11] (Written in ANSI C) and sequential biorthogonal blo ck Lanczos co de from Hovinen [5] (Written in C++). For parallel implementation, MPICH1 (Message Passing Interface) [17] library is used. In order to do arbitrary precision arithmetic, the GMP 4.x is also used [4]. We use GUN compiler to compile whole program and MPICH1 [13] for our MPI library. The version of MPICH1 is 1.2.5.2. The cluster we use is a Sun cluster from University of New Brunswick Canada whose system configurations is: Mo del: Sun Microsystems V60. Architecture: x86 cluster. Pro cessor count: 164. Master no de: 3 GB registered DDR-266 ECC SDRAM. Slave no de: 2 to 3 GB registered DDR-266 ECC SDRAM.

In the program, Each slave no de only communicates with the master no de. Fig. 1 shows the flow chart of our parallel program.

A Parallel GNFS Algorithm with the Biorthogonal Block Lanczos Method
Main Program

435

MPI_Init

Slave

Slave

Master

Slave

Slave

MPI_Finalize

Main Program

Fig. 1. Each processors do the sieving at the same time, and all the slave nodes send the result back to master node

6

Performance Evaluation

We have six test cases, each test case have a different size of n, all are listed in Table 1. The sieving time increases when the size of n increases. Table 2 shows the average sieving time for each n with one processor. Table 3 shows the number of pro cessors we use for each test case. Fig. 2 and Fig. 3 show the total execution time for each test case in seconds. The total sieve time for test case: tst100, F7, tst150, Briggs and tst200 are presented in Fig. 4. Fig. 5 gives the total execution time, sieve time, speedup and parallel efficiency with different pro cessor numbers for test case tst250. Fig. 6 gives the speed-up and parallel efficiency for each test case with different pro cessor numbers.
Table 2. Average sieving time for each n name numb tst10030 F739 tst15045 Briggs51 tst20061 tst25076 er of sieve average sieve time(s) 1 35.6 1 28.8 5 50.6 3 85.67 7 560.6 7 4757.91

436

L.T. Yang et al. Table 3. Numb er of processors for each test case name numb er of slave processors tst10030 1,2,4,8,16 F739 1,2,4,8,16 tst15045 1,2,4,8,16 Briggs51 1,2,4,8,16 tst20061 1,2,4,8,16 tst25076 1,2,4,8,16

60
tst100 F7

Total Execution Time(s)

50

40

30

20

0

4

8 12 Number of Processors

16

20

Fig. 2. Execution time for tst100 and F7
3500
tst150 Briggs

24000
tst200

3000 Total Execution Time(s)
Total Execution Time(s)

23000

2500

22000

2000

21000

1500

0

4

8 12 Number of Processors

16

20

20000

0

4

8 12 Number of Processors

16

20

Fig. 3. Execution time for tst150, Briggs and tst200
200
tst100 F7 tst150 Briggs

4000
tst200

150 Total Sieve Time(s)
Total Sieve Time(s)

3000

100

2000

50

1000

0

0

4

8 12 Number of Processors

16

20

0

0

4

8 12 Number of Processors

16

20

Fig. 4. Sieve time for tst100, F7, tst150, Briggs and tst200

A Parallel GNFS Algorithm with the Biorthogonal Block Lanczos Method
5e+05
15 14 13
Speed-up Efficiency

437

4e+05

Execution-time Sieve-time

12 11 10 9 8 7

3e+05 Time(s) 2e+05

6 5 4

1e+05

3 2 1

0

0

4

8 12 Number of Processors

16

20

0

0

5

10 Number of Processors

15

20

Fig. 5. Total execution time, sieve time, sp eedup and efficiency for test case tst250
10
tst100 F7 tst150 Birggs tst200
tst100 F7 tst150 Briggs tst200

8

1.2

Speed-up

Efficiency

6

0.8

4

0.4

2

0

0

4

8 12 Number of Processors

16

20

0

0

4

8 12 Number of Processors

16

20

Fig. 6. Sp eedups and parallel efficiency

Acknowledgements
We would like to thank C. Monico of Texas Tech University and B. Hovinen of University of Waterloo. Our work is based on their sequential source co des. They also helped us with some technical problems through emails. Dr. Silva from IBM advised us many go o d ideas for our parallel program.

References
1. M. E. Briggs. An introduction to the general numb er field sieve. Master's thesis, Virginia Polytechnic Institute and State University, 1998. 2. M. Case. A b eginner's guide to the general numb er field sieve. Oregon State University, ECE575 Data Security and Cryptography Pro ject, 2003. 3. J. Dreib ellbis. Implementing the general numb er field sieve. pages 514, June 2003. 4. T. Granlund. The GNU Multiple Precision Arithmetic Library. TMG Datakonsult, Boston, MA, USA, 2.0.2 edition, June 1996. 5. B. Hovinen. Blocked lanczos-style algorithms over small finite fields. Master Thesis of Mathematics, University of Waterloo, Canada, 2004. 6. C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral op erators. In Journal of Research of the National Bureau of Standards, volume 45, pages 255282, 1950.

438

L.T. Yang et al.

7. C. Lanczos. Solutions of linread equations by minimized iterations. In Journal of Research of the National Bureau of Standards, volume 49, pages 3353, 1952. 8. A. K. Lenstra. Integer factoring. Designs, Codes and Cryptography, 19(2-3):101 128, 2000. 9. H. W. Lenstra. Factoring integers with elliptic curves. Annals of Mathematics(2), 126:649673, 1987. 10. H. W. Lenstra, C. Pomerance, and J. P. Buhler. Factoring integers with the numb er field sieve. In The Development of the Number Field Sieve, volume 1554, pages 50 94, New York, 1993. Lecture Notes in Mathematics, Springer-Verlag. 11. C. Monico. General numb er field sieve documentation. GGNFS Documentation, Nov 2004. 12. P. L. Montgomery. A block lanczos algorithm for finding dep endencies over gf(2). In Proceeding of the EUROCRYPT '95, volume 921 of LNCS, pages 106120. Springer, 1995. 13. MPICH. http://www- unix.mcs.anl.gov/mpi/mpich/ . 14. J. M. Pollard. Theorems on factorization and primality testing. In Proceedings of the Cambridge Philosophical Society, pages 521528, 1974. 15. C. Pomerance. The quadratic sieve factoring algorithm. In Proceeding of the EUROCRYPT 84 Workshop on Advances in Cryptology: Theory and Applications of Cryptographic Techniques, pages 169182. Springer-Verlag, 1985. 16. R. L. Rivest, A. Shamir, and L. M. Adelman. A method for obtaining digital signatures and public-key cryptosystems. Technical Rep ort MIT/LCS/TM-82, 1977. 17. W.Gropp, E.Lusk, and A.Skjellum. Using MPI: portable paral lel programming with the message-passing interface. MIT Press, 1994. 18. M. C. Wunderlich and J. L. Selfridge. A design for a numb er theory package with an optimized trial division routine. Communications of ACM, 17(5):272276, 1974. 19. L. Xu, L. T. Yang, and M. Lin. Parallel general numb er field sieve method for integer factorization. In Proceedings of the 2005 International Conference on Paral lel and Distributed Processing Techniques and Applications (PDPTA-05), pages 10171023, Las Vegas, USA, June 2005. 20. L. T. Yang, L. Xu, and M. Lin. Integer factorization by a parallel gnfs algorithm for public key cryptosystem. In Proceddings of the 2005 International Conference on Embedded Software and Systems (ICESS-05), pages 683695, Xian, China, Decemb er 2005.