\ LANGUAGE : ANS Forth \ COPYRIGHT : Albert van der Horst FIG Chapter Holland \ DATE : 1996-03-14 \ LAST CHANGE : 1996-03-14 \ LAST CHANGE : 1999-01-16 mhx, rewrote threads.frt. It now works over TCP/IP using sockets. \ LAST CHANGE : 1999-01-02 mhx, added RESET-Xputers, changed output format \ LAST CHANGE : 1998-12-31 mhx, for n network-distributed threads \ LAST CHANGE : 1998-12-30 mhx, optimizations (about 5 times faster) and threads \ LAST CHANGE : 1998-12-31 mhx, added channels. Should work over the 'net ... Bugs ..-SEND \ This program may be copied and distributed freely as is. \ It may be modified only for \ - algorithmic improvements \ - porting to other Forth systems. \ Changes and additions by Marcel Hendrix (by permission of AvdH). \ Some of these changes are stylistically frowned upon by Albert van der Horst. \ (Go to http://www.doge.nl/~avdhorst/avdhorst.html for the original code) \ Specifically: do not blame Albert for these constructs: \ NEEDS ?DEF MARKER U>D \ DOUBLE[] CELL[] DLOCAL \ ?ALLOCATE >S S> DSQRT \ The code is of course hopelessly and unavoidably non-standard. NEEDS -threads ?DEF -pipidch [IF] -pipidch [THEN] MARKER -pipidch 6 CONSTANT #threads 9 CONSTANT #n ( amount of decades tried ) 1 [IF] $" frunobulax" CONSTANT $master $" pigsandponies" CONSTANT $slave1 [ELSE] $" frunobulax" CONSTANT $slave1 $" pigsandponies" CONSTANT $master [THEN] $" zappa" CONSTANT $slave2 \ -- UTILITIES ----------------------------------------------------------------------------------- $hostname COUNT $master COUNT COMPARE 0= CONSTANT master? $hostname COUNT $slave1 COUNT COMPARE 0= CONSTANT slave1? $hostname COUNT $slave2 COUNT COMPARE 0= CONSTANT slave2? : \MASTER master? 0= IF POSTPONE \ THEN ; IMMEDIATE : \SLAVE master? IF POSTPONE \ THEN ; IMMEDIATE : DARRAY CREATE 2* CELLS ALLOT \ ( n -- ) DOES> []DOUBLE ; \ ( ix -- addr ) : ARRAY CREATE CELLS ALLOT \ ( n -- ) DOES> []CELL ; \ ( ix -- addr ) \ Print field, in blocks of digits, using delim as a delimiter between blocks : UDDRS ( ud width spacing delim -- ) 0 0 LOCALS| cnt sz delimiter spacing width | (D.) DUP TO cnt 0 ?DO C@+ >S LOOP DROP cnt 0 ?DO I I spacing MOD 0= AND IF delimiter 1 +TO sz ENDIF S> 1 +TO sz LOOP width sz - 0 MAX SPACES sz 0 ?DO EMIT LOOP ; : UD,R ( ud width -- ) 3 "," UDDRS ; \ -- ALGORITHM ----------------------------------------------------------------------------------- \ Prime? tests whether the single precision number P is prime \ Cases 0 1 are in fact illegal but return TRUE : (Prime?) ( p -- flag ) LOCAL P P 4 U< IF TRUE EXIT THEN \ Prevent silly infinite loop P 1 AND 0= IF FALSE EXIT THEN \ Handle even numbers other than 2 P 3 DO P I /MOD I < IF DROP TRUE LEAVE THEN 0= IF FALSE LEAVE THEN 2 +LOOP ; \ Experimental_1: tsize should be >= sqrt(max_number_tested) \ Experimental_2: bytes aren't slower than words 100000 CONSTANT tsize 0 VALUE 'primes VARIABLE ovf : FILL-PRIMES ( -- ) tsize 0 DO I (Prime?) 'primes I + C! LOOP ; : Prime? ( p -- bool ) S" DUP tsize U>= IF (Prime?) 1 ovf +! ELSE 'primes + C@ THEN " EVALUATE ; IMMEDIATE \ about twice faster than DISMISS ; M*/ is really slow. : SDISMISS ( n1 p -- n2 ) LOCAL P LOCAL N N P / LOCAL N' N' P U< IF 1 EXIT THEN \ Only P itself N' P 2 ?DO I Prime? IF N' I RECURSE - \ Dismissed by a smaller prime THEN LOOP ; \ d3 is the amount of numbers <= d1 that are dismissed by the prime n, \ i.e. it is divisible by n but by no smaller prime. \ Requires n<=d1 : DISMISS ( d1 n -- d3 ) ( optimization ) OVER 0= IF NIP SDISMISS U>D EXIT THEN LOCAL P 1 P UM*/ DLOCAL N' N' P U>D DU< IF 1. EXIT \ Only P itself THEN N' P 2 ?DO I Prime? IF N' I RECURSE D- \ Dismissed by a smaller prime THEN LOOP ; 2VARIABLE input #threads DARRAY results #threads ARRAY ichannels #threads ARRAY ochannels #threads 6 <> [IF] CR .( #threads: assert failed setting up channels) ABORT [THEN] CHANNEL c0 c0 0 ichannels ! CHANNEL c1 c1 1 ichannels ! CHANNEL c2 c2 2 ichannels ! CHANNEL c3 c3 3 ichannels ! CHANNEL c4 c4 4 ichannels ! CHANNEL c5 c5 5 ichannels ! CHANNEL x0 x0 0 ochannels ! CHANNEL x1 x1 1 ochannels ! CHANNEL x2 x2 2 ochannels ! CHANNEL x3 x3 3 ochannels ! CHANNEL x4 x4 4 ochannels ! CHANNEL x5 x5 5 ochannels ! \ -- CONFIGURATION ------------------------------------------------------------------------------- \ Configured for a two PC system running 6 threads, connected over 10 Mbit/s ethernet. \ the master, a 166 MHz Pentium, is //frunobulax and runs PI-PI threads 0, 1, 2 \ the slave, a 200 MHz Pentium, is //pigsandponies and runs SLAVE-PI threads 3, 4, 5 \MASTER $slave1 \SLAVE $master DUP c3 ATTACH DUP c4 ATTACH DUP c5 ATTACH DUP x3 ATTACH DUP x4 ATTACH x5 ATTACH \ -- COMPUTER ------------------------------------------------------------------------------------ : PI:N ( id# -- ) DUP >R ichannels @ CHANNEL-3@ ( d1 #thread ) 2* 3 + LOCAL lower DLOCAL N 0. ( accu ) N DSQRT 1+ DUP lower U<= IF DROP R> ochannels @ CHANNEL-2! EXIT THEN lower DO I Prime? IF N I DISMISS D+ \ Count multiples of I -1. D+ \ except I itself THEN #threads 2* +LOOP R> ochannels @ CHANNEL-2! ; \ -- EMITTER ------------------------------------------------------------------------------------- \ Distribute the number to work on to the tasks. A task is supposed never to hang when \ outputting, so EMITTER should finish at once. : EMITTER ( -- ) #threads 0 ?DO input 2@ I I ichannels @ CHANNEL-3! LOOP ; \ -- COLLECTOR ----------------------------------------------------------------------------------- 0 VALUE #msgs \ Get the results of a task. As this task signalled to be ready for output, RESULT! should \ finish at once. : RESULT! ( id# -- ) >R R@ ochannels @ CHANNEL-2@ R> results 2! 1 +TO #msgs ; : COLLECTOR ( -- ) #threads 6 <> ABORT" #threads: assert failed in COLLECTOR" 0 TO #msgs BEGIN SELECT 0 ochannels @ GUARD 0 RESULT! ENDGUARD 1 ochannels @ GUARD 1 RESULT! ENDGUARD 2 ochannels @ GUARD 2 RESULT! ENDGUARD 3 ochannels @ GUARD 3 RESULT! ENDGUARD 4 ochannels @ GUARD 4 RESULT! ENDGUARD 5 ochannels @ GUARD 5 RESULT! ENDGUARD 50000 TIMEOUT CR ." timeout in collector" CR ENDGUARD ENDSELECT #msgs #threads = UNTIL ; \ -- MAIN ---------------------------------------------------------------------------------------- \ This is how the main routine on the master is defined. \ BUG: The textual order in which threads are compiled (in PAR ) might be critical. : findPI ( d -- ) #threads 6 <> ABORT" #threads: assert failed in findPI" input 2! CR ." \ Input " input 2@ 16 UD,R TIMER-RESET PAR STARTP 0 PI:N ENDP STARTP 1 PI:N ENDP STARTP 2 PI:N ENDP STARTP EMITTER ENDP STARTP COLLECTOR ENDP ENDPAR ." -> result " input 2@ 1. D- input 2@ D2/ D- 1. D+ #threads 0 ?DO I results 2@ D- LOOP 16 UD,R ." (" .ELAPSED ." )" ; : *10 ( d -- 10*d ) 2DUP D2* 2SWAP 3 DLSHIFT D+ ; : PI-PI ( -- ) CR ." \ ** PI(N) using a " tsize . ." BYTE array **" CR ." \ ** " #threads 0 .R ." -threaded version **" 0 ovf ! tsize CHARS ALLOCATE ?ALLOCATE TO 'primes FILL-PRIMES 10. \ 100,000,000 #n 0 DO 2DUP findPI *10 LOOP 2DROP CR ." \ " ovf @ . ." cache misses" 'primes FREE ?ALLOCATE ; : SLAVE-PI ( -- ) #threads 6 <> ABORT" #threads: assert failed in SLAVE-PI" CR ." \ Slave starting as threads #3, #4 and #5" 0 ovf ! tsize CHARS ALLOCATE ?ALLOCATE TO 'primes FILL-PRIMES #n 0 DO PAR STARTP 3 PI:N ENDP STARTP 4 PI:N ENDP STARTP 5 PI:N ENDP ENDPAR LOOP CR ." \ " ovf @ . ." cache misses" 'primes FREE ?ALLOCATE ; \MASTER CR .( Try: PI-PI) \SLAVE CR .( Try: SLAVE-PI) ( This is the SLAVE [ //pigsandponies :: 200 MHz Pentium ] -------------------------------------------------------- FORTH> slave-pi \ Slave starting as threads #3, #4 and #5 \ 0 cache misses ok FORTH> .network local channel | connected to | log. id# | lsocket | rsocket | wsocket ------------------+-----------------+----------+---------+---------+--------- x5 | frunobulax | 31470 | 416 | 0 | 424 x4 | frunobulax | 31469 | 404 | 0 | 432 x3 | frunobulax | 31468 | 392 | 0 | 440 x2 | [local] | 31467 | 0 | 0 | 0 x1 | [local] | 31466 | 0 | 0 | 0 x0 | [local] | 31465 | 0 | 0 | 0 c5 | frunobulax | 31464 | 0 | 364 | 0 c4 | frunobulax | 31463 | 0 | 336 | 0 c3 | frunobulax | 31462 | 0 | 368 | 0 c2 | [local] | 31461 | 0 | 0 | 0 c1 | [local] | 31460 | 0 | 0 | 0 c0 | [local] | 31459 | 0 | 0 | 0 This is the MASTER [ //frunobulax :: 166 MHz Pentium ] ------------------------------------------------------ FORTH> pi-pi \ ** PI(N) using a 100000 BYTE array ** \ ** 6-threaded version ** \ Input 10 -> result 4 <1.506 seconds elapsed> \ Input 100 -> result 25 <0.148 seconds elapsed> \ Input 1,000 -> result 168 <0.175 seconds elapsed> \ Input 10,000 -> result 1,229 <0.188 seconds elapsed> \ Input 100,000 -> result 9,592 <0.187 seconds elapsed> \ Input 1,000,000 -> result 78,498 <0.188 seconds elapsed> \ Input 10,000,000 -> result 664,579 <0.252 seconds elapsed> \ Input 100,000,000 -> result 5,761,455 <2.195 seconds elapsed> \ Input 1,000,000,000 -> result 50,847,534 <19.879 seconds elapsed> \ 0 cache misses ok \ Remark: Note the 1.5 second network startup delay for the first number. FORTH> .network local channel | connected to | log. id# | lsocket | rsocket | wsocket ------------------+-----------------+----------+---------+---------+--------- x5 | pigsandponies | 31470 | 0 | 360 | 0 x4 | pigsandponies | 31469 | 0 | 364 | 0 x3 | pigsandponies | 31468 | 0 | 372 | 0 x2 | [local] | 31467 | 320 | 404 | 388 x1 | [local] | 31466 | 192 | 436 | 424 x0 | [local] | 31465 | 312 | 416 | 440 c5 | pigsandponies | 31464 | 344 | 0 | 348 c4 | pigsandponies | 31463 | 332 | 0 | 336 c3 | pigsandponies | 31462 | 156 | 0 | 324 c2 | [local] | 31461 | 236 | 284 | 268 c1 | [local] | 31460 | 272 | 288 | 248 c0 | [local] | 31459 | 240 | 148 | 200 ) \ EOF