(* * LANGUAGE : ANS Forth * PROJECT : Forth Environments * DESCRIPTION : Fast FFT routine. * CATEGORY : Benchmark * AUTHOR : Ron Mayer * LAST CHANGE : January 15, 2001, Marcel Hendrix *) NEEDS -miscutil NEEDS -fsl_util REVISION -fft "ÄÄÄ FFT/FHT benchmark Version 1.00 ÄÄÄ" PRIVATES DOC (* FFT and FHT routines Copyright 1988, 1993; Ron Mayer fht(fz,n); Does a hartley transform of "n" points in the array "fz". fft(n,real,imag) Does a fourier transform of "n" points of the "real" and "imag" arrays. ifft(n,real,imag) Does an inverse fourier transform of "n" points of the "real" and "imag" arrays. realfft(n,real) Does a real-valued fourier transform of "n" points of the "real" and "imag" arrays. The real part of the transform ends up in the first half of the array and the imaginary part of the transform ends up in the second half of the array. realifft(n,real) The inverse of the realfft() routine above. NOTE: This routine uses at least 2 patented algorithms, and may be under the restrictions of a bunch of different organizations. Although I wrote it completely myself; it is kind of a derivative of a routine I once authored and released under the GPL, so it may fall under the free software foundation's restrictions; it was worked on as a Stanford Univ project, so they claim some rights to it; it was further optimized at work here, so I think this company claims parts of it. The patents are held by R. Bracewell (the FHT algorithm) and O. Buneman (the trig generator), both at Stanford Univ. If it were up to me, I'd say go do whatever you want with it; but it would be polite to give credit to the following people if you use this anywhere: Euler - probable inventor of the fourier transform. Gauss - probable inventor of the FFT. Hartley - probable inventor of the hartley transform. Buneman - for a really cool trig generator Mayer(me) - for authoring this particular version and including all the optimizations in one package. Thanks, Ron Mayer; mayer@acuson.com NOTE: The needed vectorlength (for the benchmark) far exceeds the cache size of a simple PC. This is the result on a 900 MHz AMD Athlon. Note the enormous slowdown between size 16K and size 32K. FFT-Mayer: 04 Oct 1994 FFT size: 131072, run-time = 420 ms, sum of squared errors = 1.215567E-9 FFT size: 65536, run-time = 192 ms, sum of squared errors = 1.538225E-10 FFT size: 32768, run-time = 79 ms, sum of squared errors = 6.745882E-13 FFT size: 16384, run-time = 14 ms, sum of squared errors = 1.892659E-14 FFT size: 8192, run-time = 4 ms, sum of squared errors = 7.951561E-16 FFT size: 4096, run-time = 2 ms, sum of squared errors = 3.672111E-17 FFT size: 2048, run-time = 1 ms, sum of squared errors = 1.503414E-17 FFT size: 1024, run-time = 0 ms, sum of squared errors = 1.129821E-19 FFT size: 512, run-time = 0 ms, sum of squared errors = 1.015006E-21 FFT size: 256, run-time = 0 ms, sum of squared errors = 2.096234E-22 FFT size: 128, run-time = 0 ms, sum of squared errors = 3.199519E-23 FFT size: 64, run-time = 0 ms, sum of squared errors = 6.671870E-26 FFT size: 32, run-time = 0 ms, sum of squared errors = 2.208811E-28 FFT size: 16, run-time = 0 ms, sum of squared errors = 1.522502E-28 FFT size: 8, run-time = 0 ms, sum of squared errors = 0.000000 FFT size: 4, run-time = 0 ms, sum of squared errors = 0.000000 This is the FHT from the FSL. Again the slowdown occurs for size > 16K (of course). Note that Ron Mayer's code is about twice faster than Skip Carter's. FFT-Mayer: 04 Oct 1994 FFT size: 131072, run-time = 821 ms, sum of squared errors = 3.974365E-10 FFT size: 65536, run-time = 368 ms, sum of squared errors = 8.026001E-12 FFT size: 32768, run-time = 162 ms, sum of squared errors = 4.827262E-13 FFT size: 16384, run-time = 25 ms, sum of squared errors = 9.611357E-15 FFT size: 8192, run-time = 10 ms, sum of squared errors = 1.571281E-17 FFT size: 4096, run-time = 4 ms, sum of squared errors = 5.316188E-18 FFT size: 2048, run-time = 2 ms, sum of squared errors = 5.066453E-20 FFT size: 1024, run-time = 1 ms, sum of squared errors = 9.696912E-21 FFT size: 512, run-time = 0 ms, sum of squared errors = 1.940720E-22 FFT size: 256, run-time = 0 ms, sum of squared errors = 7.471889E-24 FFT size: 128, run-time = 0 ms, sum of squared errors = 6.410638E-25 FFT size: 64, run-time = 0 ms, sum of squared errors = 1.114503E-26 FFT size: 32, run-time = 0 ms, sum of squared errors = 2.848626E-27 FFT size: 16, run-time = 0 ms, sum of squared errors = 0.000000 FFT size: 8, run-time = 0 ms, sum of squared errors = 2.632823E-29 FFT size: 4, run-time = 0 ms, sum of squared errors = 0.000000 ok *) ENDDOC DOUBLE DARRAY real{ PRIVATE DOUBLE DARRAY imag{ PRIVATE DOUBLE DARRAY rx{ PRIVATE DOUBLE DARRAY costab{ PRIVATE costab{ #16 1 }}FREAD 0.00000000000000000000000000000000000000000000000000e 0.70710678118654752440084436210484903928483593768847e 0.92387953251128675612818318939678828682241662586364e 0.98078528040323044912618223613423903697393373089333e 0.99518472667219688624483695310947992157547486872985e 0.99879545620517239271477160475910069444320361470461e 0.99969881869620422011576564966617219685006108125772e 0.99992470183914454092164649119638322435060646880221e 0.99998117528260114265699043772856771617391725094433e 0.99999529380957617151158012570011989955298763362218e 0.99999882345170190992902571017152601904826792288976e 0.99999970586288221916022821773876567711626389934930e 0.99999992646571785114473148070738785694820115568892e 0.99999998161642929380834691540290971450507605124278e 0.99999999540410731289097193313960614895889430318945e 0.99999999885102682756267330779455410840053741619428e costab{ #16 }REDIM DOUBLE DARRAY sintab{ PRIVATE sintab{ #16 1 }}FREAD 1.00000000000000000000000000000000000000000000000000e 0.70710678118654752440084436210484903928483593768846e 0.38268343236508977172845998403039886676134456248561e 0.19509032201612826784828486847702224092769161775195e 0.09801714032956060199419556388864184586113667316749e 0.04906767432741801425495497694268265831474536302574e 0.02454122852291228803173452945928292506546611923944e 0.01227153828571992607940826195100321214037231959176e 0.00613588464915447535964023459037258091705788631738e 0.00306795676296597627014536549091984251894461021344e 0.00153398018628476561230369715026407907995486457522e 0.00076699031874270452693856835794857664314091945205e 0.00038349518757139558907246168118138126339502603495e 0.00019174759731070330743990956198900093346887403385e 0.00009587379909597734587051721097647635118706561284e 0.00004793689960306688454900399049465887274686668768e sintab{ #16 }REDIM 2e FSQRT FCONSTANT SQRT2 PRIVATE : FVALUES ( n -- ) 0 ?DO 0e FVALUE PRIVATE LOOP ;P 2 FVALUES a b 8 FVALUES c1 s1 s2 c2 s3 c3 s4 c4 8 FVALUES f0 g0 f1 g1 f2 g2 f3 g3 0e FVALUE t_c PRIVATE 0e FVALUE t_s PRIVATE : power-2? ( n -- t/f ) DUP 1- AND 0= ; \ is N a power of 2 ? : FHT ( fz{ n -- ) 0 0 0 0 0 0 0 0 0 LOCALS| fi[ fn[ gi[ kk k1 k2 k3 k4 kx n fz{ | CLEAR k2 n 1 ?DO n 1 RSHIFT TO kk BEGIN k2 kk XOR DUP TO k2 kk AND 0= WHILE kk 1 RSHIFT TO kk REPEAT I k2 > IF fz{ I } DUP DF@ fz{ k2 } DUP DF@ SWAP DF! DF! ENDIF LOOP CLEAR kk BEGIN kk 2^x n < WHILE 1 +TO kk REPEAT kk 1 AND TO kk kk 0= IF fz{ 0 } TO fi[ fz{ n } TO fn[ BEGIN fi[ fn[ U< WHILE fi[ DF@+ DF@+ F2DUP F- TO f1 F+ TO f0 DF@+ DF@ F2DUP F- TO f3 F+ TO f2 f0 f2 F2DUP F- fi[ 2 DFLOAT[] DF! F+ fi[ DF! f1 f3 F2DUP F- fi[ 3 DFLOAT[] DF! F+ fi[ DFLOAT+ DF! 4 DFLOATS +TO fi[ REPEAT ELSE fz{ 0 } TO fi[ fz{ n } TO fn[ fz{ 1 } TO gi[ BEGIN fi[ fn[ U< WHILE fi[ DF@ gi[ DF@ F2DUP F- TO c1 F+ TO s1 fi[ 2 DFLOAT[] DF@ gi[ 2 DFLOAT[] DF@ F2DUP F- TO c2 F+ TO s2 fi[ 4 DFLOAT[] DF@ gi[ 4 DFLOAT[] DF@ F2DUP F- TO c3 F+ TO s3 fi[ 6 DFLOAT[] DF@ gi[ 6 DFLOAT[] DF@ F2DUP F- TO c4 F+ TO s4 s1 s2 F2DUP F- TO f1 F+ TO f0 c1 c2 F2DUP F- TO g1 F+ TO g0 s3 s4 F2DUP F- TO f3 F+ TO f2 SQRT2 c4 F* TO g3 SQRT2 c3 F* TO g2 f0 f2 F2DUP F- fi[ 4 DFLOAT[] DF! F+ fi[ DF! f1 f3 F2DUP F- fi[ 6 DFLOAT[] DF! F+ fi[ 2 DFLOAT[] DF! g0 g2 F2DUP F- gi[ 4 DFLOAT[] DF! F+ gi[ DF! g1 g3 F2DUP F- gi[ 6 DFLOAT[] DF! F+ gi[ 2 DFLOAT[] DF! 8 DFLOATS +TO fi[ 8 DFLOATS +TO gi[ REPEAT ENDIF n 16 U< IF EXIT ENDIF BEGIN 2 +TO kk kk 2^x TO k1 k1 1 LSHIFT TO k2 k2 1 LSHIFT TO k4 k1 k2 + TO k3 k1 1 RSHIFT TO kx fz{ 0 } TO fi[ fi[ kx DFLOAT[] TO gi[ fz{ n } TO fn[ BEGIN fi[ DF@ fi[ k1 DFLOAT[] DF@ F2DUP F- TO f1 F+ TO f0 fi[ k2 DFLOAT[] DF@ fi[ k3 DFLOAT[] DF@ F2DUP F- TO f3 F+ TO f2 f0 f2 F2DUP F- fi[ k2 DFLOAT[] DF! F+ fi[ DF! f1 f3 F2DUP F- fi[ k3 DFLOAT[] DF! F+ fi[ k1 DFLOAT[] DF! gi[ DF@ gi[ k1 DFLOAT[] DF@ F2DUP F- TO g1 F+ TO g0 gi[ k3 DFLOAT[] DF@ SQRT2 F* TO g3 gi[ k2 DFLOAT[] DF@ SQRT2 F* TO g2 g0 g2 F2DUP F- gi[ k2 DFLOAT[] DF! F+ gi[ DF! g1 g3 F2DUP F- gi[ k3 DFLOAT[] DF! F+ gi[ k1 DFLOAT[] DF! k4 DFLOATS +TO fi[ k4 DFLOATS +TO gi[ fi[ fn[ U>= UNTIL 0e TO s1 1e TO c1 costab{ kk } DF@ TO t_c sintab{ kk } DF@ TO t_s kx 1 DO s1 c1 F2DUP t_s F* FSWAP t_c F* F+ TO s1 ( s1 = c1*t_s + s1*t_c ) t_c F* FSWAP t_s F* F- TO c1 ( c1 = c1*t_c - s1*t_s ) c1 FSQR s1 FSQR F- TO c2 c1 s1 F* F2* TO s2 fz{ n } TO fn[ fz{ I } TO fi[ fz{ k1 I - } TO gi[ BEGIN fi[ k1 DFLOAT[] DF@ gi[ k1 DFLOAT[] DF@ F2DUP s2 F* FSWAP c2 F* F+ TO a c2 F* FNEGATE FSWAP s2 F* F+ TO b fi[ DF@ a F2DUP F- TO f1 F+ TO f0 gi[ DF@ b F2DUP F- TO g1 F+ TO g0 fi[ k3 DFLOAT[] DF@ gi[ k3 DFLOAT[] DF@ F2DUP s2 F* FSWAP c2 F* F+ TO a c2 F* FNEGATE FSWAP s2 F* F+ TO b fi[ k2 DFLOAT[] DF@ a F2DUP F- TO f3 F+ TO f2 gi[ k2 DFLOAT[] DF@ b F2DUP F- TO g3 F+ TO g2 c1 f2 F* s1 g3 F* F+ TO a s1 f2 F* c1 g3 F* F- TO b f0 a F2DUP F- fi[ k2 DFLOAT[] DF! F+ fi[ DF! g1 b F2DUP F- gi[ k3 DFLOAT[] DF! F+ gi[ k1 DFLOAT[] DF! s1 g2 F* c1 f3 F* F+ TO a c1 g2 F* s1 f3 F* F- TO b g0 a F2DUP F- gi[ k2 DFLOAT[] DF! F+ gi[ DF! f1 b F2DUP F- fi[ k3 DFLOAT[] DF! F+ fi[ k1 DFLOAT[] DF! k4 DFLOATS +TO gi[ k4 DFLOATS +TO fi[ fi[ fn[ U>= UNTIL LOOP k4 n U>= UNTIL ; : FHT-RNORMALIZE ( n rf{ -- ) DADDR OVER S>F 1/F 1 ROT DSCAL ; : FHT-INORMALIZE ( n if{ -- ) DADDR OVER S>F 1/F FNEGATE 1 ROT DSCAL ; : FHT-RDENORMALIZE ( n rf{ -- ) 2DROP ; : FHT-IDENORMALIZE ( n if{ -- ) DADDR -1e 1 ROT DSCAL ; -- ------------------------------------------------------------------------ : HARTLEY-TEST ( -- ) rx{ 8 }malloc rx{ DADDR LOCAL r{ 8 0 DO I 1+ S>F rx{ I } DF! LOOP CR CR ." Initial array: " rx{ }print rx{ 8 FHT 8 rx{ FHT-RNORMALIZE CR CR ." Transformed and normalized array: " rx{ }print CR CR ." The above SHOULD be: " CR ." 4.5 -1.707 -1.0 -0.707 -0.5 -0.3 0.0 0.707 " rx{ }free ; -- ------------------------------------------------------------------------ : IFFT ( n real{ imag{ -- ) 2 PICK 1- LOCALS| jj imag{ real{ n | 0e 0e 0e 0e FLOCALS| q r s t | real{ n FHT imag{ n FHT n 2/ 1 ?DO real{ I } DF@ real{ jj } DF@ F2DUP F+ TO q F- TO r imag{ I } DF@ imag{ jj } DF@ F2DUP F+ TO s F- TO t q t F2DUP F- F2/ real{ I } DF! F+ F2/ real{ jj } DF! s r F2DUP F+ F2/ imag{ I } DF! F- F2/ imag{ jj } DF! -1 +TO jj LOOP n real{ FHT-RNORMALIZE n imag{ FHT-INORMALIZE ; : REALFFT ( n real{ -- ) OVER 1- LOCALS| jj real{ n | real{ n FHT n 2/ 1 ?DO real{ I } DUP DF@ real{ jj } DUP DF@ F2DUP F- F2/ DF! ( real{ jj } ) F+ F2/ DF! ( real{ I } ) -1 +TO jj LOOP n real{ FHT-RDENORMALIZE ; : FFT ( n real{ imag{ -- ) 2 PICK 1- LOCALS| jj imag{ real{ n | 0e 0e 0e 0e FLOCALS| q r s t | n 2/ 1 ?DO real{ I } DF@ real{ jj } DF@ F2DUP F+ TO q F- TO r imag{ I } DF@ imag{ jj } DF@ F2DUP F+ TO s F- TO t q t F2DUP F+ F2/ real{ I } DF! F- F2/ real{ jj } DF! s r F2DUP F- F2/ imag{ I } DF! F+ F2/ imag{ jj } DF! -1 +TO jj LOOP real{ n FHT n real{ FHT-RDENORMALIZE imag{ n FHT n imag{ FHT-IDENORMALIZE ; : REALIFFT ( n real{ -- ) OVER 1- LOCALS| jj real{ n | n 2/ 1 ?DO real{ I } DUP DF@ ( a) real{ jj } DUP DF@ ( a b) F2DUP F- DF! ( real{ jj } ) F+ DF! ( real{ I } ) -1 +TO jj LOOP real{ n FHT n real{ FHT-RNORMALIZE ; : TEST-FILL ( n1 -- ) DUP 2* 1+ >R real{ R@ }malloc malloc-fail? imag{ R> }malloc malloc-fail? OR ABORT" TEST-FILL :: out of core" 0 DO I S>F real{ I } DF! 0e imag{ I } DF! LOOP ;P : MAIN ( n -- ) LOCAL num num power-2? 0= ABORT" n must be power of 2" num TEST-FILL CR ." FFT-Mayer: 04 Oct 1994" CR ." FFT Size: " num DEC. num real{ imag{ FFT num real{ imag{ IFFT TIMER-RESET num real{ imag{ FFT num real{ imag{ IFFT MS? 2/ ( -- milliseconds ) CR ." Run Time = " DEC. ." ms." 0e num 0 ?DO real{ I } DF@ I S>F F- FSQR F+ LOOP CR ." sum of squared errors " F. ; : CMAIN ( -- ) #17 2^x LOCALS| num | CR ." FFT-Mayer: 04 Oct 1994" BEGIN num TEST-FILL CR ." FFT size: " num 6 .R num real{ imag{ FFT num real{ imag{ IFFT TIMER-RESET num real{ imag{ FFT num real{ imag{ IFFT MS? 2/ ( -- milliseconds ) ." , run-time = " 5 .R ." ms, " 0e num 0 ?DO real{ I } DF@ I S>F F- FSQR F+ LOOP ." sum of squared errors = " F. num 2/ TO num num 4 U< UNTIL ; -- ------------------------------------------------------------------------ : FFT-TEST ( -- ) #16 TEST-FILL CR CR ." Initial ..." CR ." REAL array" real{ }print CR ." IMAG array" imag{ }print #16 real{ imag{ FFT CR CR ." Transformed and normalized ..." CR ." REAL array: " real{ }print CR ." IMAG array: " imag{ }print CR CR ." The above SHOULD be: " CR ." 120 + 0i CR ." -8 - 40.2187i -8 - 19.3137i -8 - 11.9728i -8 - 8i" CR ." -8 - 5.34543i -8 - 3.31371i -8 - 1.5913i" CR ." -8 + 0i" CR ." -8 + 1.5913i -8 + 3.31371i -8 + 5.34543i" CR ." -8 + 8i -8 + 11.9728i -8 + 19.3137i -8 + 40.2187i " ; -- ------------------------------------------------------------------------ :ABOUT CR ." Try: MAIN -- standard benchmark" CR ." CMAIN -- influence of vector length / cache size" CR ." HARTLEY-TEST -- test FHT " CR ." FFT-TEST -- test FFT " CR CR ." Run CMAIN to find out where your computer becomes cache-limited" CR ." On a PC this is expected to happen when size >= 16K (doubles)" ; DEPRIVE .ABOUT -fft CR (* End of Source *)