DSP libraries for Cortex M3 and other ARM processors

We have developed a fast DSP library for the Cortex M3. For an evaluation version and commercial license details, please contact us at

Quick summary:

Four groups of functions:

  • Windowing functions

  • Fast Fourier Transform

  • Complex magnitude (absolute value of complex frequency)

  • Miscellaneous functions: logarithm(x), exp(x), pseudorandom generator

Three library versions

  • GCC (Rowley CrossWorks, Raisonance, …)

  • Keil MDK-ARM

  • IAR Embedded Workbench

Windowing functions (e.g. Hamming window)

  • Windowing is a very common step before FFT calculation

  • Performs speed optimized windowing of the input signal before the FFT

  • The 16 to 32 bit version performs proper scaling of a 16 bit signal for the 32 bit FFT
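
For illustration only, the windowing step can be sketched in portable C. This is not the library's assembly routine: the function names are hypothetical, the window table is assumed to be precomputed in Q15 (for a Hamming window, 0.54 - 0.46 cos(2 pi i / (N - 1)) scaled by 32767), and the 16 to 32 bit variant simply keeps the full Q30 product, which is only one possible scaling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 16 bit in, 16 bit out.
 * y = round(x * w / 2^15), with w a Q15 window coefficient (0..32767).
 * Relies on arithmetic right shift of negative values (true for GCC, Keil, IAR). */
void window_q15(const int16_t *x, const int16_t *w, int16_t *y, size_t n)
{
    assert(x && w && y);
    for (size_t i = 0; i < n; i++) {
        int32_t p = (int32_t)x[i] * w[i];        /* Q15 x Q15 = Q30 product */
        y[i] = (int16_t)((p + 0x4000) >> 15);    /* round, back to 16 bits  */
    }
}

/* Hypothetical helper: 16 bit in, 32 bit out.
 * Assumption: the full Q30 product is kept, so no bits are discarded
 * before a 32 bit FFT. */
void window_16to32(const int16_t *x, const int16_t *w, int32_t *y, size_t n)
{
    assert(x && w && y);
    for (size_t i = 0; i < n; i++)
        y[i] = (int32_t)x[i] * w[i];
}
```

The 32 bit output keeps the 15 fractional bits that the 16 bit version rounds away, which is the reason a dedicated 16 to 32 bit variant is useful ahead of a 32 bit FFT.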

FFT functions

  • Complex and real FFT, 16 and 32 bit FFT versions

  • Radix 4/2 FFT, sizes 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096

  • Inverse FFT available

  • Real FFT enables much more efficient processing of real signals

  • 16 bit FFT precision is comparable with other fixed point implementations; precision is determined by the necessary scaling by 0.5 in every FFT stage

  • 32 bit FFT increases dynamic range by 90 dB and needs 20% to 50% more cycles

  • Coefficients are located in Flash. Placing them in RAM gives a faster FFT at higher Flash latencies.
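
The effect of the 0.5 scaling per stage can be seen in a single butterfly. The sketch below is a plain C radix-2 butterfly in Q15, not the library's radix 4/2 assembly code; the type and function names are hypothetical. It assumes a valid twiddle factor (|w| <= 1) and inputs whose complex magnitude fits in 16 bits, so that halving the sum keeps the result in range.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cq15_t;   /* complex sample, Q15 */

/* One radix-2 butterfly with the 0.5 per-stage scaling:
 *   a' = (a + w*b) / 2,   b' = (a - w*b) / 2
 * Each call discards one low-order bit, which is what limits the
 * precision of a 16 bit FFT. */
void butterfly_half(cq15_t *a, cq15_t *b, cq15_t w)
{
    assert(a && b);
    /* complex multiply w*b: Q15 x Q15 -> Q30, shifted back to Q15 */
    int32_t tr = ((int32_t)b->re * w.re - (int32_t)b->im * w.im) >> 15;
    int32_t ti = ((int32_t)b->re * w.im + (int32_t)b->im * w.re) >> 15;
    int32_t ar = a->re, ai = a->im;

    a->re = (int16_t)((ar + tr) >> 1);
    a->im = (int16_t)((ai + ti) >> 1);
    b->re = (int16_t)((ar - tr) >> 1);
    b->im = (int16_t)((ai - ti) >> 1);
}
```

With this scheme an N point radix-2 transform applies log2(N) halvings, so the output is the DFT divided by N.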

Magnitude functions

  • Calculate complex frequency magnitude mag = sqrt(re^2 + im^2)

  • Based on a custom 32 bit square root algorithm (7/13 cycles)

  • Multiple versions with different speed / precision tradeoffs for the 64 bit sqrt
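
As a reference for what the magnitude functions compute, here is a minimal C sketch for 16 bit inputs. It uses the textbook bit-by-bit integer square root, not the library's custom algorithm, and is far slower than the cycle counts quoted above; the function names are hypothetical. For 16 bit components re^2 + im^2 is at most 2^31, so a 32 bit square root is sufficient.

```c
#include <assert.h>
#include <stdint.h>

/* Textbook bit-by-bit integer square root: returns floor(sqrt(x)). */
uint32_t isqrt32(uint32_t x)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30;            /* highest power of 4 in 32 bits */

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= root + bit) {
            x -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Hypothetical helper: magnitude of one 16 bit complex frequency bin. */
uint16_t cmag16(int16_t re, int16_t im)
{
    uint32_t sum = (uint32_t)((int32_t)re * re) + (uint32_t)((int32_t)im * im);
    assert(sum <= 0x80000000u);         /* worst case: re = im = -32768 */
    return (uint16_t)isqrt32(sum);
}
```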


Logarithm and exponent functions

  • Calculate log2(x) and exp2(x) = 2^x

  • log2 input, exp2 output: 16q16 unsigned, range 1/65536 to 65535 + 65535/65536

  • log2 output, exp2 input: 5q27 signed, range -15.99999 to 15.99999

  • speed: 11 / 10 cycles; precision: 0.4 ppm / 3 ppm for log2 / exp2

  • single multiply conversion to log10(x), ln(x), 10^x, e^x and generic base log, exp
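
The single multiply conversion follows from log10(x) = log2(x) * log10(2) and ln(x) = log2(x) * ln(2). A minimal C sketch, assuming the log2 result is already available as a signed 5q27 value; the function and constant names are hypothetical and the constants are stored in Q31 as one possible choice:

```c
#include <assert.h>
#include <stdint.h>

#define LOG10_OF_2_Q31 646456993    /* round(log10(2) * 2^31) */
#define LN_OF_2_Q31    1488522236   /* round(ln(2)    * 2^31) */

/* Multiply a 5q27 value by a Q31 constant in [0, 1); the result stays 5q27.
 * Relies on arithmetic right shift of negative values. */
static int32_t q27_times_q31(int32_t v, int32_t k)
{
    int64_t r = ((int64_t)v * k + ((int64_t)1 << 30)) >> 31;   /* rounded */
    assert(r >= INT32_MIN && r <= INT32_MAX);
    return (int32_t)r;
}

/* log10(x) from log2(x), both in 5q27 */
int32_t log2_to_log10_q27(int32_t log2_q27)
{
    return q27_times_q31(log2_q27, LOG10_OF_2_Q31);
}

/* ln(x) from log2(x), both in 5q27 */
int32_t log2_to_ln_q27(int32_t log2_q27)
{
    return q27_times_q31(log2_q27, LN_OF_2_Q31);
}
```

A generic base b works the same way with the constant 1 / log2(b). In the other direction, 10^x = 2^(x * log2(10)) needs a constant larger than 1, so it has to be stored in a format with integer bits.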


Parallel MLS pseudorandom generator for ARM CPU

  • Maximum Length Sequence generated by Linear Feedback Shift Registers

  • Period 2^31-1 to 2^64-1 words (1 to 64 bits wide)

  • 1 to 64 bits generated in parallel

  • Order of magnitude faster than a bit based approach, 3-10 cycles per whole word
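
The idea behind parallel generation can be sketched in C with a trinomial LFSR. This is an illustration, not the library's code: it assumes the standard PRBS31 recurrence s[n] = s[n-28] xor s[n-31] (polynomial x^31 + x^28 + 1) and hypothetical function names. Because the nearest tap is 28 bits back, 28 new bits depend only on the current state and can be produced with one shift, one XOR and one mask.

```c
#include <assert.h>
#include <stdint.h>

/* State: 31 bits, bit 0 = oldest sequence bit s[n-31], bit 30 = s[n-1]. */

/* Bit-serial reference: produces one new bit per call. */
uint32_t mls31_bit(uint32_t *state)
{
    assert(*state != 0 && *state < 0x80000000u);  /* zero state locks up */
    uint32_t s = *state;
    uint32_t nb = (s ^ (s >> 3)) & 1u;            /* s[n-31] ^ s[n-28] */
    *state = (s >> 1) | (nb << 30);
    return nb;
}

/* Parallel step: produces 28 new bits at once (bit k of result = s[n+k]). */
uint32_t mls31_word28(uint32_t *state)
{
    assert(*state != 0 && *state < 0x80000000u);
    uint32_t s = *state;
    uint32_t w = (s ^ (s >> 3)) & 0x0FFFFFFFu;    /* 28 new bits in one go  */
    *state = (s >> 28) | (w << 3);                /* keep the newest 31 bits */
    return w;
}
```

Both functions walk the same sequence; one call to mls31_word28 advances the state exactly as 28 calls to mls31_bit do.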


General information

  • Libraries are free for personal use and have been thoroughly tested by a large developer community

  • Successfully deployed in many commercial products

  • Libraries passed an extensive validation and verification process and were compared against a baseline floating point implementation in Matlab

  • Guaranteed to be overflow safe for valid input data

  • Written in hand optimized assembly; the speed gain is based on deep knowledge of ARM processor functionality

  • Always tested on real hardware

  • Focused on the Cortex M3 core; some libraries were ported to the ARM 9E core at customer request

  • Reasonable compromise between execution speed and code size, can be tailored to customer request

  • Not restricted to a single processor manufacturer, as is the case with manufacturers' libraries

  • Currently the fastest FFT / SQRT implementation on Cortex M3 (as of April 2010)


Examples of FFT library customization to match customer needs

  • different input / output scaling (e.g. full scale input 32 bit real FFT)

  • generate second half of the real FFT (omitted in the standard version due to symmetry)

  • calculate 2 real FFTs simultaneously using 1 complex FFT

  • calculate only a subset of output frequency bins

  • precision other than 16 or 32 bits, for example 20 bit data / 12 bit coefficients

  • custom input/output formatting (interleaving, scaling, normal/bit reversed order)

  • coefficient location (Flash or RAM)

  • speed optimization for higher Flash latency
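
The "2 real FFTs using 1 complex FFT" item rests on a standard identity. Pack two real signals as z = x + j*y and transform once to get Z. The two spectra are then X[k] = (Z[k] + conj(Z[N-k])) / 2 and Y[k] = (Z[k] - conj(Z[N-k])) / (2j). The C sketch below shows only this separation step in floating point. The function name and array layout are hypothetical, and a fixed point version would also have to account for scaling.

```c
#include <assert.h>
#include <stddef.h>

/* Z = FFT(x + j*y) of length n, given as separate re/im arrays.
 * Writes X = FFT(x) and Y = FFT(y) for all n bins. */
void split_two_real(const double *zre, const double *zim, size_t n,
                    double *xre, double *xim, double *yre, double *yim)
{
    assert(n >= 1);
    for (size_t k = 0; k < n; k++) {
        size_t m = (n - k) % n;      /* mirrored bin N-k (bin 0 maps to 0) */
        double a = zre[k], b = zim[k];
        double c = zre[m], d = zim[m];
        xre[k] = 0.5 * (a + c);      /* X[k] = (Z[k] + conj(Z[N-k])) / 2    */
        xim[k] = 0.5 * (b - d);
        yre[k] = 0.5 * (b + d);      /* Y[k] = (Z[k] - conj(Z[N-k])) / (2j) */
        yim[k] = 0.5 * (c - a);
    }
}
```

For example, x = {1, 2, 3, 4} and y = {0, 1, 0, -1} give Z = {10, 2j, -2, -4 - 2j}, from which the routine recovers X = {10, -2 + 2j, -2, -2 - 2j} and Y = {0, -2j, 0, 2j}.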


FFTlibrary2bench.pdf - Benchmark document with function list

FFTCM3.s - 16 bit complex FFT, 16, 64, 256, 1024, 4096 points (CrossWorks gcc)

FFTr2CM3.s - 16 bit complex FFT, 32, 128, 512, 2048 points (CrossWorks gcc)

32 bit real FFT, 128 points + windowing + magnitude (IAR, Keil, gcc)

FFT4096Complex32b_ARM9E.s - 32 bit complex FFT for ARM 9E, sizes 16, 64, 256, 1024, 4096

32 bit logarithm and exponent functions with error plots and description (IAR, Keil, gcc)

MLS_Rnd_Arm.pdf - Parallel MLS pseudorandom generator for ARM, description + code (C, asm)


Please contact us if you need to evaluate functions not posted in the download section.