
DSP libraries for Cortex M3 and other ARM processors
We have developed a fast DSP library for the Cortex M3.
For an evaluation version and commercial license details, please contact us at
imellen@embeddedsignals.com
Quick summary:
Four groups of functions:

Windowing functions

Fast Fourier Transform

Complex magnitude (absolute value of a complex frequency value)

Miscellaneous functions: logarithm(x), exp(x), pseudorandom generator
Three library versions
Windowing functions (e.g. Hamming window)

Windowing is a very common step before FFT calculation

Performs speed optimized windowing of the input signal before the FFT

The 16 to 32 bit version performs proper scaling of a 16 bit signal for the 32 bit FFT
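As a rough illustration of the two windowing variants listed above, here is a minimal C sketch. It is not the library's code (the library is hand optimized assembly). The function names, the 8 point Q15 Hamming table and the choice of promoting the Q15 x Q15 product to a 32 bit result by doubling it are all assumptions made for this example.

```c
#include <stdint.h>

/* 8 point Hamming window, w[i] = 0.54 - 0.46*cos(2*pi*i/7), stored in Q15
 * (1.0 -> 32767) and rounded to nearest. A const table like this can sit in Flash. */
static const int16_t hamming8_q15[8] = {
    2621, 8296, 21048, 31274, 31274, 21048, 8296, 2621
};

/* 16 -> 16 bit windowing: out = in * w, rounded back to 16 bits.
 * Assumes an arithmetic right shift for negative values (true for common ARM compilers). */
static void window_16(const int16_t *in, const int16_t *coeff, int16_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (int16_t)(((int32_t)in[i] * coeff[i] + (1 << 14)) >> 15);
}

/* 16 -> 32 bit windowing: keep the whole product and scale it up to the
 * 32 bit range (Q15 * Q15 = Q30, doubled to Q31). The result cannot overflow
 * because |in| <= 32768 and coeff <= 32767. */
static void window_16_to_32(const int16_t *in, const int16_t *coeff, int32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = ((int32_t)in[i] * coeff[i]) * 2;
}
```

The 16 to 32 bit path keeps all product bits, so no precision is lost before a 32 bit FFT; the 16 bit path has to round the product back to 16 bits.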
FFT functions

Complex and real FFT, 16 and 32 bit FFT versions

Radix 4/2 FFT, sizes 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096

Inverse FFT available

Real FFT enables much more efficient processing of real signals

16 bit FFT precision is comparable with other fixed point implementations; precision is determined by the necessary scaling by 0.5 in every FFT stage

32 bit FFT increases dynamic range by 90 dB and needs an extra 20% to 50% of cycles

Coefficients are located in Flash. Placing them in RAM gives a faster FFT at higher Flash latencies.
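The scaling by 0.5 in every stage can be seen on the smallest case. The sketch below is a plain C 4 point radix 2 FFT on 16 bit complex data, chosen because its twiddle factors are only 1 and -j, so no multiplications are needed. The names cplx16, half and fft4_scaled are hypothetical, and this is not the library's assembly code. With two stages the output equals DFT/4; in general log2(N) radix 2 stages give an output scaled by 1/N, and every stage discards one low bit, which is what limits 16 bit precision.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16;

/* Halve a 32 bit intermediate and narrow it to 16 bits.
 * Assumes an arithmetic right shift for negative values. */
static int16_t half(int32_t v) { return (int16_t)(v >> 1); }

/* 4 point decimation-in-time FFT, each stage scaled by 0.5, so X = DFT(x) / 4.
 * Sums are formed in 32 bits; after halving they always fit in 16 bits again. */
static void fft4_scaled(const cplx16 x[4], cplx16 X[4])
{
    /* Stage 1: butterflies on (x0, x2) and (x1, x3). */
    cplx16 a0 = { half((int32_t)x[0].re + x[2].re), half((int32_t)x[0].im + x[2].im) };
    cplx16 a1 = { half((int32_t)x[0].re - x[2].re), half((int32_t)x[0].im - x[2].im) };
    cplx16 a2 = { half((int32_t)x[1].re + x[3].re), half((int32_t)x[1].im + x[3].im) };
    cplx16 a3 = { half((int32_t)x[1].re - x[3].re), half((int32_t)x[1].im - x[3].im) };

    /* Stage 2: twiddle 1 for (a0, a2), twiddle -j for (a1, a3).
     * -j * (re + j*im) = im - j*re, so the multiply is just a component swap. */
    X[0].re = half((int32_t)a0.re + a2.re);  X[0].im = half((int32_t)a0.im + a2.im);
    X[2].re = half((int32_t)a0.re - a2.re);  X[2].im = half((int32_t)a0.im - a2.im);
    X[1].re = half((int32_t)a1.re + a3.im);  X[1].im = half((int32_t)a1.im - a3.re);
    X[3].re = half((int32_t)a1.re - a3.im);  X[3].im = half((int32_t)a1.im + a3.re);
}
```

A constant input of 1000 yields X[0] = 1000 and zeros elsewhere, since the DC bin holds the sum 4000 divided by 4.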
Magnitude functions

Calculate complex frequency magnitude mag = sqrt(re^2 + im^2)

Based on a custom 32 bit square root algorithm (7/13 cycles)

Multiple versions with different speed / precision tradeoffs for 64 bit sqrt
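For reference, the magnitude calculation can be sketched in portable C as below. The square root here is the textbook bit-by-bit integer algorithm, swapped in for illustration; the library's custom square root is not described on this page, and this loop does not come close to its cycle counts. The names isqrt32 and cplx_mag16 are hypothetical.

```c
#include <stdint.h>

/* Textbook bit-by-bit integer square root: returns floor(sqrt(x)). */
static uint16_t isqrt32(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = (uint32_t)1 << 30;   /* highest power of four in 32 bits */

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)res;
}

/* mag = sqrt(re^2 + im^2) for one 16 bit complex bin.
 * The sum of squares is at most 2^31, so it fits an unsigned 32 bit word. */
static uint16_t cplx_mag16(int16_t re, int16_t im)
{
    uint32_t sum = (uint32_t)((int32_t)re * re) + (uint32_t)((int32_t)im * im);
    return isqrt32(sum);
}
```

For 32 bit FFT output the sum of squares needs up to 64 bits, which is why a separate 64 bit square root is listed above.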
Logarithm and exponent functions

Calculate log2(x) and exp2(x) = 2^x

log2 input, exp2 output: 16q16 unsigned, 1/65536 to 65535+65535/65536

log2 output, exp2 input: 5q27 signed, -15.99999 to 15.99999

Speed: 11/10 cycles; precision 0.4 ppm / 3 ppm for log2 / exp2

Single multiply conversion to log10(x), ln(x), 10^x, e^x and generic base log, exp
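The single multiply conversion follows from log_b(x) = log2(x) * log_b(2). A minimal C sketch, assuming the log2 result is already available in the signed 5q27 format (27 fractional bits) and that the conversion factor is held as a Q31 constant; the function and macro names are hypothetical and the library's own log2 routine is not reproduced here:

```c
#include <stdint.h>

/* Conversion factors in Q31 (value * 2^31, rounded to nearest). */
#define LOG10_OF_2_Q31 646456993    /* log10(2) = 0.30102999... */
#define LN_OF_2_Q31    1488522236   /* ln(2)    = 0.69314718... */

/* Multiply a 5q27 log2 value by a Q31 factor; the result is again 5q27.
 * One 32 x 32 -> 64 bit multiply, keeping the upper part of the product.
 * Assumes an arithmetic right shift for negative values. */
static int32_t log2_q27_scale(int32_t log2_q27, int32_t factor_q31)
{
    return (int32_t)(((int64_t)log2_q27 * factor_q31) >> 31);
}
```

For example, log2(1024) = 10 is 10 * 2^27 = 1342177280 in 5q27, and scaling it by LOG10_OF_2_Q31 gives about 3.0103 in the same format. In the other direction, 10^x and e^x are obtained by multiplying the argument by log2(10) or log2(e) before calling exp2, since b^x = 2^(x * log2(b)).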
Parallel MLS pseudorandom generator for ARM CPU

Maximum Length Sequence generated by Linear Feedback Shift Registers

Period 2^31-1 to 2^64-1 words (1 to 64 bits wide)

1 to 64 bits generated in parallel

Order of magnitude faster than a bit based approach, 3 to 10 cycles per whole word
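The difference between the bit based and the parallel approach can be sketched in C as follows. This is an illustration under assumptions, not the code from the download: it uses the recurrence s[n] = s[n-28] XOR s[n-31] (the trinomial x^31 + x^28 + 1, commonly listed as primitive, which gives a period of 2^31-1 bits) and produces 28 bits per step. All function names are hypothetical. Because each new bit depends only on bits that are at least 28 positions old, 28 new bits can be formed with one shift and one XOR on the whole word.

```c
#include <stdint.h>

#define MLS_MASK31 0x7FFFFFFFu

/* Bit based approach: one new bit per call.
 * Bit k of *state holds s[n-1-k], so taps n-28 and n-31 are bits 27 and 30. */
static uint32_t mls_next_bit(uint32_t *state)
{
    uint32_t bit = ((*state >> 27) ^ (*state >> 30)) & 1u;
    *state = ((*state << 1) | bit) & MLS_MASK31;
    return bit;
}

/* Reference: a 28 bit word built from 28 single bit steps (first bit is the MSB). */
static uint32_t mls_word_serial(uint32_t *state)
{
    uint32_t w = 0;
    for (int i = 0; i < 28; i++)
        w = (w << 1) | mls_next_bit(state);
    return w;
}

/* Parallel approach: the same 28 bits in one step.
 * New bit at position p is old bit p XOR old bit p+3, for p = 0..27. */
static uint32_t mls_word_parallel(uint32_t *state)
{
    uint32_t w = (*state ^ (*state >> 3)) & 0x0FFFFFFFu;
    *state = ((*state << 28) | w) & MLS_MASK31;
    return w;
}
```

Both functions must be seeded with a nonzero 31 bit state; starting from the same seed they return identical words, but the parallel version replaces a 28 iteration loop with a handful of instructions.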
General information

Libraries are free for personal use and thoroughly tested by a large developer community

Successfully deployed in many commercial products

Libraries passed an extensive validation and verification process and were compared against a baseline floating point implementation in Matlab

Guaranteed to be overflow safe for valid input data

Written in hand optimized assembly; the speed gain is based on deep knowledge of ARM processor functionality

Always tested on real hardware

Focused on the Cortex M3 core; some libraries were ported to the ARM 9E core on customer request

Reasonable compromise between execution speed and code size; can be tailored to customer request

Not restricted to a single processor manufacturer, as is the case with manufacturers' libraries

Currently the fastest FFT / SQRT implementation on Cortex M3 (as of April 2010)
Examples of FFT library customization to match customer needs

different input / output scaling (e.g. full scale input 32 bit real FFT)

generate second half of the real FFT (omitted in the standard version due to symmetry)

calculate 2 real FFTs simultaneously using 1 complex FFT

calculate only a subset of output frequency bins

different precision than 16 or 32 bits, for example 20 bit data / 12 bit coefficients

custom input/output formatting (interleaving, scaling, normal/bit reversed order)

coefficient location (Flash or RAM)

speed optimization for higher Flash latency
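The "2 real FFTs using 1 complex FFT" item relies on a standard identity: pack the two real signals as z = x + j*y, transform once, then separate the spectra with X[k] = (Z[k] + conj(Z[N-k])) / 2 and Y[k] = (Z[k] - conj(Z[N-k])) / (2j). The C sketch below shows this for N = 4 with a direct integer DFT standing in for the FFT, so that no twiddle tables are needed. The names cplx32, dft4 and two_real_dft4 are hypothetical and this is not the library's implementation.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx32;

/* Direct 4 point DFT. The twiddles are 1, -j, -1 and j, so only adds are needed. */
static void dft4(const cplx32 z[4], cplx32 Z[4])
{
    Z[0].re = z[0].re + z[1].re + z[2].re + z[3].re;
    Z[0].im = z[0].im + z[1].im + z[2].im + z[3].im;
    Z[1].re = z[0].re + z[1].im - z[2].re - z[3].im;   /* z0 - j*z1 - z2 + j*z3 */
    Z[1].im = z[0].im - z[1].re - z[2].im + z[3].re;
    Z[2].re = z[0].re - z[1].re + z[2].re - z[3].re;
    Z[2].im = z[0].im - z[1].im + z[2].im - z[3].im;
    Z[3].re = z[0].re - z[1].im - z[2].re + z[3].im;   /* z0 + j*z1 - z2 - j*z3 */
    Z[3].im = z[0].im + z[1].re - z[2].im - z[3].re;
}

/* Spectra of two real signals x and y from a single complex transform. */
static void two_real_dft4(const int32_t x[4], const int32_t y[4],
                          cplx32 X[4], cplx32 Y[4])
{
    cplx32 z[4], Z[4];
    for (int n = 0; n < 4; n++) {
        z[n].re = x[n];                 /* x goes into the real part */
        z[n].im = y[n];                 /* y goes into the imaginary part */
    }
    dft4(z, Z);
    for (int k = 0; k < 4; k++) {
        int m = (4 - k) & 3;            /* index N-k, wrapped for k = 0 */
        X[k].re = (Z[k].re + Z[m].re) / 2;
        X[k].im = (Z[k].im - Z[m].im) / 2;
        Y[k].re = (Z[k].im + Z[m].im) / 2;
        Y[k].im = (Z[m].re - Z[k].re) / 2;
    }
}
```

The divisions by 2 are exact here because each numerator equals twice the true spectrum value. The separation step costs only a few adds per bin, so two real transforms come at roughly the price of one complex transform.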
Downloads
FFTlibrary2bench.pdf  Benchmark document with function list
FFTCM3.s  16 bit complex FFT, 16, 64, 256, 1024, 4096 points (Crossworks gcc)
FFTr2CM3.s  16 bit complex FFT, 32, 128, 512, 2048 points (Crossworks gcc)
FFT128real32.zip  32 bit real FFT, 128 points + windowing + magnitude (IAR, Keil, gcc)
FFT4096Complex32b_ARM9E.s  32 bit complex FFT for ARM 9E; sizes 16, 64, 256, 1024, 4096
log2exp2.zip  32 bit logarithm and exponent functions with error plots and description (IAR, Keil, gcc)
MLS_Rnd_Arm.pdf  Parallel MLS pseudorandom generator for ARM, description + code (C, asm)
Please contact us if you need to evaluate functions
not posted in the download section.
