Embedded Signals


DSP libraries for Cortex M3 and other ARM processors

We have developed a fast DSP library for the Cortex M3. For an evaluation version and commercial license details, please contact us at imellen@embeddedsignals.com

Quick summary:

Four groups of functions:

Windowing functions
Fast Fourier Transform
Complex magnitude (absolute value of a complex frequency bin)
Miscellaneous functions: logarithm log2(x), exponent exp2(x), pseudorandom generator

Three library versions:

GCC (Rowley CrossWorks, Raisonance, …)
Keil MDK-ARM
IAR Embedded Workbench

Windowing functions (e.g. Hamming window)

Windowing is a very common step before FFT calculation
Speed-optimized windowing of the input signal before the FFT
The 16-to-32-bit version scales a 16-bit input signal properly for the 32-bit FFT

FFT functions

Complex and real FFT, 16-bit and 32-bit versions
Radix-4/radix-2 FFT: sizes 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096
Inverse FFT available
Real FFT enables much more efficient processing of real-valued signals
16-bit FFT precision is comparable with other fixed-point implementations; it is limited by the scaling by 0.5 that every FFT stage needs to prevent overflow
32-bit FFT increases dynamic range by 90 dB but needs 20% to 50% more cycles
Coefficients are located in Flash; placing them in RAM gives a faster FFT when Flash latency is higher

Magnitude functions

Calculate the magnitude of a complex frequency bin: mag = sqrt(re^2 + im^2)
Based on a custom 32-bit square root algorithm (7/13 cycles)
Multiple 64-bit sqrt versions with different speed/precision trade-offs
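For comparison, a generic way to compute the magnitude in C is the classic bit-by-bit (digit-by-digit) integer square root shown below. This is a textbook method assumed for illustration, not the custom 7/13-cycle algorithm mentioned above; `isqrt32` and `cmag16` are hypothetical names. For 16-bit re and im, re^2 + im^2 is at most 2^31, so the sum fits in a 32-bit unsigned integer and the result fits in 16 bits.

```c
#include <stdint.h>

/* Classic bit-by-bit integer square root: returns floor(sqrt(v)). */
static uint32_t isqrt32(uint32_t v)
{
    uint32_t res = 0;
    uint32_t bit = 1u << 30;   /* highest power of 4 that fits in 32 bits */
    while (bit > v)
        bit >>= 2;
    while (bit != 0) {
        if (v >= res + bit) {
            v -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Hypothetical helper: magnitude of a 16-bit complex bin, floor(sqrt(re^2 + im^2)). */
static uint16_t cmag16(int16_t re, int16_t im)
{
    uint32_t p = (uint32_t)((int32_t)re * re) + (uint32_t)((int32_t)im * im);
    return (uint16_t)isqrt32(p);
}
```

The loop runs at most 16 iterations and produces one result bit per iteration.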

Logarithm and exponent functions

Calculate log2(x) and exp2(x) = 2^x
log2 input, exp2 output: unsigned 16q16, range 1/65536 to 65535 + 65535/65536
log2 output, exp2 input: signed 5q27, range -15.99999 to 15.99999
Speed: 11 / 10 cycles; precision: 0.4 ppm / 3 ppm (log2 / exp2)
A single multiplication converts the results to log10(x), ln(x), 10^x, e^x, or a logarithm/exponent of any other base
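The fixed-point formats above can be illustrated with a simple (and much slower) method: normalize the input to find the integer part of log2, then obtain each fractional bit by repeatedly squaring the mantissa. The sketch assumes the 16q16 input and 5q27 output formats listed above; `log2_q16_to_q27`, `ln_q27` and `LN2_Q31` are hypothetical names, and this is not the library's 11-cycle algorithm. `ln_q27` shows the single-multiply conversion ln(x) = log2(x) * ln(2); the exponent side works the same way, e.g. 10^x = exp2(x * log2(10)).

```c
#include <stdint.h>

/* log2 of an unsigned 16q16 value x (x must be > 0), returned as signed 5q27.
 * Integer part from the position of the leading one, fraction bits from
 * repeated squaring of the normalized mantissa (one squaring per bit). */
static int32_t log2_q16_to_q27(uint32_t x)
{
    int msb = 31;
    while (!(x & (1u << msb)))
        msb--;

    int32_t result = (int32_t)(msb - 16) * (1 << 27);   /* integer part */
    uint64_t y = (uint64_t)x << (31 - msb);            /* mantissa in [1, 2), Q1.31 */

    for (int bit = 26; bit >= 0; bit--) {              /* 27 fractional bits */
        y = (y * y) >> 31;                             /* y^2, now in [1, 4) */
        if (y >= (2ull << 31)) {                       /* y >= 2: this bit is 1 */
            y >>= 1;
            result += (int32_t)1 << bit;
        }
    }
    return result;
}

/* Single-multiply base conversion: ln(x) = log2(x) * ln(2).
 * LN2_Q31 = round(ln(2) * 2^31); for log10 use round(log10(2) * 2^31) instead. */
#define LN2_Q31 1488522236LL

static int32_t ln_q27(int32_t log2_q27)
{
    return (int32_t)(((int64_t)log2_q27 * LN2_Q31) >> 31);
}
```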

Parallel MLS pseudorandom generator for ARM CPU

Maximum Length Sequences (MLS) generated by Linear Feedback Shift Registers (LFSR)
Period of 2^31-1 to 2^64-1 words (1 to 64 bits wide)
1 to 64 bits generated in parallel
An order of magnitude faster than the bit-based approach: 3 to 10 cycles per whole word
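One common way to produce a whole word of MLS bits per step is a word-wide (GFSR-style) recurrence w[n] = w[n-31] XOR w[n-28]. Because x^31 + x^3 + 1 is a primitive trinomial, every bit column of the word sequence is a maximum length sequence with period 2^31 - 1, provided that column does not start out all zero. This is an assumed technique for illustration, not necessarily the method described in MLS_Rnd_Arm.pdf; `mls31_t`, `mls31_seed` and `mls31_next` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical word-parallel MLS generator: w[n] = w[n-31] ^ w[n-28].
 * Characteristic polynomial x^31 + x^3 + 1 (primitive), so each of the
 * 32 bit columns is an MLS of period 2^31 - 1. */
typedef struct {
    uint32_t w[31];   /* last 31 words, circular buffer */
    int i;            /* index of w[n-31], the oldest word */
} mls31_t;

static void mls31_seed(mls31_t *g, uint32_t seed)
{
    for (int k = 0; k < 31; k++) {
        seed = seed * 1664525u + 1013904223u;   /* LCG, used only to spread the seed */
        g->w[k] = seed;
    }
    g->w[0] = 0xFFFFFFFFu;   /* guarantees no bit column starts all zero */
    g->i = 0;
}

static uint32_t mls31_next(mls31_t *g)
{
    int j = g->i + 3;                 /* w[n-28] sits 3 slots after w[n-31] */
    if (j >= 31)
        j -= 31;
    uint32_t v = g->w[g->i] ^ g->w[j];
    g->w[g->i] = v;                   /* the new word replaces the oldest one */
    if (++g->i == 31)
        g->i = 0;
    return v;
}
```

Each call costs one XOR plus index bookkeeping regardless of the word width. Note that all bit columns follow the same recurrence, so they are phase-shifted copies of one sequence rather than independent generators.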

General information

Libraries are free for personal use and have been thoroughly tested by a large developer community
Successfully deployed in many commercial products
Libraries passed an extensive validation and verification process, with results compared against a baseline floating-point implementation in Matlab
Guaranteed to be overflow safe for valid input data
Written in hand-optimized assembly; the speed gain is based on deep knowledge of ARM processor internals
Always tested on real hardware
Focused on the Cortex M3 core; some libraries have been ported to the ARM9E core at customer request
A reasonable compromise between execution speed and code size, which can be tailored to customer requirements
Not restricted to a single processor manufacturer, unlike manufacturers' own libraries
Currently the fastest FFT / SQRT implementation on Cortex M3 (as of April 2010)

Examples of FFT library customization to match customer needs

different input/output scaling (e.g. full-scale input for the 32-bit real FFT)
generate the second half of the real FFT output (omitted in the standard version due to symmetry)
calculate 2 real FFTs simultaneously using 1 complex FFT
calculate only a subset of the output frequency bins
precision other than 16 or 32 bits, for example 20-bit data / 12-bit coefficients
custom input/output formatting (interleaving, scaling, normal/bit reversed order)
coefficient location (Flash or RAM)
speed optimization for higher Flash latency

Downloads

FFTlibrary2bench.pdf - Benchmark document with function list

FFTCM3.s - 16-bit complex FFT, 16, 64, 256, 1024, 4096 points (CrossWorks gcc)

FFTr2CM3.s - 16-bit complex FFT, 32, 128, 512, 2048 points (CrossWorks gcc)

FFT128real32.zip - 32-bit real FFT, 128 points, + windowing + magnitude (IAR, Keil, gcc)

FFT4096Complex32b_ARM9E.s - 32-bit complex FFT for ARM9E, sizes 16, 64, 256, 1024, 4096

log2exp2.zip - 32-bit logarithm and exponent functions with error plots and description (IAR, Keil, gcc)

MLS_Rnd_Arm.pdf - Parallel MLS pseudorandom generator for ARM, description + code (C, asm)

Please contact us if you need to evaluate functions not posted in the download section.