Numerically controlled oscillators

In my latest project I’ve been using numerically controlled oscillators to generate the waveforms. In this post I want to explain the motivation and theory behind them, with a few comments specifically about audio synthesis.

Motivation

We want to output some periodic signal, e.g. a sine with the frequency frequency, based on the current time. Let’s say we have some counter that increments periodically, cycle, and its frequency f_clk, as an integer (for now). cycle could on the Arduino platform just be a call to micros(), the number of microseconds since the program started running. The f_clk in that case would be 1000000 - a million clock cycles (or microseconds) per second.

In this example we’ll just be generating a sine, but you can use whatever you want, as long as it’s periodic.

uint32_t cycle;
int8_t next_signal() {
  return 127 * sin(2 * pi * frequency * cycle / f_clk);
}

Note that there’s two different frequencies involved here - frequency, which is the frequency of the signal we want to generate, and f_clk, the frequency of our counter.

Problem: sin() is slow if your processor doesn’t have a floating point unit. If you try running this on an 8-bit AVR microcontroller clocked at 16 MHz, calculating this function even once is going to take a few milliseconds. That’s not very useful for most frequencies - if you can generate 200 amplitudes per second, that limits you to frequencies below 100 Hz, as per the sampling theorem.

Getting to an NCO, slowly

Add a lookup table

So, we want to make this faster. Our first idea is to build a lookup table (LUT) and get an index into that. It should cover one period of our function. Calculating this will be slow, but it only has to run once.

uint32_t cycle;
int8_t lut[256];
void build_lut() {
  for(int i = 0; i < 256; i++) {
    lut[i] = 127 * sin(2 * pi * i / 256.);
  }
}

As you can see, the previous frequency * cycle / f_clk in the call to sin() was replaced by i / 256., so our index i can be calculated as 256 * frequency * cycle / f_clk:

int8_t next_signal() {
  return lut[(uint8_t)(256 * frequency * cycle / f_clk)];
}

frequency * cycle / f_clk is the current phase. Multiplying that by 256 gets you the appropriate entry in our LUT in the low 8 bits of the integer component, which you can get at by casting the expression to an unsigned 8-bit integer.

Much better already - the slow sine was replaced by some multiplications and a division. But let’s look at that a little more closely:

Exploit the periodic calls to next_signal()

What if next_signal() gets called periodically as well, e.g. in an interrupt tied to a timer? The indexes of the LUT will change based on the number of cycles between each interrupt:

index_{n} = index_{n-1} + frequency * interrupt_period / f_clk

With this, we can change the index calculation to a simple floating point addition and a conversion to int.

int8_t lut[256];
float phase;
float increment;
void set_frequency(float f) {
  increment = 256 * frequency * interrupt_period / f_clk;
}
int8_t next_signal() {
  phase += increment;
  while(phase >= 256) {
    phase -= 256;
  }
  return lut[(int) phase];
}

There’s no more overflow because we decrease the phase after it’s surpassed the maximum index of 255, and it’s a bit faster than the previous example because now there’s only an addition (or two) instead of a few multiplications.

Remove all floating-point arithmetics

But with fixed point arithmetics it gets even faster: Take an N-bit phase accumulator and use its highest M bits (here: 8) as the index into the LUT. Each time our interrupt runs, this accumulator increases by some fixed value that depends on the frequency of our output waveform and the frequency of the interrupt:

uint32_t increment;
void set_frequency(float f) {
  float f_int = f_cpu / interrupt_period;
  increment = pow(2, 32) / f_int * frequency;
}

If we want a 1 Hz sine, the LUT needs to jump back to its initial position after one second, meaning that the phase accumulator overflows and returns to zero after one second. To do this, it needs to increase by 2^N divided by the frequency of the interrupt f_int each time the interrupt gets called - after half a second it will have run f_int/2 times, and the accumulator will be at (2^32 / f_int) * (f_int/2) = 2^31, exactly in the middle of its period, and after one second it will of course be at (2^32 / f_int) * f_int = 2^32 and overflow.

To get different frequencies, multiply the previous increment with the new frequency - higher multipliers lead to a higher increment, meaning a faster overflow and a faster frequency.

The final step is using the LUT again:

uint32_t phase_accumulator;
int8_t next_signal() {
  phase_accumulator += increment;
  return lut[phase_accumulator >> 24];
}

Now the phase calculation is just an integer addition, followed by a load from memory to get the amplitude of the current waveform. The frequencies of our waveform and interrupt can be floating point values as they’re only used when setting a new frequency. If you’re dealing with tone generation and only need halftones, you can even generate the phase increments of the highest octave you’re going to see and then right-shift those to get the lower octaves, since each octave doubles the frequencies and phase increments of the previous octave.

Modulating the signal

While usually not very useful on slower microcontrollers because of the high frequencies involved (e.g. upwards of 87 MHz for FM radio), it’s quite easy to modulate the new waveforms - amplitude, phase, and frequency modulation are all rather easy to implement.

Amplitude modulation

With this, you can change the volume of generated tones, so it’s still useful on feeble microcontrollers that’re getting forced to play music :)

To do amplitude modulation, just multiply the result from the lookup with the current amplitude of the unmodulated signal:

uint8_t current_amplitude;
int8_t next_signal() {
  phase_accumulator += increment;
  return (current_amplitude * lut[phase_accumulator >> 24]) >> 8;
}

Make sure that the unmodulated signal (or your LUT) is an unsigned integer, or right-shift by 7 instead of 8.

Frequency modulation

This one’s also easy: Just pick a maximum deviation from your default frequency f_d, multiply the signal with it, and add that frequency to your increment (with the same arithmetics behind it).

uint8_t amplitude;
uint32_t increment_f_d;
void set_frequency_deviation(float fd) {
  float f_int = f_cpu / interrupt_period;
  increment_f_d = pow(2, 32) / f_int * fd;
}
int8_t next_signal() {
  phase_accumulator += increment + (increment_f_d * amplitude) >> 8;
  return lut[phase_accumulator >> 24];
}

Adding to the phase increment increases the frequency by some value between 0 Hz and the frequency deviation, the rest is just like in the unmodulated example. To change it so our carrier frequency is in the middle of the band, make current_amplitude a signed value and only shift the multplied increment by 7 bits instead of 8.

Phase modulation

For this one you’ll also need another addition, but this time one step later: Phase modulation works with the phase accumulator instead of the phase increment. Analogous to frequency modulation, you can set the maximum phase deviation. In this case it has to be less than 2 pi / 256 - I’ll leave figuring out a way around that as an exercise to the reader. I’m going to use a phase in the range [0:1] to get rid of the factor 2 pi.

uint8_t amplitude;
uint32_t phase_dev;
void set_phase_deviation(float pd) {
  phase_dev = pow(2, 32) * pd;
}
int8_t next_signal() {
  phase_accumulator += increment;
  uint32_t modulated = phase_accum + (phase_dev * amplitude) >> 8;
  return lut[modulated >> 24];
}

Remarks

The maximum frequency you can generate with this NCO is half of the interrupt frequency, and the resolution is f_int / 2^N - for my starting example of an Arduino running at 16 MHz with an interrupt every 256 cycles and a 16-bit accumulator, that’s a maximum frequency of 31.25 kHz, which is decent enough for audio. The frequency resolution is about half a Hertz. Human hearing has a resolution of about 3.6 Hz, so we’re in the clear with that as well, at least down to about 125 Hz. Even higher resolutions are of course easily possible with 24 or even 32-bit accumulators.

On the AVR instruction set, the most basic NCO processing can be implemented in about 20 cycles, with another 6 cycles if you want to do amplitude modulation. This is enough to process all six PWM channels of an ATMega328 separately, and still have time for other things - more pleasant volume changes, MIDI processing, etc.

Of shifts and unions

One thing to note when using GCC on an AVR microcontroller is that it sometimes actually does 24 right-shifts when you do n >> 24, so it’s better to do it with unions if speed is important, just to make sure:

union {
  uint32_t i;
  uint8_t c[4];
} phase;
int8_t next_signal() {
  phase.i += increment;
  return lut[phase.c[3]];
}

A basic AVR assembly implementation

Here’s an asm version of next_signal(), for a 16-bit phase accumulator. It’ll need 23 cycles (19 without the ret, or a bit more if you want to do things when the phase returns to 0.

next_signal:
  ; Every register used in this function is
  ; call-clobbered, so it doesn't have to save
  ; anything.
  
  ; Load the low byte of phase and increment
    lds r30, phase + 0
    lds r31, increment + 0
  ; Add them
    add r30, 31
  ; Save the low byte of the phase
    sts phase + 0, r30
  ; Same for the high byte, but add with carry
  ; If you want more than 16 bits for your
  ; accumulator, repeat this part with changed
  ; constants (+2, +3...)
    lds r30, phase + 1
    lds r31, increment + 1
    adc r30, r31
    sts phase + 1, r30
  ; Skip the following part if no overflow
  ; happened. You can just remove this part
  ; if you don't want to do anything here.
    brcc load_amplitude
  ; Here you can do stuff that should only
  ; happen when the phase is (almost) zero
load_amplitude:
  ; Set the high byte of Z to 0, then add
  ; the address of the LUT to Z
    ldi r31, 0
    subi r30, lo8(-(lut))
    sbic r31, hi8(-(lut))
  ; Load the amplitude into the return
  ; register, r24, then return.
    ld r24, Z
    ret

For a more complete example that handles more than one oscillator, you can take a look at the corresponding function in my MIDI synth

If you have any comments or questions, feel free to contact me.