Signal Generators: STFT Resynthesis (Vocoding)


  fr      pvsanal    asig, ifftsize, ioverlap, iwinsize, iwintype [, iformat, init]


Generate an fsig from a mono sound source.


ifftsize – the FFT size in samples. It need not be a power of two, although such sizes are especially efficient. ifftsize must be even, however; odd numbers are rounded up internally. ifftsize determines the number of analysis bins in the output fsig, according to the formula ifftsize/2 + 1.

ioverlap – the distance in samples ("hop size") between overlapping analysis frames. As a rule, this should be no larger than ifftsize/4, so that frames overlap at least fourfold (for example, 256 for an ifftsize of 1024). ioverlap determines the underlying analysis rate according to the formula sr/ioverlap. ioverlap is not required to be a simple factor of ifftsize.

iwinsize – the size in samples of the analysis window filter, whose shape is set by iwintype. This must be at least ifftsize, and can be larger. Though other proportions are permitted, it is recommended that iwinsize always be an integer multiple of ifftsize.

iwintype – the shape of the analysis window. Possible values (also supported by PVOC-EX) are:

  0 = Hamming window
  1 = von Hann window

iformat – the analysis format. Of three possible types (0 = amplitude + frequency, 1 = amplitude + phase, 2 = complex, i.e. real + imaginary), currently only type 0 is implemented.

init – skip reinitialization. Not currently implemented for any of the streaming pvoc opcodes.
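
As a rough illustration of the two formulas given for ifftsize and ioverlap, the following orchestra fragment works out the bin count and the analysis rate at i-time. The values 1024 and 256 are taken from the usage example at the end of this page; the variable names inumbins and irate are invented for this sketch.

  ifftsize  =        1024
  ioverlap  =        256
  inumbins  =        ifftsize/2 + 1    ; 513 analysis bins
  irate     =        sr/ioverlap       ; analysis frames per second
            print    inumbins, irate

With sr = 44100 the analysis rate comes to roughly 172 frames per second; halving ioverlap to 128 doubles it.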


The primary reason to use a non-power-of-two value for ifftsize would be to match the known fundamental frequency of a strongly pitched source. Values with many small factors can be almost as efficient as power-of-two sizes: for example, 384 (2^7 x 3) for a source pitched at around low A = 110 Hz.

The choice of ioverlap may be dictated by the degree of pitch modification applied to the fsig, if any. The more extreme the pitch shift, the higher the analysis rate should be, and hence the smaller the value for ioverlap. A higher analysis rate can also be advantageous with broadband transient sounds, such as drums, where a small analysis window would give less temporal smearing, but more frequency-related errors.

Internally, the analysis window (Hamming, von Hann) is multiplied by a sinc function, so that amplitudes are zero at the boundaries between frames. A larger analysis window has been found to be especially important for oscillator bank resynthesis (using pvsadsyn), as it increases the frequency resolution of the analysis, and hence the accuracy of the resynthesis. Note that iwinsize also determines the overall latency of the analysis/resynthesis system. In many cases, and especially in the absence of pitch modifications, setting iwinsize = ifftsize works very well and offers the lowest latency.
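
A minimal sketch of the two window-size choices discussed above, assuming a live mono input. The oscillator count (100) and pitch factor (1.5) passed to pvsadsyn are arbitrary illustrative values, and all variable names are invented here.

  ain     in                                   ; live source
  ; lowest latency: window the same size as the FFT
  fsmall  pvsanal    ain, 1024, 256, 1024, 0
  asmall  pvsynth    fsmall
  ; double-length window for oscillator-bank resynthesis
  flarge  pvsanal    ain, 1024, 256, 2048, 0
  alarge  pvsadsyn   flarge, 100, 1.5          ; 100 oscillators, pitch scaled by 1.5

The second analysis trades extra latency for finer frequency resolution, which matters most when the frequencies are changed before resynthesis.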

The window type is stored as an internal attribute of the fsig, together with the other parameters (see pvsinfo). Both window types (Hamming, von Hann) are supported by the PVOC-EX file format. Other types may be implemented at a later date, for example the Kaiser window, which is also supported by PVOC-EX.
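
The stored parameters can be read back with pvsinfo. The fragment below is a minimal sketch, assuming a live mono input; the variable names are invented for illustration. Note that pvsinfo reports the overlap, bin count, window size and format, while the window type itself is not among its outputs.

  ain       in
  fsrc      pvsanal   ain, 1024, 256, 2048, 0
  ihop, inumbins, iwin, ifmt  pvsinfo  fsrc
            print     ihop, inumbins, iwin, ifmt   ; expected: 256, 513, 2048, 0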

When iformat = 0 (currently the only option with pvsanal) the classic phase vocoder format is implemented. This format is easy to process and a natural format for oscillator-bank resynthesis. Although it is possible to use an fsig frame not as a phase vocoder frame but as a generic additive synthesis frame, the two are not directly equivalent.

The iformat argument is provided in case it proves useful later to add support for the other formats. Formats 0 and 1 are very closely related, since the phase is wrapped in both cases, and it is a trivial matter to convert from one to the other.


  ain     in                                  ; live source
  fin     pvsanal    ain, 1024, 256, 2048, 0  ; analyse, using Hamming
  fout    pvsmaska   fin, 1, 0.75             ; apply eq from ftable
  aout    pvsynth    fout                     ; and resynthesize


Richard Dobson
Somerset, England
August, 2001
New in Csound 4.14

