Fourier analysis for the Csound pvoc generator

  pvanal [flags] infilename outfilename

pvanal converts a soundfile into a series of short-time Fourier transform (STFT) frames at regular timepoints (a frequency-domain representation). The output file can be used by pvoc to generate audio fragments based on the original sample, with timescales and pitches arbitrarily and dynamically modified. Analysis is conditioned by the flags below. A space is optional between the flag and its argument.

As of Csound 4.14, the standard pvanal program has been extended to create PVOC-EX format files through its existing interface. This format supports the streaming pvoc opcodes by Richard Dobson, also introduced in Csound 4.14. See "Orchestra Signal Types" for an explanation of the streaming pvoc fsig.

To create a PVOC-EX file, give the output file the extension .pvx. For PVOC-EX files, the FFT size need not be a power of two, as it must be for the original .pv format. Any positive value is accepted, although odd values are rounded up internally. However, power-of-two sizes are still preferred under most conditions.

For PVOC-EX files, the channel select flag is ignored. All source channels are analysed and written to the output file, up to a compile-time limit of eight channels. The analysis window size (iwinsize) is set internally to double the FFT size.
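The size rules for the two output formats can be summarised in a small sketch. The helper `effective_sizes` is hypothetical and only models the rules stated on this page, not Csound's source. It assumes that "rounded up" means rounded to the next even number, and that the original .pv format uses a window equal to the frame size (implied by the PVOC-EX window being described as "double-size").

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def effective_sizes(fft_size: int, outfile: str) -> tuple[int, int]:
    """Return (fft_size, window_size) for a pvanal-style analysis.

    Hypothetical helper modelling the documented rules only.
    """
    if outfile.endswith(".pvx"):
        if fft_size <= 0:
            raise ValueError("FFT size must be positive")
        if fft_size % 2:
            fft_size += 1  # assumption: odd sizes rounded up to the next even value
        return fft_size, 2 * fft_size  # PVOC-EX window is double the FFT size
    # Original .pv format: power of two in the range 16..16384
    if not (is_power_of_two(fft_size) and 16 <= fft_size <= 16384):
        raise ValueError("frame size must be a power of two in 16..16384")
    return fft_size, fft_size  # assumption: window equals frame size for .pv
```

For example, `effective_sizes(1000, "out.pvx")` accepts a non-power-of-two size and doubles it for the window, while the same size with a .pv name is rejected.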

All the original pvoc opcodes can now read a PVOC-EX file as well as the native, non-portable file format. Because PVOC-EX files use a double-size analysis window, users may find a useful improvement in quality for some sounds and processes, even though resynthesis does not use the same window size.

Apart from the window size parameter, the main difference between the original .pv format and the new .pvx format is the amplitude range of the analysis frames. Rescaling is applied so that no significant difference in output level is experienced, whichever file format is used. However, a slight loss of amplitude may still occur, since the double window size modifies frame amplitudes, and the resynthesis code does not compensate for this size difference. Note that all the original pvoc opcodes expect a mono analysis file, so multichannel PVOC-EX files will be rejected.

-s srate – sampling rate of the audio input file. This overrides the srate in the soundfile header, which otherwise applies. If neither is present, the default is 10000.

-c channel – channel number sought. The default is 1.

-b begin – beginning time (in seconds) of the audio segment to be analyzed. The default is 0.0.

-d duration – duration (in seconds) of the audio segment to be analyzed. The default of 0.0 means to the end of the file.
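The -s, -b and -d flags together decide which samples are analyzed. The sketch below is a hypothetical helper, `analysis_segment`, that applies the documented precedence for the sampling rate and converts begin and duration into a sample range. It is an illustration under those rules, not pvanal's actual code; rounding to the nearest sample is an assumption.

```python
from typing import Optional

DEFAULT_SRATE = 10000.0  # documented default when no rate is known


def analysis_segment(total_samples: int,
                     header_srate: Optional[float] = None,
                     srate_flag: Optional[float] = None,
                     begin: float = 0.0,
                     duration: float = 0.0) -> tuple[float, int, int]:
    """Return (srate, first_sample, end_sample); end_sample is exclusive.

    Hypothetical helper modelling -s, -b and -d.
    """
    # -s overrides the header; 10000 applies only if neither is present
    if srate_flag is not None:
        srate = srate_flag
    elif header_srate is not None:
        srate = header_srate
    else:
        srate = DEFAULT_SRATE
    first = int(round(begin * srate))  # assumption: nearest sample
    if duration == 0.0:
        end = total_samples  # 0.0 means "to the end of the file"
    else:
        end = min(total_samples, first + int(round(duration * srate)))
    return srate, first, end
```

With a 44.1 kHz file, `-b 0.5 -d 0.25` selects samples 22050 up to (but not including) 33075.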

-n frmsiz – STFT frame size, the number of samples in each Fourier analysis frame. For .pv output, it must be a power of two in the range 16 to 16384. For clean results, a frame must be larger than the longest pitch period of the sample. However, very long frames result in temporal "smearing" or reverberation. The bandwidth of each STFT bin is the sampling rate divided by the frame size. The default frame size is the smallest power of two that spans more than 20 milliseconds of the source (e.g. 256 points at 10 kHz sampling, giving a 25.6 ms frame).
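The default frame size and the bin bandwidth follow directly from the description above. This sketch uses two hypothetical helper names, `default_frame_size` and `bin_bandwidth`, and does not apply the 16..16384 range check.

```python
def default_frame_size(srate: float, min_ms: float = 20.0) -> int:
    """Smallest power of two spanning more than min_ms of audio."""
    n = 1
    while n * 1000.0 / srate <= min_ms:  # frame duration in ms
        n *= 2
    return n


def bin_bandwidth(srate: float, frame_size: int) -> float:
    """Width in Hz of one STFT bin: sampling rate / frame size."""
    return srate / frame_size
```

At 10 kHz this gives 256 points (25.6 ms) and bins about 39 Hz wide. At 44.1 kHz the default becomes 1024 points (about 23.2 ms), since 512 points span only about 11.6 ms.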

-w windfact – Window overlap factor. This controls the number of Fourier transform frames per second. Csound's pvoc will interpolate between frames, but too few frames will generate audible distortion; too many frames will result in a huge analysis file. A good compromise for windfact is 4, meaning that each input point occurs in 4 output windows, or conversely that the offset between successive STFT frames is framesize/4. The default value is 4. Do not use this flag with -h.

-h hopsize – STFT frame offset. The converse of -w, this specifies the increment in samples between successive analysis frames (see also lpanal). Do not use with -w.
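The -w and -h flags describe the same quantity in two ways: hopsize = frmsiz / windfact. The sketch below, with hypothetical helper names, shows that relation and roughly estimates how fast analysis data accumulates, to illustrate the trade-off between frame density and file size. The size estimate assumes 4-byte floats and the N/2 + 1 magnitude/frequency pairs per frame described in the file format notes, and it ignores the header.

```python
def hop_size(frame_size: int, windfact: int = 4) -> int:
    """Samples between successive frames (-h) implied by an overlap factor (-w)."""
    return frame_size // windfact


def frames_per_second(srate: float, hop: int) -> float:
    """Analysis frame rate for a given hop size."""
    return srate / hop


def analysis_bytes_per_second(srate: float, frame_size: int, windfact: int = 4,
                              bytes_per_float: int = 4) -> float:
    """Rough data rate: frames/s * (N/2 + 1 bins) * 2 values * float size.

    Hypothetical estimate; ignores the header and assumes 4-byte floats.
    """
    hop = hop_size(frame_size, windfact)
    floats_per_frame = 2 * (frame_size // 2 + 1)
    return frames_per_second(srate, hop) * floats_per_frame * bytes_per_float
```

With the defaults at 10 kHz (256-point frames, windfact 4), the hop is 64 samples, giving 156.25 frames per second and roughly 161 kB of analysis data per second of sound. Doubling windfact to 8 doubles both figures.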


  pvanal asound pvfile

will analyze the soundfile "asound" using the default frmsiz and windfact to produce the file "pvfile" suitable for use with pvoc.


The output file has a special pvoc header containing details of the source audio file, the analysis frame rate and the overlap. Frames of analysis data are stored as floats, giving the magnitude and 'frequency' (in Hz) of the first N/2 + 1 Fourier bins of each frame in turn, where N is the frame size. 'Frequency' encodes the phase increment in such a way that, for strong harmonics, it gives a good indication of the true frequency. For low-amplitude or rapidly moving harmonics it is less meaningful.
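The idea of encoding the phase increment as a frequency can be illustrated with the standard phase-vocoder estimate. Subtract the phase advance expected at the bin centre over one hop, wrap the remainder to [-pi, pi), and convert the total advance per hop into Hz. This is a textbook sketch with hypothetical helper names, not necessarily pvanal's exact code.

```python
import math


def wrap_phase(x: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def bin_frequency(k: int, phase_prev: float, phase_cur: float,
                  frame_size: int, hop: int, srate: float) -> float:
    """Estimate bin k's frequency (Hz) from its phase increment over one hop."""
    expected = 2.0 * math.pi * k * hop / frame_size  # advance at bin centre
    deviation = wrap_phase(phase_cur - phase_prev - expected)
    # Total phase advance per hop, converted to cycles per second
    return (expected + deviation) * srate / (2.0 * math.pi * hop)
```

For a steady 1000 Hz sinusoid analyzed at 10 kHz with N = 256 and a hop of 64, bins 24 to 27 all report 1000 Hz, although their centre frequencies differ. Far from a strong component, the wrapped deviation no longer reflects a real partial, which is why the value is less meaningful there.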


Dan Ellis
MIT Media Lab
Cambridge, Massachusetts

Richard Dobson (PVOC-EX extensions)
Somerset, England
August, 2001
New in Csound 4.14

