Desirable goals

Dan writes:

I would request that a function returning data block-by-block should be a generator which would make it very flexible, and could also be used to avoid having to have all the data in memory at once.

...

from my point of view, and not having thought through all the flexible types of output that vamp provides, I'd be hoping to type things like:

# all in-memory:

plug = vh.loadPlugin('vamp-example-plugins:mfcc', 44100)
mfccs = np.array([features[0] for features in plug.processall([-0.1, 0, 0.1, 0, -0.1, 0, 0.1, 0])])  # this gives a 2D numpy array of shape [nframes, nmfccs]

plug = vh.loadPlugin('vamp-example-plugins:onsetdetector', 44100)
onsets = np.array([features[0] for features in plug.processall([-0.1, 0, 0.1, 0, -0.1, 0, 0.1, 0])])  # this gives a 1D numpy array, a list of onset times I guess

# from disk, to memory:

plug = vh.loadPlugin('vamp-example-plugins:mfcc', 44100)
with open(filepath) as f:
    mfccs = np.array([features[0] for features in plug.processall(f)])
plt.matshow(mfccs, interpolate='nearest') # a simple pyplot

# fully streaming:

plug = vh.loadPlugin('vamp-example-plugins:mfcc', 44100)
with open(filepath) as f:
    with open(outpath, 'wb') as outf:
        for features in plug.processall(f):
            outpath.write("%g\n" % features[0][0] ** 2)

BTW I noticed I made a couple of mistakes in there

...

Chris writes:

API-wise, that's slightly problematic because the standard plugin interface goes

  load -> initialise -> process, process, process

and one of the biggest problems is people forgetting / misunderstanding the initialise call. (Ignoring for now the merits or otherwise of the API design -- I just mean that this is the way it works now)

So introducing another workflow that looks like

  load -> processall

might be hazardous from that perspective.

Dan:

Note that the generator style means that it's still possible to do "process, process, process" if needed, for example:

 mygenerator = plug.processall(f)
 print mygenerator.next()   # frame 0?
 print mygenerator.next()   # frame 1?
 print mygenerator.next()   # frame 2?

and the initialisation stuff is still bound up in there neatly rather than being forgettable.

Chris:

Also of course if you're running it on an audio file, it knows the sample rate / channel count etc so you should be able to short-circuit it:

   import vampyhost as vh
   results = vh.processall(file, 'vamp-example-plugins:zerocrossings')

and it loads (& initialises) the plugin for you.

Dan:

Yes, feels good as high-level api

Chris:

Another advantage of making processall (or whatever it's called) callable through the module rather than the plugin object is that it ensures you don't go and use the plugin for something else afterwards (with unpredictable results).

Trouble is if you're running it on array data, you have to supply the rate somewhere so you must either load() explicitly or provide the rate to processall.

(I am very keen to make simple use cases like "give me the beat times in this audio file" as simple as possible. Of course plenty of other APIs can now be used to do that, but that doesn't mean we shouldn't do the best job possible with this one)