Subject: Re: [linux-audio-dev] back to the API
From: David Olofson (audiality_AT_swipnet.se)
Date: Mon Oct 18 1999 - 18:42:59 EDT
On Mon, 18 Oct 1999, Paul Barton-Davis wrote:
> >> 1) data size: audio (N samples per call to process (); likely
> >> to be N >= 32 ... 128 bytes for RT; non-RT
> >
> >How does a sampleplayer interface with a streaming/caching "daemon"?
> >(A special case that shouldn't be supported?)
>
> can you elaborate on the problem ?
Variable rate streaming. The streaming/caching "daemon" will only
provide guaranteed bandwidth streaming, caching, prebuffering of
real time starting points and that kind of thing, while resampling
should be done in other plug-ins. The cost of high quality
resampling, and the fact that some people might want to use pitch
changing (with preserved duration) instead, suggest that hardcoding
it into a big "universal" sampleplayer is a bad idea.
Streaming/caching/prebuffering is a rather complex subsystem in
itself, and it's also very problematic to run more than one per hard
drive, so it should be made as a reusable module. Probably a client,
so that you can use the one that works best for you with your
favourite applications. A standard API for plug-ins should be
defined, so that you can hook up a stream from the streaming/caching
daemon to any plug-in that supports variable rate inputs.
Variable rate outputs? What about data compressors, or simply a max
input frequency -> sample rate "shaper" plug-in? This *will* be
required by some users (hey, some even suggest using mp3 for HDR
*NOW*!), so why not plan it in, rather than breaking the API later on?
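To make "variable rate input" a bit more concrete, a stream port for
this could look something like the sketch below. All names are
hypothetical; nothing here is part of any agreed API.
------8<-----------------------------------------------------------
/* Sketch of a variable rate stream port. The producer (for example
 * the streaming/caching daemon, or an mp3 decoder) fills 'buffer'
 * and sets 'frames' before the consumer's process() is called;
 * 'frames' may be anything from 0 up to 'space', so the consumed
 * rate need not match the engine's block rate.
 */
typedef struct var_stream_port
{
        float   *buffer;        /* Data for the current cycle */
        int      space;         /* Buffer size in frames */
        int      frames;        /* Frames actually valid this cycle */
} var_stream_port_t;
----------------------------------------------------------->8------
A resampler or pitch shifter plug-in would then pull 'frames' frames
from such a port and produce a normal constant rate output block.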
> it did occur to me, BTW, that in
> your scenario of the engine outsourcing to a client the job of sending
> output to an actual audio output source, there is *no* way to reliably
> determine the fixed latency that is needed to compute sample-accurate
> timestamps. you have no idea what the client is doing with the data.
A problem that is simply ignored by VST and (AFAIK) DirectX, whereas
I thought about it years ago, of course. ;-)
It seemed obvious to me that ignoring the fact that just about *any*
DSP algorithm will have unavoidable delays couldn't be a good idea.
True, it doesn't matter much to most users, and it's not compensated
for in analog systems (analog filters *do* have latency!), but I've
always been too much of a perfectionist about this kind of thing...
> i
> suppose that our API could require the client to give us a number and
> post events if it ever changes.
I already described the foundation for that in one of my first posts:
------8<-----------------------------------------------------------
float inherent_delay;   /* Inherent means inherent to the
                         * processing algorithm, and is
                         * used for processing latency
                         * compensation.
                         * Value is in frames, and should
                         * reflect sub-sample resolution.
                         */
int look_ahead;         /* Number of extra frames needed
                         * after the end of input buffers.
                         * Can be negative.
                         */
int skip_behind;        /* Number of frames skipped
                         * at the start of input buffers.
                         * (That is, inputs[n] += skip_behind
                         * before process_XXX() is called.)
                         * Can be negative.
                         */
----------------------------------------------------------->8------
The same functionality should be in the event based API, and should
also be used for clients.
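Just to illustrate how I imagine the engine would use it (a minimal
sketch; the plugin_t struct here is hypothetical):
------8<-----------------------------------------------------------
/* Hypothetical minimal plug-in descriptor, just for this example */
typedef struct plugin
{
        float   inherent_delay;         /* As above; in frames */
        /* ... */
} plugin_t;

/* Sum the inherent delays along a serial chain. The engine can then
 * delay parallel chains (or shift event timestamps) to match the
 * longest path, which is what latency compensation boils down to.
 */
float chain_delay(plugin_t **chain, int length)
{
        float total = 0.0f;
        int i;

        for (i = 0; i < length; ++i)
                total += chain[i]->inherent_delay;
        return total;   /* In frames, possibly fractional */
}
----------------------------------------------------------->8------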
> >> processing might go way higher ... 4KB ?)
> >
> >(IIRC, VST had a 32 kB default buffer size before they realized you
> >don't want to wait for ages to hear the effect of an automation
> >edit...)
>
> thats reasonable, but if you want to use a really great reverb (say,
> convolution based) fed through a fancy plugin and you know it will
> take hours to generate, large buffers *might* be a good idea.
Yes indeed. Off-line processing... Is 2 GB/buffer on 32 bit systems
an acceptable limit? :-)
> >But how different is sending an event from passing an extra argument
> >to process()?
>
> in one of your descriptions, you talked about a plugin having "one
> input audio port and one output audio port".
Uhm, did I say that? Well, either I was thinking about some kind of
multichannel ports (a bad idea, if I was), or I actually meant event
ports. (My old design had just one event output - it looks different
in the new one. More on that later.)
> i think that having just
> one of each is a bad idea. but anyway, i don't think that any of this
> stuff should be passed as an argument to process() anyway. instead, we
> want something like this:
>
> struct plugin {
> int process (struct plugin *);
> .
> .
> .
> audio_port_t *audio_input;
> audio_port_t *audio_output;
> event_list_t event_list;
> };
>
> the engine can then manipulate
>
> plugin->audio_input[0]
>
> etc. before calling process(), passing it a pointer to "itself". think
> "closures" ...
Yes, I wasn't thinking about the argument passing method, but about
the event vs. [argument,variable,...] distinction.
Anyway, I think the closure should be kept as far away from the API
as possible. As I see it, it's a *private* struct, possibly with a
very simple public header struct.
> plugin->audio_input[0]
Won't work unless we set a fixed size on the arrays. Instead, the
plug-in should give the engine the size and address of the tables
during init. Perhaps even the address of each audio port (or "stream
port", as I'd prefer to call it). Or, the info can be filled into
that public header struct, but that would mean eliminating the
pointer-per-port style.
What's the point of the "pointer-per-port style", BTW? Making the
plug-in process() code simpler by keeping all data for one channel in
one struct. That struct could be a bigger one, extending the public
port struct as needed, which cannot be done with tables. I don't know
if it matters much, or even is a good idea, but it seems nice to me
right now...
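Something like this is what I have in mind; just a sketch, all names
made up:
------8<-----------------------------------------------------------
/* Public part: all the engine ever needs to see of a port. */
typedef struct stream_port
{
        float   *buffer;        /* Set by the engine before process() */
        int      frames;        /* Frames valid in 'buffer' */
} stream_port_t;

/* Private part: the plug-in extends the public struct with whatever
 * per-port state it needs. The stream_port_t member must come first.
 */
typedef struct my_filter_port
{
        stream_port_t   port;   /* Public header */
        float           z1;     /* Filter state for this channel */
        float           gain;
} my_filter_port_t;
----------------------------------------------------------->8------
During init, the plug-in hands the engine a pointer to each .port, so
no fixed size array is needed in the API, and the process() code
still finds all per-channel data in one struct.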
> >As for events and out-of-band, non real time transfer;
> >1) Set up you data buffer.
> >2) Send the event "Yo! There's a buffer for ya' at &buffer".
> >3) Stay off the buffer...
> >4) ...until you get the event "Ok, got it. (&buffer)"
>
> sounds like what i had in mind, with the addition of the explicit "OK,
> got it (&buffer)". i think this is the way to handle things like this.
That's how you do IPC with shared memory in the normal case, only
there you send the "events" as true real time messages. We can't do
that (except with clients - where it's possible to get the extreme
context switching rates back if you really want to), so we buffer up
as many events, notifications and requests as possible, and send the
whole "transaction" off when our cycle ends. Makes sense to me for
the kind of systems we're dealing with, as it makes it possible to
control the worst case latency for the real time stuff, while
worsening the average latency a bit.
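In code, such a "transaction" could be as simple as a locally built
list that is handed over in one go when the cycle ends. A sketch,
assuming a hypothetical event struct with a 'next' link:
------8<-----------------------------------------------------------
typedef struct event event_t;
struct event
{
        event_t         *next;
        unsigned         type;
        unsigned         timestamp;     /* In frames */
};

typedef struct transaction
{
        event_t *first;
        event_t *last;
} transaction_t;

/* Queue an event locally during the cycle; no receiver is woken. */
static void queue_event(transaction_t *t, event_t *e)
{
        e->next = NULL;
        if (t->last)
                t->last->next = e;
        else
                t->first = e;
        t->last = e;
}
----------------------------------------------------------->8------
At the end of the cycle, the whole list (t->first) is passed to the
receiving client/engine in a single operation, and the transaction is
reset.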
> >The reason why I decided on the qm allocator with heaps for the
> >events is that it allows quick allocation of lots of small blocks
> >without fragmentation. It also replaces deallocation with garbage
> >collection in a very low cost way. (When a qm_heap_t replaces a
> >buffer, the reference to the old one goes where the data will be
> >used, and is flushed from there.)
>
> at the moment, i am preferring an implementation that uses fixed event
> sizes, since events are just "differences that make a difference".
How big?
> you
> don't need a heap or a real allocator at all - you use a pool system
> identical to the incredibly efficient one used in the kernel. it took
> me a while to figure out how it worked, and it probably dates back to
> the 60's, but its really cool, and really fast. it looks like a
> freelist, but it isn't, really.
>
> assumption: every object to be allocated has a pointer to another
> object of the same kind that is usable when the object is
> "deallocated".
>
> Pool setup:
>
> struct object_pool {
> object_t *objs;
> object_t *next_free;
> };
>
> 1) allocate a pool of the objects:
>
> object_pool.objs = (object_t) malloc (sizeof(object_t) * POOL_SIZE);
>
> 2) connect each "next" pointer to make a free list
>
> for (i = 0; i < POOL_SIZE - 1; i++) {
> object_pool.objs[i]->next = &object_pool.objs[i+1];
> }
>
> 3) mark the top of the freelist
>
> object_pool.next_free = object_pool.objs[i];
>
> object allocation:
>
> object_t *
> alloc_object ()
> {
> object_t *obj;
>
> obj = object_pool.next_free;
> object_pool.next_free = obj->next;
> /* for safety, do: obj->next = 0; */
> return obj;
> }
>
> object deallocation:
>
> void
> dealloc_object ()
> {
> obj->next = object_pool.next_free;
> object_pool.next_free = obj;
> }
>
> this is not all correct, and it doesn't handle OOM conditions. however,
> you'll find this basic structure all over the kernel. its beautiful,
> really beautiful.
Yep. :-) That's exactly what I had in mind for audio buffers. Cache
optimization, speed and simplicity at the same time. BTW, MidiShare
uses a system of that kind, IIRC.
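For reference, here's the same allocator with the details filled in
(the cast, the freelist head, the dealloc argument and an empty-pool
check); a sketch, not tested:
------8<-----------------------------------------------------------
#include <stdlib.h>

#define POOL_SIZE       1024

typedef struct object object_t;
struct object
{
        object_t        *next;  /* Only used while the object is free */
        /* ... payload ... */
};

static struct
{
        object_t        *objs;
        object_t        *next_free;
} object_pool;

static int init_pool(void)
{
        int i;

        object_pool.objs = (object_t *) malloc(sizeof(object_t) * POOL_SIZE);
        if (!object_pool.objs)
                return -1;
        for (i = 0; i < POOL_SIZE - 1; ++i)
                object_pool.objs[i].next = &object_pool.objs[i + 1];
        object_pool.objs[POOL_SIZE - 1].next = NULL;
        object_pool.next_free = &object_pool.objs[0];
        return 0;
}

static object_t *alloc_object(void)
{
        object_t *obj = object_pool.next_free;

        if (obj)
                object_pool.next_free = obj->next;
        return obj;     /* NULL if the pool is empty */
}

static void dealloc_object(object_t *obj)
{
        obj->next = object_pool.next_free;
        object_pool.next_free = obj;
}
----------------------------------------------------------->8------
Still no locking, of course, so only one thread may allocate and free
from a given pool.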
> >As soon as the global heap of buffers is turned into a freelist, we
> >get search and splitt overhead, risk of complicated memory leak bugs,
> >deallocation overhead, and most importantly; _memory fragmentation_.
>
> nope. not if the objects are all the same size.
Well, I was thinking about allocation of data buffers, not the
events themselves. Event memory management is not a problem, not even
with dynamically sized events, given a sane size limit.
> i am now waiting for your obvious explanation of why events cannot all
> be the same size, and then why all buffers cannot be the same size,
Does an event have one or two arguments? Or three? Should there be
room for doubles? Is it safe to split too big events into multiple
events, with respect to ordering of events with the same timestamp?
(I think they should come in the order they're sent, but what if
someone else is sending the same kind of events to the same place?)
I was thinking fixed size from the very beginning, but it's more of a
"simple solution for now" than anything else, IMO. I don't like being
forced into kludges to work around legacy design limitations...
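Just to make the question concrete, a hypothetical fixed size event
might look like this (nothing official, only a sketch):
------8<-----------------------------------------------------------
typedef struct fixed_event
{
        struct fixed_event      *next;
        unsigned                 type;
        unsigned                 timestamp;     /* In frames */
        union
        {
                float    f[2];          /* e.g. controller + value */
                int      i[2];
                double   d;             /* One double just fits */
                void    *ptr;           /* Out-of-band data buffer */
        } arg;
} fixed_event_t;
----------------------------------------------------------->8------
It covers the common cases, but anything bigger (strings, buffer
references with lengths, ...) already has to go by reference, which
is where the fixed size starts to feel like a legacy limitation.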
> since we have already talked about a scheme in which the engine
> decides the optimum buffer size based on plugin requirements.
That's an optimization for the normal case; constant rate streaming.
Not a generic solution for all kinds of real time streaming.
> furthermore, since changing buffer size is a pretty drastic thing to
> do to the system (many small details to handle if this ever happens),
> it seems not unreasonable to just set up a new buffer pool, and maybe
> discard the old one, though this seems harder.
In real time...? You could discard the old pool *before* setting up
the new one, but that brings other problems with it... And it
certainly doesn't work for dynamic rate streaming.
> so i see us having 2 pools of (differently) fixed sized objects,
> events and buffers, using this kernel-style allocation mechanism.
That would be nice and simple, and I used to like it once upon a
time. I still like it where it does the job nicely and no extra
flexibility is needed, but I've changed my mind about more generic
systems.
> and of course, note that since only the engine allocates and
> deallocates from both pools, no locks are necessary.
How do you send events to the engine from other threads in that case?
> >> this has its problems, not least of which is
> >> atomicity of the contents of the memory. if the plugin has been told
> >> that, say, a string value has changed, but then the string changes
> >> again before/while its looking at it, this is, uhm, not good :)
> >
> >If you tell someone that you put a ladder in the right place, you
> >don't move it away just when he's about to step on it, do you? :-)
>
> sure. the solution is actually pretty obvious:
>
> void *get_string_pointer (const char *str)
> {
> ... lookup existing strings, perhaps via hash ...
> if found, return pointer to string
> if not found, alloc "global" memory, store string there,
> and return pointer.
> }
>
> then, when you want to say "input file just changed to XXX", you
> actually say "input file just changed (value at <address>)", and now
> we know that the address will always hold that value. if necessary, we
> can do reference counting too, to ensure that we can periodically
> sweep away stale values.
Sounds way too expensive, and more importantly nondeterministic, for a
real time system. And unless the lookup really finds a string most of
the time, you might as well copy the data right away. I think a
sensible, well defined protocol can eliminate the "ladder removal"
problem and optimize away most of the reference counting overhead.
But the fragmentation problem is still not solved...
//David
·A·U·D·I·A·L·I·T·Y· P r o f e s s i o n a l L i n u x A u d i o
- - ------------------------------------------------------------- - -
·Rock Solid David Olofson:
·Low Latency www.angelfire.com/or/audiality ·Audio Hacker
·Plug-Ins audiality_AT_swipnet.se ·Linux Advocate
·Open Source ·Singer/Composer