Several classes have common active_ and runner_ variables and
stop/start/active routines (such as reader and dispatch).
Create one common class for these to make the interface cleaner.
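A minimal sketch of what such a common class could look like. The class name Runner, the run() hook and the Counter example are assumptions made for illustration; only active_, runner_ and the start/stop/active routines come from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical common base class; "Runner" and "run()" are illustrative names.
class Runner {
public:
    Runner() = default;
    Runner(const Runner &) = delete;
    Runner &operator=(const Runner &) = delete;

    // Derived classes must call stop() in their own destructor so the
    // thread is joined before their members are destroyed.
    virtual ~Runner() { assert(!runner_.joinable()); }

    // Start the background thread; returns false if it is already running.
    bool start() {
        if (active_.exchange(true))
            return false;
        runner_ = std::thread([this] { run(); });
        return true;
    }

    // Ask the thread to exit and wait until it has done so.
    void stop() {
        active_ = false;
        if (runner_.joinable())
            runner_.join();
    }

    bool active() const { return active_.load(); }

protected:
    // Subclasses loop in here for as long as active() returns true.
    virtual void run() = 0;

private:
    std::atomic<bool> active_{false};
    std::thread runner_;
};

// Example subclass (assumed): counts loop iterations in the background.
class Counter : public Runner {
public:
    ~Counter() override { stop(); }
    int ticks() const { return ticks_.load(); }

protected:
    void run() override {
        while (active()) {
            ++ticks_;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

private:
    std::atomic<int> ticks_{0};
};
```

With this shape, a reader or dispatcher would only implement run() and inherit the rest.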
This is the third way of dealing with memory ownership/deallocation.
The application "lends" the memory to kvzRTP, and when SCD has finished
processing the transaction associated with this memory, it will call the
deallocation hook provided by the application to release the memory.
This makes, for example, custom allocators possible where wrapping the
memory inside a unique_ptr is not suitable and creating copies is also
not acceptable.
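The lending scheme could be sketched roughly as follows. The names dealloc_hook_t, Sender, install_dealloc_hook() and process_pending() are hypothetical and do not reflect the real kvzRTP API; the sketch only shows the hook being called once per finished transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical hook type: called with the pointer that was lent.
using dealloc_hook_t = std::function<void(void *)>;

// Stand-in for the sending side (assumed name, not the library's class).
class Sender {
public:
    void install_dealloc_hook(dealloc_hook_t hook) { hook_ = std::move(hook); }

    // The application lends `data`; no copy is made and no ownership
    // wrapper such as unique_ptr is required.
    void push_frame(std::uint8_t *data, std::size_t len) {
        assert(data != nullptr);
        pending_.push_back(Frame{data, len});
    }

    // Plays the role of SCD: "sends" every pending frame, then hands the
    // memory back through the hook. Returns the number of frames processed.
    std::size_t process_pending() {
        std::size_t done = 0;
        for (Frame &f : pending_) {
            // A real backend would issue the send system call here.
            if (hook_)
                hook_(f.data);
            ++done;
        }
        pending_.clear();
        return done;
    }

private:
    struct Frame {
        std::uint8_t *data;
        std::size_t len;
    };

    dealloc_hook_t hook_;
    std::vector<Frame> pending_;
};
```

An application with a custom allocator would install a hook that returns the block to its own pool instead of calling delete[].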
This is a separate thread running in the background responsible for
executing system calls (mainly sending UDP packets).
This commit divides the sending into frontend and backend:
- The frontend packetizes the media into "transactions", which are then
  pushed to the backend's task queue
- The backend executes these transactions FIFO-style and pushes executed
  transactions back to the frame queue for reuse
The frontend is the part of sending that executes in the application's
context, while the backend (the system calls) runs in a background thread.
Ideally frontend and backend would be run on separate physical cores.
This change made sending significantly faster (from 650 MB/s to 720 MB/s)
and cut down the delay experienced by application from 315us to 45us
for large HEVC chunks (177 kB).
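The frontend/backend split can be illustrated with a small producer/consumer sketch. Task, Dispatcher, submit() and executed() are assumed names, the actual system call is left out, and returning executed transactions to the frame queue is reduced to recording their ids.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Trimmed-down stand-in for a transaction (assumed fields).
struct Task {
    int id;
    std::size_t bytes;
};

// Hypothetical backend: one thread draining a FIFO task queue.
class Dispatcher {
public:
    Dispatcher() : worker_([this] { run(); }) {}
    ~Dispatcher() { stop(); }

    // Frontend side: runs in the application's context and returns at once.
    void submit(Task t) {
        assert(t.bytes > 0);
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push_back(t);
        }
        cv_.notify_one();
    }

    // Drain whatever is still queued, then join the backend thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable())
            worker_.join();
    }

    // Ids of executed tasks, in execution order.
    std::vector<int> executed() {
        std::lock_guard<std::mutex> lk(m_);
        return executed_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty())
                return;  // stopping_ is set and the queue is drained
            Task t = tasks_.front();
            tasks_.pop_front();
            lk.unlock();
            // A real backend would issue the send system call for t here.
            lk.lock();
            executed_.push_back(t.id);
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Task> tasks_;
    std::vector<int> executed_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The application only pays for the enqueue in submit(), which is where the lower latency seen by the caller would come from in a design like this.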
Using the system call dispatcher complicates the frame queue design because
we can no longer store e.g. NAL and FU headers on the caller's stack (as
SCD doesn't have access to that stack).
We must create a transaction object that contains all necessary
information related to one media frame (all vectored I/O buffers,
RTP headers and outgoing address).
This model works both with and without SCD and is much cleaner than the
previous implementation. It also makes a clearer distinction between
the frontend and backend of the sending operation by creating
an explicit producer/consumer model.
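A transaction object of this kind might look roughly like the sketch below. The struct layout, the field names and add_packet() are assumptions; the point is only that the headers live inside the transaction rather than on the caller's stack, so a background thread can still read them later.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// One entry of a vectored I/O list (portable stand-in for iovec).
struct IoBuf {
    const std::uint8_t *base;
    std::size_t len;
};

// Hypothetical transaction: everything needed to send one media frame.
struct Transaction {
    Transaction() = default;
    // bufs points into rtp_headers, so copying would leave dangling pointers.
    Transaction(const Transaction &) = delete;
    Transaction &operator=(const Transaction &) = delete;

    // std::deque keeps element addresses stable across push_back/emplace_back.
    std::deque<std::array<std::uint8_t, 12>> rtp_headers;
    std::vector<IoBuf> bufs;
    std::string dest_ip;          // outgoing address (assumed representation)
    std::uint16_t dest_port = 0;

    // Append one packet: a 12-byte RTP header owned by the transaction,
    // followed by a payload fragment that is only referenced.
    void add_packet(std::uint16_t seq, const std::uint8_t *payload, std::size_t len) {
        assert(payload != nullptr);
        rtp_headers.emplace_back();
        std::array<std::uint8_t, 12> &h = rtp_headers.back();
        h.fill(0);
        h[0] = 0x80;                                  // RTP version 2
        h[2] = static_cast<std::uint8_t>(seq >> 8);   // sequence number, high byte
        h[3] = static_cast<std::uint8_t>(seq & 0xff); // sequence number, low byte
        bufs.push_back(IoBuf{h.data(), h.size()});
        bufs.push_back(IoBuf{payload, len});
    }

    std::size_t total_bytes() const {
        std::size_t n = 0;
        for (const IoBuf &b : bufs)
            n += b.len;
        return n;
    }
};
```

Media-specific headers such as NAL and FU headers would be stored the same way, next to the RTP headers.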
One problem that has arisen is memory deallocation and ownership in
general: when SCD is used, it owns the memory given to kvzRTP by
push_frame() BUT it doesn't know what kind of memory it is so it doesn't
know how to deallocate it. Some kind of deallocation scheme must be
implemented because right now the library leaks a lot of memory.
The payload format must be known when creating the Connection object
because Connection owns the frame queue and frame queue needs the format
in order to allocate correct media-specific headers for transactions.
This is a generic way of sending multiple packets with one system call.
It takes a pointer to vecio_buf (mmsghdr on Linux, TRANSMIT_PACKETS_ELEMENT
on Windows) and the vecio_buf length and calls either sendmmsg(2) or TransmitPackets().
The API didn't change much. If the user wishes to use HEVC slices
(and thus preserve the state between push_frame() calls), they must
call push_frame() with the RTP_SLICE and RTP_MORE flags, like this:
push_frame(conn, data, 123, RTP_SLICE | RTP_MORE);
push_frame(conn, data, 456, RTP_SLICE | RTP_MORE);
push_frame(conn, data, 789, RTP_SLICE | RTP_MORE);
push_frame(conn, data, 100, RTP_SLICE);
RTP_MORE preserves the state between push_frame() calls. When the
last slice is given to kvzRTP, the RTP_MORE flag must be removed.
This flushes the frame queue and deinitializes it.
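The flag handling described above could be modelled like this. The numeric flag values and the SliceQueue class are assumptions for illustration only; the real push_frame() also takes a connection argument, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Flag values are assumed; only the names come from the text above.
enum rtp_flags : int {
    RTP_SLICE = 1 << 0,
    RTP_MORE  = 1 << 1,
};

// Hypothetical frame queue that keeps state while RTP_MORE is set.
class SliceQueue {
public:
    void push_frame(const std::uint8_t *data, std::size_t len, int flags) {
        assert(data != nullptr);
        pending_slices_ += 1;
        pending_bytes_ += len;
        // Without RTP_MORE this was the last slice: flush and reset.
        if ((flags & RTP_MORE) == 0)
            flush();
    }

    std::size_t pending_slices() const { return pending_slices_; }
    std::size_t flushed_bytes() const { return flushed_bytes_; }
    std::size_t flushes() const { return flushes_; }

private:
    void flush() {
        // A real queue would hand the transaction to the backend here.
        flushed_bytes_ += pending_bytes_;
        flushes_ += 1;
        pending_slices_ = 0;
        pending_bytes_ = 0;
    }

    std::size_t pending_slices_ = 0;
    std::size_t pending_bytes_ = 0;
    std::size_t flushed_bytes_ = 0;
    std::size_t flushes_ = 0;
};
```

In this model the four calls from the example result in a single flush covering all four slices.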
Some of the more complex relocations still don't work, invalid and
duplicate packets wreak havoc and frame reallocation is missing but
it's already able to receive a stream with very large packets gracefully.
Separate them into different files and make it configurable which to use.
By default the normal receiver is used and if the user gives __RTP_USE_OPTIMISTIC_RECEIVER__
then the optimistic receiver is started.
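One way the selection could work, assuming __RTP_USE_OPTIMISTIC_RECEIVER__ is a preprocessor define given at build time. The Receiver interface, the two class names and make_receiver() are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical common interface for both receivers.
class Receiver {
public:
    virtual ~Receiver() = default;
    virtual std::string name() const = 0;
};

class NormalReceiver : public Receiver {
public:
    std::string name() const override { return "normal"; }
};

class OptimisticReceiver : public Receiver {
public:
    std::string name() const override { return "optimistic"; }
};

// Picks the receiver at compile time; the normal one is the default.
std::unique_ptr<Receiver> make_receiver() {
#ifdef __RTP_USE_OPTIMISTIC_RECEIVER__
    return std::unique_ptr<Receiver>(new OptimisticReceiver());
#else
    return std::unique_ptr<Receiver>(new NormalReceiver());
#endif
}
```

Under this assumption, building with -D__RTP_USE_OPTIMISTIC_RECEIVER__ would switch the receiver without touching the calling code.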
Small frames work, larger frames are sometimes dropped. This is supposed
to reduce the amount of copying, but performance will degrade as
network load increases.
The probation zone lives below the actual payload and can be used as
temporary storage for fragments that cannot be relocated.
It's part of the larger memory block so the fragments that are copied
to the probation zone are spatially very close to their actual place in
the array, making relocation faster.
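A rough sketch of such a layout, assuming "below" means lower addresses within the same allocation. FrameBlock, park() and relocate() are hypothetical names and the real receiver's bookkeeping is certainly more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: [ probation zone | payload area ] in one allocation.
class FrameBlock {
public:
    FrameBlock(std::size_t probation_size, std::size_t payload_size)
        : probation_size_(probation_size),
          mem_(probation_size + payload_size, 0) {}

    std::uint8_t *payload() { return mem_.data() + probation_size_; }
    std::size_t payload_size() const { return mem_.size() - probation_size_; }

    // Park a fragment whose final position is not known yet.
    // Returns its offset inside the probation zone, or -1 if the zone is full.
    std::ptrdiff_t park(const std::uint8_t *frag, std::size_t len) {
        assert(frag != nullptr);
        if (len > probation_size_ - probation_used_)
            return -1;
        std::memcpy(mem_.data() + probation_used_, frag, len);
        std::ptrdiff_t off = static_cast<std::ptrdiff_t>(probation_used_);
        probation_used_ += len;
        return off;
    }

    // Copy a parked fragment to its final place in the payload area.
    void relocate(std::size_t probation_off, std::size_t len, std::size_t payload_off) {
        assert(probation_off + len <= probation_used_);
        assert(payload_off + len <= payload_size());
        std::memcpy(payload() + payload_off, mem_.data() + probation_off, len);
    }

private:
    std::size_t probation_size_;
    std::size_t probation_used_ = 0;
    std::vector<std::uint8_t> mem_;
};
```

Because both regions share one block, the copy in relocate() stays within memory that is likely to be close together, which is the locality argument made above.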
The More flag made it possible to give more data to the frame queue
(and thus return from __push_hevc_frame()).
This is not actually possible because the fragment headers are stored
on the stack. That is why postponing the frame queue flush is not a good idea.