MP4 Writer Sample 2

The FileWriterSample2 demonstrates how to record pre-encoded media to an MP4 file by manually pushing individual frames through a VirtualNetworkSource. Unlike MP4 Writer Sample 1, which connects an existing network source to a file sink, this sample shows how to inject your own media data into the pipeline, which is useful when generating or processing media in application code.

Overview

The FileWriterSample2 class performs the following:

  1. Loads pre-encoded H.264 video and AAC audio bitstreams from embedded resources
  2. Parses H.264 codec configuration (resolution, framerate, codec private data)
  3. Creates a VirtualNetworkSource and registers video and audio streams
  4. Connects the source and an IsoSink via MediaSession
  5. Pushes video and audio frames with synchronized timestamps on a background task

Setting Up the Pipeline

Codec Configuration

VAST.Codecs.H264.ConfigurationParser h264Parser = new VAST.Codecs.H264.ConfigurationParser();
h264Parser.Parse(videoBuffer);

VAST.Common.MediaType videoMediaType = new VAST.Common.MediaType
{
    ContentType = VAST.Common.ContentType.Video,
    CodecId = VAST.Common.Codec.H264,
    Bitrate = 800000,
    Width = h264Parser.Width,
    Height = h264Parser.Height,
    Framerate = h264Parser.FrameRate,
    PixelAspectRatio = h264Parser.AspectRatio,
};

VAST.Codecs.H264.ConfigurationParser.GenerateCodecPrivateData(videoMediaType, videoBuffer);

VAST.Common.MediaType audioMediaType = new VAST.Common.MediaType
{
    ContentType = VAST.Common.ContentType.Audio,
    CodecId = VAST.Common.Codec.AAC,
    SampleRate = 44100,
    Channels = 2,
    Bitrate = 128000,
};

VAST.Codecs.AAC.ConfigurationParser.GenerateCodecPrivateData(audioMediaType);

The H.264 bitstream is parsed to extract SPS/PPS parameters and codec private data. AAC codec private data (AudioSpecificConfig) is generated from the sample rate and channel count.

VirtualNetworkSource and MediaSession

VAST.Network.VirtualNetworkSource source = new VAST.Network.VirtualNetworkSource();

int videoStreamIndex = source.AddStream(videoMediaType);
int audioStreamIndex = source.AddStream(audioMediaType);

VAST.Media.IMediaSink fileSink = new VAST.File.ISO.IsoSink();
fileSink.Uri = filePath;

writerSession = new VAST.Media.MediaSession();
writerSession.AddSource(source);
writerSession.AddSink(fileSink);
writerSession.Start();

VirtualNetworkSource is a virtual source that allows programmatic injection of media samples. AddStream registers each media type and returns the stream index used when pushing samples. The MediaSession connects the source to the IsoSink and manages stream setup and media routing automatically.

Pushing Video Frames

// Skip the 4-byte start code at the beginning of the current access unit.
int bitstreamPosition = h264BitstreamPosition + 4;
int startCodeSize = 0;
int nextNalPosition = 0;
int frameSize = videoBuffer.Length - h264BitstreamPosition;

while ((nextNalPosition = VAST.Codecs.H264.ConfigurationParser.FindNextStartCode(
    videoBuffer, bitstreamPosition,
    videoBuffer.Length - bitstreamPosition, out startCodeSize)) >= 0)
{
    VAST.Codecs.H264.NalUnitTypes nalUnit =
        (VAST.Codecs.H264.NalUnitTypes)(videoBuffer[nextNalPosition + startCodeSize] & 0x1F);

    if (nalUnit == VAST.Codecs.H264.NalUnitTypes.AccessUnitDelimiter)
    {
        // The next Access Unit Delimiter marks the end of the current frame.
        frameSize = nextNalPosition - h264BitstreamPosition;
        break;
    }

    // Not a delimiter: keep scanning from just past this start code.
    bitstreamPosition = nextNalPosition + startCodeSize;
}

source.PushMedia(videoStreamIndex, videoBuffer,
    h264BitstreamPosition, frameSize, videoFileTime, videoFileTime);

Video frames are extracted by scanning for Access Unit Delimiter NAL units: the next delimiter marks the end of the current frame. Each frame is pushed via PushMedia with the stream index, buffer, offset, size, and its presentation and decode timestamps (PTS and DTS), both expressed in 100-nanosecond units:

long videoFileTime = videoFrameCount * 10000000L
    * videoMediaType.Framerate.Den / videoMediaType.Framerate.Num;

Pushing Audio Frames

while (audioFileTime <= videoFileTime)
{
    // 13-bit frame length from ADTS header bytes 3-5; it includes the header itself.
    int frameSize = ((audioBuffer[audioBitstreamPosition + 3] & 0x03) << 11)
        | (audioBuffer[audioBitstreamPosition + 4] << 3)
        | ((audioBuffer[audioBitstreamPosition + 5] & 0xE0) >> 5);

    // Strip the 7-byte ADTS header (no CRC) and push the raw AAC frame.
    source.PushMedia(audioStreamIndex, audioBuffer,
        audioBitstreamPosition + 7, frameSize - 7, audioFileTime, audioFileTime);

    audioBitstreamPosition += frameSize;
    audioFrameCount++;

    // Each AAC frame carries 1024 PCM samples, converted to 100-ns units.
    audioFileTime = audioFrameCount * 10000000L * 1024 / audioMediaType.SampleRate;
}

AAC frames are extracted from an ADTS bitstream. The 13-bit frame-length field read from the ADTS header counts the whole frame including the header, so the 7-byte header (no CRC) is stripped before pushing.
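The ADTS frame-length arithmetic above can be isolated and checked on its own. The following is a standalone sketch, not part of the VAST API: the AdtsFrameLength helper name and the hand-built header bytes are illustrative, and a 7-byte header without CRC is assumed.

```csharp
using System;

public static class AdtsUtil
{
    // Returns the 13-bit aac_frame_length field of an ADTS header.
    // The value counts the entire frame, *including* the header bytes.
    public static int AdtsFrameLength(byte[] buffer, int offset)
    {
        // A frame must start with the 12-bit syncword 0xFFF.
        if (buffer[offset] != 0xFF || (buffer[offset + 1] & 0xF0) != 0xF0)
            throw new ArgumentException("Not positioned at an ADTS syncword.");

        return ((buffer[offset + 3] & 0x03) << 11)
             | (buffer[offset + 4] << 3)
             | ((buffer[offset + 5] & 0xE0) >> 5);
    }

    public static void Main()
    {
        // Hand-built header with a frame length of 291 bytes (0x123):
        // bits 12-11 -> byte 3, bits 10-3 -> byte 4, bits 2-0 -> byte 5.
        byte[] header = { 0xFF, 0xF1, 0x50, 0x80, 0x24, 0x60, 0x00 };
        Console.WriteLine(AdtsFrameLength(header, 0)); // prints 291
    }
}
```

The payload pushed to the sink is then frameLength - 7 bytes starting at offset + 7, exactly as in the loop above.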

Audio/Video Synchronization

Video and audio samples must be pushed with approximately matching timestamps; a large gap between the two streams must not be allowed to build up. Because video and audio frame durations differ (e.g. ~33 ms per frame for 30 fps video vs. ~23 ms per frame for 44.1 kHz AAC), the audio loop catches up to the current video timestamp after each video frame is pushed.
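The timing arithmetic can be verified with a small standalone sketch. The 30 fps and 44.1 kHz figures are illustrative, and VideoTicks/AudioTicks are hypothetical helpers mirroring the formulas above, not VAST API calls:

```csharp
using System;

public static class InterleaveDemo
{
    const long TicksPerSecond = 10000000L; // 100-ns units, as used by PushMedia

    // Timestamp of video frame N for a framerate of num/den frames per second.
    public static long VideoTicks(long frame, long num, long den)
        => frame * TicksPerSecond * den / num;

    // Timestamp of AAC frame N; each frame carries 1024 PCM samples.
    public static long AudioTicks(long frame, long sampleRate)
        => frame * TicksPerSecond * 1024 / sampleRate;

    public static void Main()
    {
        long audioFrame = 0;
        for (long videoFrame = 1; videoFrame <= 3; videoFrame++)
        {
            // After pushing one video frame (~333,333 ticks apart at 30 fps),
            // audio frames (~232,199 ticks apart) catch up to its timestamp.
            long videoFileTime = VideoTicks(videoFrame, 30, 1);
            while (AudioTicks(audioFrame, 44100) <= videoFileTime)
            {
                Console.WriteLine($"audio frame {audioFrame} @ {AudioTicks(audioFrame, 44100)} ticks");
                audioFrame++;
            }
        }
    }
}
```

With these figures, one or two audio frames land between consecutive video frames, which keeps the two streams interleaved without a growing gap.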

Differences from MP4 Writer Sample 1

                      Writer Sample 1                    Writer Sample 2
Source                Network source via SourceFactory   VirtualNetworkSource with manual data injection
Media data            Received from a remote server      Pre-encoded bitstreams pushed from application code
Timestamp management  Handled by the source              Calculated and assigned manually
Use case              Recording an existing stream       Writing application-generated or processed media to file

See Also