# MP4 Writer Sample 2
The FileWriterSample2 demonstrates how to record pre-encoded media to an MP4 file by manually pushing individual frames through a VirtualNetworkSource. Unlike MP4 Writer Sample 1, which connects an existing network source to a file sink, this sample shows how to inject your own media data into the pipeline, which is useful when the media is generated or processed in application code.
## Overview
The FileWriterSample2 class performs the following:
- Loads pre-encoded H.264 video and AAC audio bitstreams from embedded resources
- Parses H.264 codec configuration (resolution, framerate, codec private data)
- Creates a VirtualNetworkSource and registers video and audio streams
- Connects the source and an IsoSink via MediaSession
- Pushes video and audio frames with synchronized timestamps on a background task
## Setting Up the Pipeline

### Codec Configuration
```csharp
VAST.Codecs.H264.ConfigurationParser h264Parser = new VAST.Codecs.H264.ConfigurationParser();
h264Parser.Parse(videoBuffer);

VAST.Common.MediaType videoMediaType = new VAST.Common.MediaType
{
    ContentType = VAST.Common.ContentType.Video,
    CodecId = VAST.Common.Codec.H264,
    Bitrate = 800000,
    Width = h264Parser.Width,
    Height = h264Parser.Height,
    Framerate = h264Parser.FrameRate,
    PixelAspectRatio = h264Parser.AspectRatio,
};
VAST.Codecs.H264.ConfigurationParser.GenerateCodecPrivateData(videoMediaType, videoBuffer);

VAST.Common.MediaType audioMediaType = new VAST.Common.MediaType
{
    ContentType = VAST.Common.ContentType.Audio,
    CodecId = VAST.Common.Codec.AAC,
    SampleRate = 44100,
    Channels = 2,
    Bitrate = 128000,
};
VAST.Codecs.AAC.ConfigurationParser.GenerateCodecPrivateData(audioMediaType);
```
The H.264 bitstream is parsed to extract SPS/PPS parameters and codec private data. AAC codec private data (AudioSpecificConfig) is generated from the sample rate and channel count.
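For reference, the two-byte AudioSpecificConfig that this configuration yields can be computed by hand. The sketch below is not part of the SDK (the `AacConfig` helper is ours); it packs the MPEG-4 audio object type, sampling-frequency index, and channel configuration exactly as defined in ISO/IEC 14496-3:

```csharp
using System;

static class AacConfig
{
    // Sampling-frequency index table from ISO/IEC 14496-3.
    static readonly int[] SampleRates =
        { 96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050,
          16000, 12000, 11025, 8000, 7350 };

    // Builds the 2-byte AudioSpecificConfig:
    // 5 bits audioObjectType | 4 bits samplingFrequencyIndex
    // | 4 bits channelConfiguration | 3 bits padding.
    public static byte[] Build(int sampleRate, int channels, int audioObjectType = 2 /* AAC LC */)
    {
        int freqIndex = Array.IndexOf(SampleRates, sampleRate);
        if (freqIndex < 0) throw new ArgumentException("unsupported sample rate");
        int bits = (audioObjectType << 11) | (freqIndex << 7) | (channels << 3);
        return new byte[] { (byte)(bits >> 8), (byte)(bits & 0xFF) };
    }
}
```

For 44.1 kHz stereo AAC-LC (object type 2, frequency index 4, channel configuration 2) this produces the well-known byte pair `0x12 0x10`.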
### VirtualNetworkSource and MediaSession
```csharp
VAST.Network.VirtualNetworkSource source = new VAST.Network.VirtualNetworkSource();
int videoStreamIndex = source.AddStream(videoMediaType);
int audioStreamIndex = source.AddStream(audioMediaType);

VAST.Media.IMediaSink fileSink = new VAST.File.ISO.IsoSink();
fileSink.Uri = filePath;

writerSession = new VAST.Media.MediaSession();
writerSession.AddSource(source);
writerSession.AddSink(fileSink);
writerSession.Start();
```
VirtualNetworkSource is a virtual source that allows programmatic injection of media samples. AddStream registers each media type and returns the stream index used when pushing samples. The MediaSession connects the source to the IsoSink and manages stream setup and media routing automatically.
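Once every frame has been pushed, the session should be shut down so the IsoSink can finalize the MP4 (write the movie metadata and close the file). A minimal sketch, assuming MediaSession exposes a `Stop` method symmetric to the `Start` call above (the sample itself does not show teardown):

```csharp
try
{
    writerSession.Start();
    // ... push video and audio frames on the background task,
    //     as shown in the sections that follow ...
}
finally
{
    // Assumed teardown: stopping the session flushes the IsoSink
    // so the resulting MP4 file is complete and playable.
    writerSession.Stop();
}
```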
## Pushing Video Frames
```csharp
// Scan from the current frame (skipping its 4-byte start code) to the
// next Access Unit Delimiter, which marks the start of the next frame.
int bitstreamPosition = h264BitstreamPosition + 4;
int startCodeSize = 0;
int nextNalPosition = 0;
while ((nextNalPosition = VAST.Codecs.H264.ConfigurationParser.FindNextStartCode(
    videoBuffer, bitstreamPosition,
    videoBuffer.Length - bitstreamPosition, out startCodeSize)) >= 0)
{
    // The low 5 bits of the byte after the start code hold nal_unit_type.
    VAST.Codecs.H264.NalUnitTypes nalUnit =
        (VAST.Codecs.H264.NalUnitTypes)(videoBuffer[nextNalPosition + startCodeSize] & 0x1F);
    if (nalUnit == VAST.Codecs.H264.NalUnitTypes.AccessUnitDelimiter)
    {
        break;
    }
    else
    {
        bitstreamPosition = nextNalPosition + startCodeSize;
    }
}

source.PushMedia(videoStreamIndex, videoBuffer,
    h264BitstreamPosition, frameSize, videoFileTime, videoFileTime);
```
Video frames are extracted by scanning for Access Unit Delimiter NAL units. Each frame is pushed via PushMedia with the stream index, buffer, offset, size, PTS, and DTS — all in 100-nanosecond units:
```csharp
long videoFileTime = videoFrameCount * 10000000L
    * videoMediaType.Framerate.Den / videoMediaType.Framerate.Num;
```
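As a quick check of the formula, at 30 fps (`Num = 30`, `Den = 1`) each frame advances the timestamp by 10,000,000 / 30 ≈ 333,333 ticks. The local names below are ours, not the sample's:

```csharp
using System;

long framerateNum = 30, framerateDen = 1; // 30 fps

// Timestamp of frame n in 100-ns ticks, matching the formula above.
long TimestampOf(long frameCount) =>
    frameCount * 10000000L * framerateDen / framerateNum;

Console.WriteLine(TimestampOf(1));  // -> 333333 ticks (~33.3 ms)
Console.WriteLine(TimestampOf(30)); // -> 10000000 ticks (exactly 1 second)
```

Multiplying by the frame count before dividing, as the sample does, keeps per-frame integer rounding error from accumulating: frame 30 lands on exactly one second rather than 30 slightly short frame durations.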
## Pushing Audio Frames
```csharp
while (audioFileTime <= videoFileTime)
{
    int frameSize = ((audioBuffer[audioBitstreamPosition + 3] & 0x03) << 11)
        | (audioBuffer[audioBitstreamPosition + 4] << 3)
        | ((audioBuffer[audioBitstreamPosition + 5] & 0xE0) >> 5);

    source.PushMedia(audioStreamIndex, audioBuffer,
        audioBitstreamPosition + 7, frameSize - 7, audioFileTime, audioFileTime);

    audioBitstreamPosition += frameSize;
    audioFrameCount++;
    audioFileTime = audioFrameCount * 10000000L * 1024 / audioMediaType.SampleRate;
}
```
AAC frames are extracted from an ADTS bitstream. The frame size is read from the 13-bit aac_frame_length field of the ADTS header (this length includes the header itself), and the 7-byte header is stripped before pushing. Note that a 7-byte header assumes protection_absent is set; ADTS headers that carry a CRC are 9 bytes.
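The length extraction can be verified against a hand-built header. The sketch below (the helper name is ours) packs a known aac_frame_length into bytes 3–5 of a synthetic ADTS header and reads it back with the same bit operations as the loop above:

```csharp
using System;

// Reads the 13-bit aac_frame_length field spanning ADTS header bytes 3-5:
// low 2 bits of byte 3, all 8 bits of byte 4, top 3 bits of byte 5.
static int ReadAdtsFrameLength(byte[] buf, int pos) =>
    ((buf[pos + 3] & 0x03) << 11)
    | (buf[pos + 4] << 3)
    | ((buf[pos + 5] & 0xE0) >> 5);

// Build a minimal synthetic header carrying frame length 291 bytes.
byte[] header = new byte[7];
int frameLength = 291;
header[3] = (byte)((frameLength >> 11) & 0x03);
header[4] = (byte)((frameLength >> 3) & 0xFF);
header[5] = (byte)((frameLength & 0x07) << 5);

Console.WriteLine(ReadAdtsFrameLength(header, 0)); // -> 291
```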
## Audio/Video Synchronization
Video and audio samples must be pushed with approximately matching timestamps — there must be no large gap between the two streams. Because video and audio frame durations differ (e.g. ~33 ms for 30fps video vs ~23 ms for 44.1 kHz AAC), the audio loop catches up to the current video timestamp after each video frame is pushed.
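Putting the pieces together, the background push loop interleaves the two streams roughly as follows. This is a condensed sketch of the sample's loop, not its literal code; `FindFrameSize` stands in for the Access Unit Delimiter scan shown earlier:

```csharp
// Condensed interleaving loop: after each video frame, push audio
// frames until the audio timestamp catches up to the video timestamp.
while (h264BitstreamPosition < videoBuffer.Length)
{
    // Hypothetical helper: scans to the next Access Unit Delimiter, as above.
    int frameSize = FindFrameSize(videoBuffer, h264BitstreamPosition);

    source.PushMedia(videoStreamIndex, videoBuffer,
        h264BitstreamPosition, frameSize, videoFileTime, videoFileTime);

    h264BitstreamPosition += frameSize;
    videoFrameCount++;
    videoFileTime = videoFrameCount * 10000000L
        * videoMediaType.Framerate.Den / videoMediaType.Framerate.Num;

    while (audioFileTime <= videoFileTime)
    {
        // ... ADTS frame extraction and PushMedia, as above ...
    }
}
```

Because the inner loop runs until `audioFileTime` passes `videoFileTime`, the two streams never drift apart by more than about one video frame duration.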
## Differences from MP4 Writer Sample 1
| | Writer Sample 1 | Writer Sample 2 |
|---|---|---|
| Source | Network source via SourceFactory | VirtualNetworkSource with manual data injection |
| Media data | Received from a remote server | Pre-encoded bitstreams pushed from application code |
| Timestamp management | Handled by the source | Calculated and assigned manually |
| Use case | Recording an existing stream | Writing application-generated or processed media to file |
## See Also
- Sample Applications — overview of all demo projects
- .NET Server Demo — parent page with setup instructions, license key configuration, and access URL reference
- MP4 Writer Sample 1 — simple recording with MediaSession