Kouji Matsui
Posted on April 23, 2022
Introduction
Have you ever wanted to capture images from a video device and apply your own custom processing to them? Have you ever looked for a library that makes implementing such custom operations easy?
Applications that directly manipulate physical devices, such as video capture devices, face restrictions on the libraries they can use. In .NET, you need to bring in native interoperability libraries. This is not a problem for a hobby project used only by yourself, but once you reach the stage of packaging applications and libraries for someone else to use, problems emerge. For example:
- Native libraries need to be compiled, in C++ or similar, for each target platform. For Windows, you may need separate builds for 32-bit and 64-bit, and possibly for ARM in the near future. For Linux, the number of target platforms grows even larger.
- Packaging native libraries requires special knowledge. Library files may need to be placed in specific locations, and custom code may be needed to load them dynamically.
In January 2022, a client's requirements created the need to take still images from a camera device on Windows and perform image processing and image recognition on them. My initial assumption was to use a USB-connected camera (a device that supports USB Video Class) to take the still images. I wrote "still images," but in reality we also needed to preview them, so they had to be handled as a video stream.
If all you want is to record video, there are excellent libraries, applications, and UI controls for that. What we needed, however, was a library that could be customized to run its own background processing on individual still images during the video preview, with an easy, simple interface and no packaging headaches. (Such a component is sometimes called a "frame grabber.")
If we don't have it, let's make it. This is how FlashCap was born.
Note: FlashCap was born out of work, but is a fully OSS project. The license follows Apache-v2.
Goals
My goal was to solve the following problems.
1. An interface structure easy enough for anyone to write code against.
2. Extensibility that allows very low-level operations when necessary.
3. Support for a wide range of platforms.
4. The broader the .NET version compatibility, the better.
5. As little dependency as possible on other native and/or managed libraries.
For goal 3, as of the official version (1.0.0):
- Windows (DirectShow API)
- Windows (Video For Windows API)
- Linux (V4L2 API)
are now supported. (Perhaps it can't be called multi-platform unless it supports more environments ;) For now, my personal interest is in the Android NDK.)
In addition, operation has been verified on the following platforms:
- Windows (x86, x64)
- Linux (i686, x86_64, aarch64, armv7l, mipsel)
I have verified operation with various cameras and video capture devices. Not every combination has been covered, but I believe this is sufficient for a first version. For a detailed list, please refer to the repository documentation.
For goal 4, the current major versions of .NET are supported (.NET 6 and 5, .NET Core 3 and 2, .NET Framework 4 and 3.5, and .NET Standard). If you have ever struggled with package dependencies, you will find comfort in how comprehensive this coverage is.
Usage
So how "easy" is it to write code? First, install the FlashCap package from NuGet. An example code is shown below. It lists the video capture devices and what characteristics (resolution, FPS, format) each device has:
using FlashCap;

// Capture device enumeration:
var devices = new CaptureDevices();

foreach (var descriptor in devices.EnumerateDescriptors())
{
    // "Logicool Webcam C930e: DirectShow device, Characteristics=34"
    // "Default: VideoForWindows default, Characteristics=1"
    Console.WriteLine(descriptor);

    foreach (var characteristics in descriptor.Characteristics)
    {
        // "1920x1080 [JPEG, 30fps]"
        // "640x480 [YUYV, 60fps]"
        Console.WriteLine(characteristics);
    }
}
There is no need to write separate code for Windows and Linux. FlashCap automatically determines the environment internally and uses the appropriate API.
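For example, once the characteristics are enumerated, they can be narrowed down with ordinary LINQ. A minimal sketch, assuming VideoCharacteristics exposes Width and Height properties (check the repository documentation for the exact member names):

using System.Linq;
using FlashCap;

var devices = new CaptureDevices();
var descriptor0 = devices.EnumerateDescriptors().First();

// Pick the first Full HD characteristic, if the device offers one:
var fullHd = descriptor0.Characteristics.
    FirstOrDefault(c => c.Width == 1920 && c.Height == 1080);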
Once you have narrowed down the device and its characteristics, all you have to do is open the device and start capturing:
// Open a device with video characteristics:
var descriptor0 = devices.EnumerateDescriptors().ElementAt(0);

using var device = await descriptor0.OpenAsync(
    descriptor0.Characteristics[0],
    async bufferScope =>
    {
        // Captured into a pixel buffer passed as the argument.
        // Get image data (maybe DIB/JPEG/PNG):
        byte[] imageData = bufferScope.Buffer.ExtractImage();

        // Use it in any way you like...
        var ms = new MemoryStream(imageData);
        var bitmap = System.Drawing.Bitmap.FromStream(ms);

        // ...
    });

// Start processing:
device.Start();

// ...

// Stop processing:
device.Stop();
The handler written as the lambda expression block in the OpenAsync() argument is called each time a frame is captured. From the bufferScope.Buffer argument (called the pixel buffer), ExtractImage() yields a byte[] containing the image data. As in this code example, you can pass it to System.Drawing.Bitmap to generate a bitmap, or output it directly to a file, and that completes the capture process.
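For instance, saving a frame can be as simple as writing the extracted bytes to disk. A minimal sketch reusing the OpenAsync() pattern above (the file name is illustrative; the appropriate extension depends on the format described next):

using var device = await descriptor0.OpenAsync(
    descriptor0.Characteristics[0],
    async bufferScope =>
    {
        // The bytes already form a complete image stream,
        // so they can be written to a file as-is:
        byte[] imageData = bufferScope.Buffer.ExtractImage();
        await System.IO.File.WriteAllBytesAsync("frame.jpg", imageData);
    });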
The image data can be in one of the following formats:
- 'RGB DIB': a bitmap file format, like the common foo.bmp. The format can vary from 32-bit ARGB to 8-bit palette, but current capture devices almost always produce 24-bit or 32-bit.
- 'JPEG': JPEG format. Most of the time it is actually MJPEG (Motion JPEG), but since that is effectively identical to JPEG, it can be treated as JPEG.
If you are familiar with video capture, you may wonder what happens to the 'YUV' formats. 'YUV' is supported by many capture devices, but it is a minor player as image data: many bitmap decoders cannot decode it. So FlashCap transcodes 'YUV' to 'RGB DIB' by default. Of course, if you need the raw image data, you can disable transcoding.
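A hedged sketch of disabling it, assuming the first boolean of the five-argument OpenAsync() overload (shown later in this article) is the transcoding flag; please verify against the library documentation:

using var device = await descriptor0.OpenAsync(
    descriptor0.Characteristics[0],
    false,   // Disable transcoding: receive the device's native format
    false,   // Single-threaded handler
    1,       // Minimal pixel buffer queue
    async bufferScope =>
    {
        // With transcoding off, this may be raw YUV (e.g. YUYV) bytes:
        byte[] rawData = bufferScope.Buffer.ExtractImage();
        // ...
    });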
By the way, FlashCap handles video, not still pictures. The handler above is therefore called each time a frame (one still image) of the video arrives. So if you want a snapshot, write code that saves a single still image at the right moment. For example, you could always store the latest image in a member field, and then save a reference to it when a button is clicked in the UI.
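A minimal sketch of that pattern (the class and member names here are illustrative, not part of FlashCap):

using System;
using System.IO;

public partial class MainForm
{
    // Holds the most recent frame; written by the capture handler.
    private volatile byte[]? lastImageData;

    // Call this from inside the OpenAsync() handler on each frame:
    private void OnFrameArrived(byte[] imageData) =>
        this.lastImageData = imageData;

    // Wire this to a button's Click event:
    private void OnSnapshotClick(object sender, EventArgs e)
    {
        var snapshot = this.lastImageData;
        if (snapshot != null)
        {
            File.WriteAllBytes("snapshot.jpg", snapshot);
        }
    }
}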
Topic
It should also be noted that the above handlers are called from a worker thread. WPF, Windows Forms, and other UI frameworks do not allow manipulating the UI from a worker thread, yet you will probably want to preview the resulting image data. Each framework has its own way to marshal work back to the UI thread. In WPF, you can use a Dispatcher and write the following:
// Image previewImage;

// Here we are on a worker thread:
byte[] imageData = bufferScope.Buffer.ExtractImage();

var bitmap = new BitmapImage();
bitmap.BeginInit();
bitmap.CacheOption = BitmapCacheOption.OnLoad;
bitmap.StreamSource = new MemoryStream(imageData);
bitmap.EndInit();

// Freeze is necessary because the bitmap was created on a worker thread:
bitmap.Freeze();

// Marshal to the UI thread:
previewImage.Dispatcher.BeginInvoke(
    new Action(() => previewImage.Source = bitmap));
Complete example implementations are available for Windows Forms, WPF, and Avalonia; please refer to them.
After using it for a while, you may become concerned about performance. Image data is very large, and since this is video, it must be processed dozens of times per second. By default, FlashCap ignores incoming frames while the handler is still busy, which means "frame dropping" occurs. To minimize it, the handler implementation must be as fast as possible.
For example, image data can be placed in a queue and processed by multiple threads. Such a scenario is also supported by FlashCap's standard functionality. Just add parameters to the OpenAsync() call:
// Open a device with video characteristics:
var descriptor0 = devices.EnumerateDescriptors().ElementAt(0);

using var device = await descriptor0.OpenAsync(
    descriptor0.Characteristics[0],
    true,   // Transcode if YUV
    true,   // <-- Enable multithreading (Scattering)
    10,     // <-- Maximum queuing pixel buffers
    async bufferScope =>
    {
        // ...
    });
These specify whether the handler should run multithreaded (this is called Scattering) and the maximum number of pixel buffers to queue. It is also possible to stay single-threaded and increase only the maximum queue size: even on a single thread, if the handler's processing load is uneven, frame dropping may be avoided by allowing a deeper queue. Note that in both cases, frames will still be dropped once the image data exceeds the maximum queue size.
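For completeness, a minimal sketch of that single-threaded variant, using the same parameter order as the example above:

using var device = await descriptor0.OpenAsync(
    descriptor0.Characteristics[0],
    true,    // Transcode if YUV
    false,   // <-- Stay single-threaded (no scattering)
    10,      // <-- Still allow up to 10 queued pixel buffers
    async bufferScope =>
    {
        // ...
    });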
Options are also available for those who wish to optimize processing even further. If you implement your own "frame processor," you can handle almost any raw image data directly. The interface is simple but delicate, and is unlikely to be needed in general applications. See the documentation for more details.
Conclusion
I have created the FlashCap library, which completely conceals the video capture device internals while still allowing programmable device enumeration and format selection. It has a simple, easy interface, is multi-platform, and eliminates dependencies on external libraries, including native ones. It keeps .NET runtime version requirements as broad as possible, and also provides interface extensibility for faster capture.
You can use FlashCap to easily obtain image data from the webcam in front of you, or to implement features essential to sophisticated applications, such as processing the resulting images while displaying a preview.
I hope this will be helpful to everyone.