Saturday, May 31, 2008

Video Scene Detection with DirectShow.NET

For some time I've been working on a video-related personal project. I'm using the fantastic DirectShow .NET library, which provides a nice C# interface to Microsoft's DirectShow C++ API. At one point some folks on the DS .NET forums asked about the scene detection algorithm I referenced in one of my forum posts. I promised to follow up with some sample code and explanations and--finally--here they are.

I've created a sample solution to demonstrate my scene detection algorithm. It's based on the DxScan sample available with other DS .NET samples on the DS .NET download page. My algorithm is not yet production code but has proven very reliable in my own testing. It is 100% accurate against my test video library, which is 600 minutes of actual sports video with 1,800 scene changes (including both night and daytime events) plus several short test videos created explicitly to strain the algorithm.

At a high level, scene detection involves the following steps:

  1. Randomly select 2,000 of the RGB values composing a single video frame. These are the values on which we'll perform a longitudinal (or cross-frame) analysis to detect scene changes for the entire duration of the video.
  2. Analyze the current frame:
    1. Calculate the average RGB value for the current frame. If the RGB values are unusually low or high, we're dealing with scenes shot in dim or bright light conditions and will need to lower or raise our scene detection thresholds accordingly.
    2. Perform an XOR diff between the RGB values in the previous and current frames. The XOR diff amplifies minor differences between frames (vs a simple integer difference) which improves detection of scene changes involving similar scenes as well as detection in low-light conditions where we tend to be dealing with lower RGB values.
    3. Calculate the average RGB difference between the current and previous frames. In other words, add up the XOR diff values from step 2.2 and divide by the number of sampled values.
    4. Calculate the change in average RGB difference between the current and previous frames. This is a bit tricky to understand, but it's critical to achieving a high level of accuracy when differentiating between new scenes and random noise (such as high-motion close-ups or quick pans/zooms). If the previous frame's change in average RGB difference is above a defined, positive threshold (normalized for light conditions detected in step 2.1) and the current frame's change in average RGB difference is below a defined, negative threshold, then the previous frame is flagged as a scene change. In simple terms, we're taking advantage of the fact that scene changes nearly always result in a two-frame spike/crash in frame-to-frame differences; while pans, zooms, and high-motion close-ups result in a gradual ramp-up/ramp-down in frame-to-frame differences.
    5. Advance to the next frame and repeat step 2.
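The numbered steps can be sketched end-to-end in a few lines. This is an illustrative Python translation, not the C# SceneDetector from the sample: the single flat `diff_threshold` stands in for the brightness-adjusted thresholds of step 2.1, and its default of 21 is borrowed from the baseline constant that appears later in the comments.

```python
def detect_scenes(frames, diff_threshold=21.0):
    """Return the indices of frames that start a new scene.

    `frames` is a list of equal-length sequences of 8-bit RGB component
    values -- the random per-frame sample described in step 1.
    """
    scene_changes = []
    prev_sample = None
    prev_avg_diff = 0.0
    prev_avg_diff_change = 0.0
    for index, sample in enumerate(frames):
        if prev_sample is not None:
            # Steps 2.2/2.3: XOR diff amplifies small differences; average it.
            avg_diff = sum(a ^ b for a, b in zip(prev_sample, sample)) / len(sample)
            # Step 2.4: change in the average diff between consecutive frames.
            avg_diff_change = avg_diff - prev_avg_diff
            # A spike above the positive threshold followed by a crash below
            # the negative threshold flags the *previous* frame.
            if prev_avg_diff_change > diff_threshold and avg_diff_change < -diff_threshold:
                scene_changes.append(index - 1)
            prev_avg_diff = avg_diff
            prev_avg_diff_change = avg_diff_change
        prev_sample = sample
    return scene_changes
```

The spike/crash pair in consecutive frame diffs is exactly the two-frame pattern described in step 2.4; a gradual pan or zoom never produces both halves of the pattern at once.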

I'll try to expand and clarify the above steps when I have time, but for now you'll have to read the code if you need to understand the algorithm in more detail. The only limitations in the current implementation (that I'm aware of) are the following:

  1. Dropped frames are interpreted as scene changes. This issue can be minimized in most applications by choosing a minimum scene duration and discarding new-scene events fired by the SceneDetector inside the minimum-duration window.
  2. Scene transition effects (fades, dissolves, etc.) are not supported and scene changes involving such effects are not detected.
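For the first limitation, the minimum-duration filter suggested above might look like this (a hypothetical Python helper; the name, the seconds-based timestamps, and the default window are my assumptions, not part of the sample code):

```python
def filter_min_duration(change_times, min_scene_seconds=1.0):
    """Drop scene-change events that fall inside the minimum-duration
    window after the most recently accepted change."""
    accepted = []
    for t in change_times:
        # Keep an event only if enough time has passed since the last keeper.
        if not accepted or t - accepted[-1] >= min_scene_seconds:
            accepted.append(t)
    return accepted
```

Applied to the raw event stream, this suppresses the spurious one-frame "scenes" that dropped frames produce.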

If you encounter any other issues with the algorithm, I'd love the opportunity to see and analyze the video that broke it!

29 comments:

Praddy Always Ready said...

Hi Ashley, really good, useful work on video scene-change detection. Nice material to start with for me, as a beginner in this domain.

Thanks.

AT said...

Thanks for the feedback and kind words!

amit said...

Hi,

Are there any default threshold values for the frames, or does the programmer need to supply his own thresholds?


regards
Amit

amit said...

Hi,

Please let me know: will this method work fine for grayscale 24-bit images?

AT said...

@amit - Yes, default threshold values are provided in the code and they are also automatically adjusted based on the detected range of pixel variation in the video (from low or high light levels, for example).

It should work with black and white video (or grayscale), but it may be overly sensitive to scene changes. I haven't tested specifically with B&W so I'd be interested in your feedback if you try it out.

amit said...

Hi Ashley Tate,

I tried to download the code from the link given, but the folder contains no files. Can you please give me the correct download link?

Regards
Amit

AT said...

@amit - I'm not sure exactly what you mean, but did you try simply clicking on the download link? The file is hosted on Amazon's S3 service so there are no "folders" to look in. I checked it this morning and the file is definitely there.

Anonymous said...

Hi, Ashley

Great work! Scene detection is very accurate for the sample videos I tested. Is there any way to make it work with .wmv files?

thanks
Charles

AT said...

@Charles - Thanks! The scene detector actually works fine with WMV files. That's primarily what I use it for myself. I believe your problem is that the DxScan sample from the DirectShow.NET project (which I modified to create the DxScanScenes sample) does not set up its filter graph properly to support WMV files. Other folks have asked about this on the DS.NET forums so you might want to try searching there:

https://sourceforge.net/forum/forum.php?forum_id=460697

amit said...

Hi Ashley Tate ,

Now I have taken 2,000 random RGB values for the previous frame and the current frame as below (Delphi with Pascal):
first_frame: array[0..1999] of integer;
next_frame: array[0..1999] of Integer;
result_array1 : array[0..1999] of double;
avrg_clr,prevavrg_clr : Double ;

Then I have calculated Xor of previous and current frame RGB values and average of them as below

for i := 0 to 1999 do
begin
result_array1[i] := first_frame[i] xor next_frame[i];
end;


avrg_clr := 0;

for i := 0 to 1999 do
begin
avrg_clr := avrg_clr + result_array1[i];
end;
avrg_clr :=avrg_clr / 2000;


After this I am not getting how to define the threshold values. I also tried to compile your code, but it gives a compilation error.

Please guide me on how to choose threshold values and how to proceed further.

Thanking you.
Regards
Amit

amit said...

Hi Ashley Tate,

Actually I have no idea about C#. Can you please give me a VC++ or Delphi (Pascal) code snippet for the core scene-detection algorithm?

Regards
Amit

amit said...

Hi Ashley Tate ,

I don't have the DirectShow .NET library, and I want to work on scene detection without it. Is it possible in Delphi without the DirectShow library?

regards
Amit

amit said...

Hi,

Sorry for disturbing you again. I am using a sequence of frames, i.e. I am converting the video file into frames first. Then I think DirectShowLib is not necessary.

Regards
Amit

AT said...

@amit: Sorry, I don't have time to convert the sample to C++ for you!

I'm not sure what you mean by "converting file to frames first". You could certainly apply the same algorithm to a sequence of bitmaps if that's what you mean, but the code would look very different.

amit said...

Hi Ashley Tate,

I am converting the video into a sequence of frames (i.e. a sequence of bitmaps), so I think it is not necessary to use the DirectShow library. Is the SceneChanged() function in "averageRgbdiffDetectionStrategy.cs" the core algorithm?

Regards
Amit

amit said...

Hi Ashley Tate,

I am doing scene detection on a black-and-white sequence of bitmaps for my academic project. I started coding in Delphi (Pascal).

I have converted your algorithm to Delphi as given below. Here I have combined the SceneChanged() and WasPrevNewScene() functions.

It is not correctly detecting the scene changes. Please check my code and help me.

When I load new frame I will call the below function.

//Global VARIABLES declaration
var
//amit
prev_frame: array[0..1999] of Integer;
current_frame: array[0..1999] of Integer;

avrg_clr,prevavrg_clr : Double ;
prevAvgDiffChange,prevAvgDiff,prevAvgRgbLevel : double;
prevSampleTime : Double;

//CONSTANTS
BaselineRgbDiffThreshold = 21;
BaselineRgbLevel = 90;
MinRgbDiffThreshold = 5;
MaxRgbDiffThreshold = 45.00;
BaselineUncertainty = 5;
ThresholdToLevelRatio =0.395 ;
DefaultSampleSize = 2000;






procedure TMainFrm.comparescene1Click(Sender: TObject);
var
lastLocation,sumDiffs,sumRgbLevels,avgRgbLevel,
avgDiff,avgDiffChange: Double;

prevRgbLevelVariance,prevDiffThreshold : Double;
i,x,y : integer;
begin

//take Random RGb values for current frame
for i := 0 to 1999 do
begin
x :=Random(LEADImage1.BitmapWidth-1);
y := Random(LEADImage1.BitmapHeight-1);
current_frame[i] := LEADImage1.Pixel[x,y]; //getting RGB values randomely from image and saving into an array
end;


//start of SceneChanged() function

lastLocation := 0;
sumDiffs := 0;
sumRgbLevels := 0;
for i := 0 to 1999 do
begin
sumDiffs := sumDiffs + (prev_frame[i] xor current_frame[i]);
sumRgbLevels := sumRgbLevels + current_frame[i];
prev_frame[i] := current_frame[i];
end;

avgRgbLevel := sumRgbLevels / 2000;
avgDiff := sumDiffs / 2000;
avgDiffChange := avgDiff - prevAvgDiff;



//WasPrevNewScene() function equivalent
//start
prevRgbLevelVariance := prevAvgRgbLevel - BaselineRgbLevel;
prevDiffThreshold := (prevRgbLevelVariance * ThresholdToLevelRatio) + BaselineRgbDiffThreshold ;

if prevDiffThreshold > MaxRgbDiffThreshold then
prevDiffThreshold := MaxRgbDiffThreshold;


if (prevAvgDiffChange > prevDiffThreshold) and (avgDiffChange < (-prevAvgDiffChange * 0.5)) then
ShowMessage('scene changed');
//end;

prevAvgRgbLevel := avgRgbLevel;
prevAvgDiffChange := avgDiffChange;
prevAvgDiff := avgDiff;

//end of SceneChanged() function



end;




Thanking you

Regards
Amit

amit said...

Hi Ashley,

In the SceneChanged() function I am unable to translate the following line of code:
OnDataGenerated(
new SampleDataEventArgs(sampleTime, new object[] {avgDiff, avgDiffChange, avgRgbLevel},
new string[] {"AvgRgbDiff", "AvgRgbDiffChange", "AvgRgbLevel"}));

What does this portion of code do? Please give me an exe of your code so that I can check it with a black-and-white AVI file.

Thanking you.
Regards
Amit

AT said...

@amit: The OnDataGenerated event is fired so that calculated data concerning the current frame can be logged for analysis purposes. It is not critical to detecting scene changes. You should be able to build the sample project if you need an exe. Many other folks have done so and I haven't modified it or uploaded a new version. Post the build error if you want help building.

amit said...

Hi Ashley Tate,

I am happy to say that your scene detection algorithm works fine with grayscale images also. I did it in Delphi.

It detects fine when the scene changes suddenly, but it does not detect gradual scene changes. For example, in the first frame of a scene two objects are standing in the center of the frame; after some frames these objects gradually move to the right side, and at the same time a third object enters the same scene. I am unable to detect this.

Is there any way to solve this issue?

Thanking you for helping me.

Regards
Amit

AT said...

@amit: Glad to hear it's working for you.

It sounds like what isn't working for you is actually motion detection, not scene detection. Motion detection requires a completely different approach--basically finding "edges" of objects in the scene and determining when the edges move.

amit said...

Hi Ashley Tate,

Can you please give me an idea about how to detect motion? Just for my own interest.

Thanking you.


Regards
Amit

AT said...

@amit: I can't explain edge detection here, but you should be able to find information elsewhere!

Nooneelse said...

Why do you check only 2,000 pixels per frame and not the full frame? For performance?

AT said...

@nooneelse: It's actually checking 2000 of the RGB values, not 2000 full pixels. This random sample size is sufficient to detect frame-to-frame variation with a 99% confidence interval and low margin of error.
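As a back-of-the-envelope check on that claim (an illustrative aside; the exact statistics behind the 2,000 figure aren't given in the post), the standard margin-of-error formula for a sample proportion gives roughly plus or minus 3% at 99% confidence for n = 2,000:

```python
import math

def margin_of_error(n, confidence_z=2.576, p=0.5):
    """Worst-case margin of error for a random sample of size n.
    z = 2.576 corresponds to ~99% confidence; p = 0.5 maximizes variance."""
    return confidence_z * math.sqrt(p * (1 - p) / n)

# margin_of_error(2000) is roughly 0.029, i.e. about +/- 3%.
```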

Anonymous said...

The ZIP file does not actually contain any files. It has the full folder structure, but the files are missing. I tried unpacking with WinZip and 7Zip. Any thoughts?

AT said...

@Anonymous: Sure it contains files. Look in Samples\Editing\DxScanScenes. I'm not sure how the empty folders ended up in the file too, but they don't even show up when I view the file using WinZip.

Mark said...

Great stuff, Ashley. I've been working on something very similar for the last few years and your code is beating mine by about 5%! I notice you never use the BaselineUncertainty variable. Is that deliberate? Also, which values would you alter to make the detector more or less sensitive to shot changes?

AT said...

Thanks Mark. The algorithm went through a couple of iterations. BaselineUncertainty is just a left-over bit I missed when cleaning it up. It's been quite a while since I looked at this, but I've included below some background information and advice I wrote up for someone else who emailed me. Hope it helps.

====================
The constant values were primarily tuned from a sampling of sports videos (including indoor, outdoor, and nighttime shots) with a few "regular" home videos and some special test videos thrown into the mix. It's been a while since I worked in this code, but here are some comments which should help.

// RGB difference threshold which indicates a scene change when at baseline RGB level
private const double BaselineRgbDiffThreshold = 21;
private const double BaselineRgbLevel = 90;

These are the "average" difference and brightness levels we start with for all videos. As the RGB level (brightness; range is 0-255) increases above 90 the difference threshold is also increased above 21 to make scene detection less sensitive. As the RGB level decreases below 90 the difference threshold is also decreased to make scene detection more sensitive. This adjustment is made using the ThresholdToLevelRatio constant, which was arrived at through experimentation. Decreasing the BaselineRgbDiffThreshold constant would make scene detection more sensitive; increasing it, less sensitive.

// min and max thresholds based on the ranges found in real data
private const double MinRgbDiffThreshold = 5;
private const double MaxRgbDiffThreshold = 45;

These are just constraints for the adjusted difference threshold used for detecting scene changes. The adjusted threshold (based on video brightness as described above) is never allowed to be greater than 45 or less than 5. Lowering MinRgbDiffThreshold would make scene detection more sensitive for dark videos. Lowering MaxRgbDiffThreshold would make scene detection more sensitive for bright videos. Raising either one would make scene detection less sensitive for dark or bright videos, respectively. So the key thing to take away is that multiple variables are involved in adjusting sensitivity, and which ones are involved depends on the video brightness level.

// amount to change diff threshold in relation to changes in RGB level (21/90, 21.395/91, etc.)
private const double ThresholdToLevelRatio = .395;
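Putting those constants together, the brightness-adjusted threshold described above can be sketched as follows (an illustrative Python translation of the C# constants, not the original implementation):

```python
# Constants taken from the post; the clamping order is my reading of the
# min/max description above.
BASELINE_RGB_DIFF_THRESHOLD = 21.0
BASELINE_RGB_LEVEL = 90.0
MIN_RGB_DIFF_THRESHOLD = 5.0
MAX_RGB_DIFF_THRESHOLD = 45.0
THRESHOLD_TO_LEVEL_RATIO = 0.395

def adjusted_diff_threshold(avg_rgb_level):
    """Raise the diff threshold for bright video, lower it for dark video,
    then clamp to the range observed in real data."""
    variance = avg_rgb_level - BASELINE_RGB_LEVEL
    threshold = BASELINE_RGB_DIFF_THRESHOLD + variance * THRESHOLD_TO_LEVEL_RATIO
    return max(MIN_RGB_DIFF_THRESHOLD, min(MAX_RGB_DIFF_THRESHOLD, threshold))
```

At the baseline brightness of 90 this returns exactly 21; a fully bright frame clamps to 45 and a fully dark one to 5.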

 
Header photo courtesy of: http://www.flickr.com/photos/tmartin/ / CC BY-NC 2.0