View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0002727KdenliveMLTpublic2012-09-09 02:062013-01-28 19:50
Assigned Toddennedy 
Platform64 bitOSUbuntuOS Version10.04
Product VersionRecent git 
Target VersionFixed in Version0.9.4 
Summary0002727: frei0r filters take huge performance penalty due to slow colorspace conversions

I was trying to figure out why my video went from OK playable to totally choppy and useless when I enabled the white balance plugin (which is based on frei0r). After tearing most of my hair out trying to figure out what's going on, here's a sort-of analysis:

First off, the balanc0r plugin itself is very slow; it does tons of useless float-to-int conversions, function calls etc., and should use SSE2 when available. However, it only uses about 5% of my total CPU time, so it's not the problem. I'll file a separate bug against frei0r with a patch for this at some later stage. Fixing it takes it down to 1%, but playback is still choppy.

Second, note that the amount of optimized conversions in libswscale from YUYV (packed 4:2:0 YUV) is very limited. If you hit a C fallback, it will be _slow_; on a simple playback of some H.264 files from my D4 (which are something like 20 Mbit/sec, so really expensive to decode), I measured that >25% of my CPU time was being used in libswscale, a lot of it in non-MMX-optimized routines such as yuv2rgba32_X_c. Worse, that time is single-threaded.

Third, note that Kdenlive seems to be using RGBA throughout, which also is very poorly supported in libswscale (it prefers the much more common BGRA format, since that's how things naturally end up on x86 when you keep a pixel in an register and store it little-endian). In fact, this is so ingrained that even the few filters in frei0r that request BGRA are handled as RGBA, with some ad-hoc R/B-swap code inserted front and back.

So, my aim was to try to keep the YUYV conversions out of the loop as much as possible (libavcodec decodes my H.264 to yuvj420p, i.e., planar 4:2:0, which is generally much friendlier). Seemingly the SDL output sets the desired format to YUYV, causing the avformat producer to auto-convert through swscale (even though mlt_image.c already has conversion functions, also through swscale -- I'm not sure if I understand why). That is simple to fix, though; I'm attaching a demonstration patch, although it might need some cleanup.

However, there is still something in the pipeline converting to YUYV; it turns out that on the signal path into the filter, a bunch of filters are inserted as “normalisers”; e.g., the signal is resized if need be, cropped, deinterlaced, etc.. The deinterlacer is special here: It needs to look at the incoming frame to see if it is interlaced or not (the "progressive" flag is 0), since that could change from frame to frame. However, it seemingly does this check _before_ the frame is available, and the default is 0. Thus, it prepares itself for deinterlacing, and it can seemingly only accept YUYV, so it asks for that format when asking the next stage for the frame. This is what causes the conversion in the avformat producer.

Now, after doing this, it actually re-checks the "progressive" flag, finds that it is now 1, and just tosses the frame on. However, the damage is already done; the frame has been converted from yuv420p to to YUYV (which already takes some time), and then needs to converted from YUYV to RGBA32 for frei0r, which is one of the super-slow conversions. After frei0r has done its thing, the frame is converted back from RGBA32 to yup420p for SDL (which is relatively fast, since it is accelerated), and then showed.

So my request would be (given that most frei0r filters cannot operate on yuv420p or interlaced signals, I'd suppose :-) ): Can the deinterlacer not change the format unconditionally to YUYV, but instead convert at-the-spot if the frame that comes in really is interlaced? That seems to be the core of the problem here, given that it is seemingly impossible to know before the frame is produced whether it will be progressive or not.

A tangent: In doing this, I also found out that the non-swscale path of convert_image() (in producer_avformat.c) has a bug where it compares "format" (which is a pointer) against mlt_image_rgb24a and mlt_image_opengl (which are integers). I guess those missing deferences should be added; they could also affect this issue.
Steps To Reproduce1. Add a H.264 clip to the timeline.
2. Add the white balance plugin to the clip.
3. Playback and note the huge spike in CPU usage, causing the audio to be choppy.
Additional InformationI'm on a quad-core Core i5 2.53 GHz, so it should be possible to edit 720p video with white balance adjustments without lots of choppiness. :-)
TagsNo tags attached.
Build/Install MethodManual build from Git
Attached Filesdiff file icon sdl-use-yuv12.diff [^] (2,433 bytes) 2012-09-09 02:06 [Show Content]

- Relationships

-  Notes
Sesse (developer)
2012-09-09 02:23

I also forgot to mention: convert_image() in the avformat producer sets SWS_ACCURATE_RND. This flag also needs to be removed to get the MMX2 version of the colorspace conversion.
ddennedy (developer)
2012-09-13 07:02

re: balanc0r and "should use SSE2 when available." Of course, all code should be bug-free, optimized, and readbable. Quite often code gets ported from other projects and then just sits there until someone like you becomes annoyed by it. :-)

re: "Kdenlive seems to be using RGBA throughout" is wrong. Most of the RGBA code is in frei0r. Kdenlive only needs RGB for thumbnails and video scopes, and MLT prefers YCbCr. Kdenlive devs chose to make these filters available to users instead of hiding them. And yeah, we already know these conversions are performance killers, but such is life with code written to specific formats, a flexible toolkit, and few developers.

re: "SDL output sets the desired format to YUYV." There is a long history where MLT adopted YUYV in the beginning for very specific requirements. So, most MLT filters, including the normalizers, are written for YUYV. Changing the SDL mlt_image_format alone is not enough. There is a rendering thread in the base consumer class that defaults to mlt_image_yuv422, but it can be changed using the mlt_image_format property. However, support for conversions between yuv420p and other mlt image formats was only really fleshed out last year along with the ability to set the rendering thread's format. Kdenlive is not always aggressive about adopting new things because users are already subjected to enough instability. I will try to test out the patch more.

re: deinterlacer and "it seemingly does this check _before_ the frame is available, and the default is 0." Not always. Some producers can signal this property on the frame object before the frame's image is obtained. But I do think this could be improved as you pointed out, and I will do that. However, you also need to realize that none of our filters or transitions operate on yuv420p, and there are so many situations where filters are invoked, especially with normalizing.

re: "the non-swscale path of convert_image() (in producer_avformat.c) has a bug" Ok, thanks, that part of the code is neglected and due for removal, but I will just fix that now real quick.

re: SWS_ACCURATE_RND. That was chosen for quality reasons. I think a while ago some conversions gave very poor quality without it. Or, when Kdenlive added waveform and histogram scopes, adding this reduced some of the fringing and banding those showed in the conversions. I forget exactly. I think perhaps now that flag should only be set when interpolation is requested. By default, SDL does not request interpolation (scaling and transforms) to make faster draft previews, and then the avformat consumer sets interpolation to bilinear by default for the final rendering.
Sesse (developer)
2012-09-13 10:25

When I say "Kdenlive seems to be using RGBA throughout", I mean that it uses RGBA over BGRA or ABGR. I don't mean to say it uses RGBA over YUV.

As for SWS_ACCURATE_RND, I haven't been able to find any quality loss with it when it comes to colorspace conversion. Scaling might be a different story. I haven't seen any code for anything but SWS_BICUBIC, though, but maybe I haven't searched enough.

And yes, I realize that there are situations where filters are invoked; I'm not sure what people's typical use cases are, but I'd guess that having a single frei0r filter is not an uncommon use case, so it's worth looking at that specific case (which is what I did). Similarly, having no filters at all, just yuv420p video and an output, would sound like a common use case, and that is also improved by using yuv420p for the SDL output. Of course, if there is a common case where avcodec outputs YUYV, I'm now introducing a format conversion for that—it would be interesting to see if it ever happens.
ddennedy (developer)
2012-09-14 06:13

I applied a minor variation of the SDL patch (not yet in git) where it also sets the mlt_image_format property to yuv420p for the render thread. Then, I made a rough draft of the deinterlace filter changes to eliminate the highly probable unnecessary conversion to yuv422. As a result, playing HDV with melt brought CPU usage down from 86% to 56%. For sake of comparison, ffplay plays same file at 53% on my system. That is very nice. I have been using yuv420p for the consumer in Shotcut (another project) for quite a while now, so I am fairly comfortable with making that change to the sdl consumer. I still need to review my changes to the deinterlace filter and test that some more.

re: SWS_ACCURATE_RND. You have the luxury of basing your finding on whatever version of ffmpeg or libav you are using now. MLT tries to support FFmpeg from version 0.6.0. It seems we were not alone in our finding: [^] [^]

I will have to do more testing on older versions with a change that removes SWS_ACCURATE_RND for conversions and only includes it for scaling when the interpolation level is set to 'bilinear' or higher (avformat consumer).
ddennedy (developer)
2012-09-14 06:56

For interlaced footage + project, there is another thing I needed to add to prevent an unnecessary conversion: the ability to say "I do not care what field order I get." This is done by setting top_field_first=-1 on the consumer and included by sdl. Only things like SDI, HDMI, and sometimes encoding do need specific field order.
ddennedy (developer)
2012-09-14 07:46

Fixed in MLT git commit cf73314203ff98894b2d91a99354aaff69a53ad3.
The changes do not address SWS_ACCURATE_RND. I will test that further outside this ticket.
ddennedy (developer)
2012-09-14 19:59

I had to revert the consumer_sdl patch. Some services are not prepared to handle requests for yuv420p, so all of them need a review, changing, and testing. Perhaps you did not notice because this is about the format the render thread is requesting, not the consumer. So, the essential change to revert was setting the mlt_image_format=yuv420p property. It does not make sense to convert yuv422 to yuv420p just for the sdl consumer, so all of that was reverted for now. What is not reverted are the changes to the deinterlace filter and the ability to suppress field order correction.

- Issue History
Date Modified Username Field Change
2012-09-09 02:06 Sesse New Issue
2012-09-09 02:06 Sesse File Added: sdl-use-yuv12.diff
2012-09-09 02:23 Sesse Note Added: 0008307
2012-09-13 07:02 ddennedy Note Added: 0008329
2012-09-13 10:25 Sesse Note Added: 0008334
2012-09-14 06:13 ddennedy Note Added: 0008335
2012-09-14 06:56 ddennedy Note Added: 0008336
2012-09-14 07:45 ddennedy Assigned To => ddennedy
2012-09-14 07:45 ddennedy Status new => assigned
2012-09-14 07:46 ddennedy Note Added: 0008337
2012-09-14 07:46 ddennedy Status assigned => resolved
2012-09-14 07:46 ddennedy Resolution open => fixed
2012-09-14 19:59 ddennedy Note Added: 0008342
2013-01-28 19:50 j-b-m Fixed in Version => 0.9.4
2013-01-28 19:50 j-b-m Status resolved => closed

Copyright © 2000 - 2016 MantisBT Team
Powered by Mantis Bugtracker