Wandboard


Video playback using the Wandboard VPU, part 1

This is the first of a three-part series of articles about hardware-accelerated video playback.
This first part covers a common VPU driver issue on the iMX6. The second and third parts cover GStreamer, in general and on the iMX6.

Physical memory allocation failure

Some of you will shiver seeing those words. It is a kernel error message haunting Linux video playback on the iMX6.

This blog post discusses the reasons for this error and proposes a workaround.

Why it happens

Primarily, the physical memory allocation failure is caused by a failure to allocate DMA memory inside the VPU driver. Contrary to what some might believe, it is not caused by insufficient available memory or by memory leaks. It is due to something far more sinister: DMA memory pool fragmentation.

When the VPU is used for video decoding, the VPU driver requests and frees blocks of varying sizes (roughly 80 kB to 5 MB). Eventually the free memory becomes so fragmented that a large contiguous block can no longer be allocated. Bam!

Fragmentation is a common problem with memory allocators. Many methods have been tried and tested, but there is no silver bullet.

The current strategy employed by the kernel is a straightforward “first fit” allocator (deep inside dma_alloc_coherent()). That is, when, say, a 300 kB block is requested, the first free region of sufficient size is used, whether it is a 300 kB region or a 5 MB region. Over time, every contiguous free region of, say, 5 MB ends up partially occupied by far smaller blocks.
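
To make the effect concrete, here is a minimal userspace sketch of a first-fit allocator over a toy pool of 12 units. It is not the kernel's code: the names (toy_pool, toy_first_fit_alloc and so on) and the pool size are made up for illustration, and the only assumption taken from the text above is the "first region that is large enough wins" rule.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_UNITS 12   /* toy pool size in arbitrary units (hypothetical) */

/* owner[i] == 0 means unit i is free; otherwise it holds the block id. */
struct toy_pool {
    int owner[POOL_UNITS];
};

void toy_pool_init(struct toy_pool *p)
{
    for (size_t i = 0; i < POOL_UNITS; i++)
        p->owner[i] = 0;
}

/*
 * First fit: scan from the start and take the first free run that is
 * long enough, no matter how much larger than the request it is.
 * Returns the start offset, or -1 if no run is long enough.
 */
int toy_first_fit_alloc(struct toy_pool *p, size_t units, int id)
{
    size_t run = 0;

    assert(id != 0 && units > 0);
    for (size_t i = 0; i < POOL_UNITS; i++) {
        run = (p->owner[i] == 0) ? run + 1 : 0;
        if (run == units) {
            size_t start = i + 1 - units;
            for (size_t j = start; j <= i; j++)
                p->owner[j] = id;
            return (int)start;
        }
    }
    return -1;
}

/* Release every unit owned by block `id`. */
void toy_free(struct toy_pool *p, int id)
{
    for (size_t i = 0; i < POOL_UNITS; i++)
        if (p->owner[i] == id)
            p->owner[i] = 0;
}

/* Total number of free units, contiguous or not. */
size_t toy_free_units(const struct toy_pool *p)
{
    size_t n = 0;

    for (size_t i = 0; i < POOL_UNITS; i++)
        n += (p->owner[i] == 0);
    return n;
}

/* Length of the longest contiguous free run. */
size_t toy_largest_free_run(const struct toy_pool *p)
{
    size_t run = 0, best = 0;

    for (size_t i = 0; i < POOL_UNITS; i++) {
        run = (p->owner[i] == 0) ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}
```

A short scenario shows the problem: allocate 5 units, then 1, then 5 again, which leaves a single free unit at the end of the pool. Free the first 5-unit block and request 1 unit: first fit places it at offset 0, inside the 5-unit hole, although the 1-unit hole at the end would have fit exactly. Five units are still free in total, but the longest run is only 4, so the next 5-unit request fails.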

Workaround idea

While a smarter algorithm for allocating DMA memory could probably improve the situation, implementing and testing other allocators would be a non-trivial task.

The likelihood of badly fragmented memory is only one consideration when implementing a memory allocator. Others, such as speed, also have to be taken into account. Let’s just accept that memory allocators are a rich area of research, and focus on the problem at hand.

Instead of an ultimate memory allocator, the following workaround is proposed: rather than really freeing and reallocating memory, how about caching the contiguous blocks at the point where they would have been freed, and handing them out again on later requests? This is the age-old technique of using a memory pool allocator.
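
The idea can be sketched in a few lines of userspace C. This is an illustration only, not the actual Wandboard patch: malloc() stands in for the real DMA allocation, and every name here (block_cache, cache_alloc, cache_free, CACHE_SLOTS) is hypothetical. Reusing the smallest cached block that is large enough is one possible policy among several.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 32  /* hypothetical upper bound on tracked blocks */

struct cached_block {
    void  *addr;    /* NULL means this slot has never been filled */
    size_t size;    /* size the block was originally allocated with */
    int    in_use;  /* 1 while handed out, 0 while parked in the cache */
};

/* Zero-initialise before first use, e.g. `struct block_cache c = {0};`. */
struct block_cache {
    struct cached_block slot[CACHE_SLOTS];
    size_t backend_allocs;  /* how often the real allocator was called */
};

/*
 * Hand out a parked block if one is large enough (smallest such block
 * wins); otherwise fall back to the real allocator and remember the
 * new block. Returns NULL if the table is full or the backend fails.
 */
void *cache_alloc(struct block_cache *c, size_t size)
{
    struct cached_block *best = NULL, *empty = NULL;

    assert(size > 0);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cached_block *s = &c->slot[i];

        if (s->addr == NULL) {
            if (empty == NULL)
                empty = s;
            continue;
        }
        if (!s->in_use && s->size >= size &&
            (best == NULL || s->size < best->size))
            best = s;
    }
    if (best != NULL) {
        best->in_use = 1;
        return best->addr;
    }
    if (empty == NULL)
        return NULL;
    empty->addr = malloc(size);  /* stand-in for the DMA allocation */
    if (empty->addr == NULL)
        return NULL;
    empty->size = size;
    empty->in_use = 1;
    c->backend_allocs++;
    return empty->addr;
}

/* "Free" a block by parking it in the cache; nothing goes back to the backend. */
void cache_free(struct block_cache *c, void *addr)
{
    if (addr == NULL)
        return;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->slot[i].addr == addr) {
            c->slot[i].in_use = 0;
            return;
        }
    }
}
```

With this sketch, allocating a 5000-byte block, freeing it and then requesting 3000 bytes returns the same address without touching the backend again. The price is that a request may be served from a block larger than needed, and that parked blocks stay reserved for good.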

Results

We tested this inside the VPU driver, with promising results. Our test playback demo played four simultaneous video clips, and the DMA allocation error usually happened after 10-20 minutes.

After implementing the pool allocator, we were able to play the clips for days without any errors.

Patch

The patch can be found in the Wandboard git.

The drawback of the patch is that the VPU driver keeps a pool of memory blocks that are never freed. This ties up DMA memory that other drivers can then never use.

Future work

DMA memory is also requested by the PXP driver. If issues persist, implementing a memory pool allocator in the PXP driver might be beneficial as well, maybe even with a memory pool shared with the VPU driver.

Talk back and discuss in the Wandboard community forums