MJPEG inter/intraframe compression and decoding

2 years ago

#50896

amlwwalker

I am using Go to read frames off a websocket; the frames are being live-streamed from a webcam somewhere in the world. Each frame is a JPEG image, and the video as a whole is what I believe would be an MJPEG stream.

After attempting to combine the images back together into a video, I noticed that most of the images are:

a) smaller in dimensions than the previous images
b) not complete images, but images within images: each sub-image is just a part of the whole image, covering an area that has changed since the previous frame.

I believe from research that this is inter/intraframe compression, whereby intraframes are sent in full and interframes then contain only the diff relative to the last intraframe.

Alongside the binary image data, JSON metadata is also sent, such as these two examples:

{"fid":1,"h":[224,224],"name":"1/resort","rfid":1,"w":[400,400],"wallclock":1641416368513434,"x":[0,0],"y":[0,0]}
{"fid":2,"h":[224,82,12,56,82,30],"name":"1/resort","rfid":1,"w":[398,84,12,98,146,16],"wallclock":1641416368680294,"x":[0,314,336,100,82,0],"y":[0,52,10,168,0,10]}

From this I have deduced:

fid is the ID of this frame
rfid I believe to be the ID of the intraframe that the diffs contained in frame fid are to be superimposed onto

So now, my deduction leads me to the arrays h, w, x and y.

What I have come to realise is that for array index 1 the partial image is 82 high by 84 wide (h[1], w[1]) and should be placed at coordinates (314, 52), i.e. (x[1], y[1]). The same principle applies to indices 2, 3, 4 and 5.
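The placement rule I am describing, as a minimal Go sketch. `PatchRects` is a hypothetical helper name, and it assumes x/y are the top-left corner of each patch in the coordinate space of the intraframe, which is my reading of the data rather than something I have confirmed. Index 0 is deliberately skipped because I do not yet know what it means.

```go
package main

import "image"

// PatchRects returns one destination rectangle per array index from 1 upward,
// treating (x[i], y[i]) as the top-left corner and (w[i], h[i]) as the size.
// Index 0 is skipped; its meaning is the open question.
func PatchRects(x, y, w, h []int) []image.Rectangle {
	var rects []image.Rectangle
	for i := 1; i < len(x) && i < len(y) && i < len(w) && i < len(h); i++ {
		rects = append(rects, image.Rect(x[i], y[i], x[i]+w[i], y[i]+h[i]))
	}
	return rects
}
```

For the fid 2 example this gives five rectangles, the first spanning (314, 52) to (398, 134).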

What I am not sure about is index 0. Since an interframe consists of images to be superimposed onto the previous intraframe, my confusion is that the size given at index 0 is smaller than the intraframe.

You can see, for instance, that for index 0, w is 398. That's two pixels short of the intraframe width of 400. And why does the intraframe have two heights and two widths? Surely it's one image and therefore doesn't contain any more data than a single picture?

Could anyone familiar with inter/intraframe compression help me work out what the first values are for? My guess is that it's something to do with positioning the image within a bigger image so that the intraframe dimensions are always the biggest/dominant ones, but I can't figure out what the numbers at array index 0 would be for.
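One check that can be run on the sample numbers, in case it helps someone spot the pattern: take the largest x+w and the largest y+h over indices 1 and up, and compare them with w[0] and h[0]. `PatchExtent` below is a hypothetical helper written just for that comparison.

```go
package main

// PatchExtent returns the largest right edge (x+w) and bottom edge (y+h)
// over array indices 1 and up. Index 0 is excluded so that the result
// can be compared against w[0] and h[0].
func PatchExtent(x, y, w, h []int) (maxRight, maxBottom int) {
	for i := 1; i < len(x) && i < len(y) && i < len(w) && i < len(h); i++ {
		if r := x[i] + w[i]; r > maxRight {
			maxRight = r
		}
		if b := y[i] + h[i]; b > maxBottom {
			maxBottom = b
		}
	}
	return maxRight, maxBottom
}
```

For the fid 2 example this comes out as 398 and 224, the same as w[0] and h[0]; for fid 1 it comes out as 400 and 224, again matching index 0. That is only two frames, so it may be a coincidence, and it does not tell me whether index 0 describes the JPEG that was actually sent or a region of the intraframe.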

In short, how do I translate the coordinates and values that are provided to recreate the correct frames for the video?

video-streaming

video-processing

mjpeg

video-compression

0 Answers
