I think I'm starting to understand. It looks like MVD_Services needs both PTS and DTS. It's impossible to play the video properly without both of them.
First, I found this: https://web.archive.org/web/2018070...k:80/janos/2008/06/08/b-frames-in-directshow/
If the stream does not have B-frames (I and P only), you only need one set of timestamps, because all P-frames reference the I-frame or P-frame that came before them, and frames are stored in playback order.
I <-- P <-- P (decode order = playback order)
Both DTS and PTS are required for a stream that contains B-frames, because decode order and playback order are no longer the same. DTS gives the decode order, which is also the order the frames are stored in. A B-frame can reference a frame that is displayed after it, so that reference must be decoded first. This means a P-frame that is displayed after a B-frame during playback must be decoded before the B-frame, or else you can't decode the B-frame because one of its references isn't available.
Once you decode the frames, you need PTS to put them back into playback order so you can display them. Without PTS, you'll play the frames in the wrong order.
I <-- P <-- B (decode order)
I <-- B --> P (playback order)
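To make the reordering concrete, here is a minimal sketch in Python. The `Frame` class and the timestamp values are made up for illustration (arbitrary ticks); this is not what MVD_Services does internally, just the general idea of storing frames in DTS order and sorting by PTS for display.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    kind: str  # "I", "P", or "B"
    dts: int   # decode timestamp (hypothetical ticks)
    pts: int   # presentation timestamp (hypothetical ticks)


# Stored (decode) order: the P-frame comes before the B-frame,
# because the B-frame needs it as a reference.
decode_order = [
    Frame("I", dts=0, pts=1),
    Frame("P", dts=1, pts=3),
    Frame("B", dts=2, pts=2),
]


def presentation_order(frames):
    """Return the frames sorted by PTS, i.e. the order they are displayed in."""
    return sorted(frames, key=lambda f: f.pts)


def kinds(frames):
    """Compact string of frame types, e.g. 'IPB'."""
    return "".join(f.kind for f in frames)


if __name__ == "__main__":
    print("decode order:  ", kinds(decode_order))
    print("playback order:", kinds(presentation_order(decode_order)))
```

If you displayed the frames in the order they come out of the decoder (DTS order), you'd see I, P, B instead of I, B, P, which matches the out-of-order symptom.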
According to this: https://stackoverflow.com/questions/13595288/understanding-pts-and-dts-in-video-frames
raw DTS values are arbitrary on their own, because they're only meaningful as offsets. The first DTS is mapped to a "wall clock" time (in this case, 3DS system clock time?), and every later DTS is treated as an offset from that starting point.
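Here's a rough sketch of that offset idea in Python. The function name `schedule`, the starting clock value, and the 90 kHz tick rate are assumptions for illustration (90 kHz is the usual MPEG timestamp clock; I don't know what time base MVD_Services expects).

```python
def schedule(dts_values, start_clock, ticks_per_second=90000):
    """Map raw DTS values to wall-clock times in seconds.

    The first DTS is pinned to start_clock; every other DTS is treated
    as an offset from the first one. ticks_per_second is an assumed
    time base, not something confirmed for MVD_Services.
    """
    base = dts_values[0]
    return [start_clock + (d - base) / ticks_per_second for d in dts_values]


if __name__ == "__main__":
    # Two streams with different raw DTS bases but the same spacing
    # end up with the same wall-clock schedule.
    print(schedule([0, 3000, 6000], start_clock=10.0))
    print(schedule([900000, 903000, 906000], start_clock=10.0))
```

So the absolute DTS numbers don't matter, only the differences between them.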
This page: https://www.ramugedia.com/how-generate-dts-pts-from-elementary-stream
shows the difference between decoding I and P only vs I, P, and B; however, it doesn't cover the case when B-frames are used as references (--b-pyramid x264 setting).
This would indicate that you need to give frames to MVD_Services with both DTS and PTS. The out-of-order frame bug is probably caused by giving DTS only, and giving PTS only would probably corrupt the frames, since you can't decode them in PTS order.