This is a place to centralize ideas for a media test suite for GStreamer
Media test suite results online
What a media test suite should do
- Take in a bunch of (non-)media files
- Check if GStreamer can handle them (a sketch of this step follows the list)
- If it knows the MIME type
- If it can handle the container formats
- If it can find all the media types contained in those container formats
- If it can handle the different media types (do we have the proper decoder)
- If it can decode everything to raw media types (audio/video/text?)
- And of course, never crash/segfault on any of the above (even if you're trying to read an Excel spreadsheet).
- Produce clean and detailed reports
- information on the files/streams
- debug logs
- backtraces
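A minimal sketch of that identification pass, assuming current PyGObject bindings and the GstPbutils Discoverer (the successor of the gst-python Discoverer mentioned below); check_file and the timeout value are illustrative:

    import gi
    gi.require_version('Gst', '1.0')
    gi.require_version('GstPbutils', '1.0')
    from gi.repository import Gst, GstPbutils

    Gst.init(None)

    def check_file(uri):
        # A per-file timeout keeps one broken file from stalling the whole run.
        discoverer = GstPbutils.Discoverer.new(10 * Gst.SECOND)
        try:
            info = discoverer.discover_uri(uri)
        except Exception as err:
            # Unknown type, unsupported container, missing decoder, ...
            return ('FAIL', str(err))
        # One caps string per contained stream (container, audio, video, text).
        streams = [s.get_caps().to_string() for s in info.get_stream_list()]
        return ('OK', streams)

Catching the error (rather than letting it propagate) is what turns "GStreamer cannot handle this" into a reportable result instead of an aborted run.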
Do complex actions on the media/pipelines
- Play them (maybe not all the way)
- Seek in them in various ways, using the GST_SEEK_FLAG_* flags (see the sketch after this list)
- normal
- accurate
- keyframe
- segment seek?
- Play them using a non-seekable source, like stdin
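The seek variants above map directly onto GST_SEEK_FLAG_* combinations; a sketch against the 1.x API, reusing the Gst import from the sketch above and assuming `pipeline` is already prerolled:

    target = 5 * Gst.SECOND
    seek_variants = {
        'normal':   Gst.SeekFlags.FLUSH,
        'accurate': Gst.SeekFlags.FLUSH | Gst.SeekFlags.ACCURATE,
        'keyframe': Gst.SeekFlags.FLUSH | Gst.SeekFlags.KEY_UNIT,
        'segment':  Gst.SeekFlags.FLUSH | Gst.SeekFlags.SEGMENT,
    }
    for name, flags in seek_variants.items():
        ok = pipeline.seek_simple(Gst.Format.TIME, flags, target)
        print(name, 'seek', 'accepted' if ok else 'refused')

The non-seekable case can be exercised with fdsrc reading from stdin, e.g. gst-launch-1.0 fdsrc ! decodebin ! fakesink < somefile.avi.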
Handle problems
- segfaults (spawn a separate process per test; see the isolation sketch below)
- with backtraces and debug logs
- Figure out if something went wrong
- compare wanted behaviour (seek to a certain point) with output (probe on sinks?)
- Make sure problems are really reported with GST_ERROR or GST_WARNING
- Log everything or just up to a certain level (WARNING?)
- Maybe we could log everything with GST_DEBUG=*:2
- If there was an error, re-run it with GST_DEBUG=*:5 (as sketched below)
- find out if the process hangs (it would be cool to attach gdb to the pid and get a backtrace; see the sketch below)
- How do we figure that out? Maybe use a default timeout and check whether the pipeline has 'advanced' (or the debug log has grown).
- memory leaks? (run the tests under valgrind; see the sketch below)
- The problem with this is the time it would take...
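For the segfault case, a sketch of per-file process isolation using only the Python standard library (the child command is whatever runs one test on one file, e.g. gstfile.py):

    import signal
    import subprocess

    def run_isolated(cmd, timeout=60):
        # Each test runs in its own process, so a segfault only kills the child.
        try:
            proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return 'HANG'
        if proc.returncode < 0:
            # A negative return code means the child was killed by a signal,
            # e.g. -11 for SIGSEGV.
            return 'CRASH (%s)' % signal.Signals(-proc.returncode).name
        return 'OK' if proc.returncode == 0 else 'FAIL'

    # e.g. run_isolated(['python', 'gstfile.py', 'somefile.avi'])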
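The two-pass logging idea fits the same helper: run cheaply first, and only pay for verbose logs when something fails (a sketch; run_with_logs is a made-up name):

    import os
    import subprocess

    def run_with_logs(cmd, timeout=60):
        env = dict(os.environ, GST_DEBUG='*:2')
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout, env=env)
        if proc.returncode != 0:
            # Something went wrong: re-run with full debug output for the report.
            env['GST_DEBUG'] = '*:5'
            proc = subprocess.run(cmd, capture_output=True, timeout=timeout, env=env)
        return proc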
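For hangs, the timeout above catches the simple case; before killing a stuck child, a backtrace can be grabbed by attaching gdb non-interactively (assumes gdb is installed and attaching via ptrace is permitted):

    import subprocess

    def backtrace_of(pid):
        # Attach to the stuck process and dump a backtrace of all threads.
        out = subprocess.run(
            ['gdb', '--batch', '-p', str(pid), '-ex', 'thread apply all bt'],
            capture_output=True, text=True)
        return out.stdout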
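And the valgrind pass is just a prefixed command with a much larger timeout, which is exactly why it cannot be the default mode:

    valgrind_cmd = ['valgrind', '--leak-check=full', '--error-exitcode=42',
                    'python', 'gstfile.py', 'somefile.avi']
    status = run_isolated(valgrind_cmd, timeout=1800)

The --error-exitcode flag makes valgrind findings show up as an ordinary FAIL in the scheme above.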
Find some statistics
- How long does a seek on a file take?
- How accurate is a seek?
- This can be done with a probe (see the sketch after this list)
- CPU/RAM usage (only if the tests run on a dedicated machine...)
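A sketch of the probe-based measurement against the 1.x API (measure_seek is a made-up name; Gst imported as above): after a flushing seek in PAUSED, the first buffer reaching the sink pad tells us where we actually landed, and get_state() blocks until re-preroll is done:

    import time

    def measure_seek(pipeline, sinkpad, target):
        landed = []

        def on_buffer(pad, info):
            landed.append(info.get_buffer().pts)
            return Gst.PadProbeReturn.REMOVE  # one buffer is enough

        sinkpad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)
        start = time.monotonic()
        pipeline.seek_simple(Gst.Format.TIME,
                             Gst.SeekFlags.FLUSH | Gst.SeekFlags.ACCURATE,
                             target)
        pipeline.get_state(Gst.CLOCK_TIME_NONE)  # wait for re-preroll
        elapsed = time.monotonic() - start
        # Offset between the requested and actual landing position.
        error = landed[0] - target if landed else None
        return elapsed, error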
Find some strange behaviours
- Invalid timestamps on buffers when they are processed by the sinks
- This is why we need plugins (including sinks) to properly use the GST_ERROR and GST_WARNING macros for those cases
- We could also find stream discontinuities this way (does previous_buffer->(timestamp + duration) == current_buffer->timestamp?); see the sketch after this list
- Of course there are cases where there are discontinuities on purpose (but they should have GST_BUFFER_FLAG_DISCONT set)
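That continuity check is also a natural pad probe; a sketch assuming GStreamer >= 1.10 (for Gst.Buffer.has_flags), with Gst imported as above:

    def make_discont_checker():
        prev_end = [None]  # closure state: end time of the previous buffer

        def on_buffer(pad, info):
            buf = info.get_buffer()
            if (prev_end[0] is not None
                    and buf.pts != Gst.CLOCK_TIME_NONE
                    and buf.pts != prev_end[0]
                    and not buf.has_flags(Gst.BufferFlags.DISCONT)):
                print('unflagged discontinuity at', buf.pts)
            if buf.pts != Gst.CLOCK_TIME_NONE and buf.duration != Gst.CLOCK_TIME_NONE:
                prev_end[0] = buf.pts + buf.duration
            return Gst.PadProbeReturn.OK

        return on_buffer

    # pad.add_probe(Gst.PadProbeType.BUFFER, make_discont_checker())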
How should we do it
gst-python contains an example application, gstfile.py, that tries to find multimedia information on a given file (à la file(1)); this could be a good starting point
- Run the tests on all files in several passes. If a file doesn't pass one step, don't use it for the next step.
- Try to run the basic gstfile.py Discoverer
- Try to play the files for a bit longer than preroll
- Try seeking
- Insert your wacky stress-test here
- It would be nice to have XML or HTML reporting
- Users should be able to run it simply on problematic file(s) and attach the output to bug reports
- The best approach IMHO would be to divide it in two (a skeleton follows this list):
- A manager
- Handles logging
- Handles the files to process (which directories contain the files, which tests to run on which directories, which plugins to use/ignore)
- Test base class
- Can be subclassed to contain the specific scenarios
- Allows more or less complex testing
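A skeleton of that split (pure illustration; every name here is made up):

    import os

    class MediaTest:
        """Base class; subclasses implement one scenario (play, seek, ...)."""

        def run(self, path):
            raise NotImplementedError

    class PlayTest(MediaTest):
        def run(self, path):
            # Play a bit longer than preroll; return 'OK'/'FAIL'/'CRASH'/...
            ...

    class Manager:
        def __init__(self, directories, tests):
            self.directories = directories
            self.tests = tests  # ordered: later passes skip earlier failures
            self.results = {}

        def files(self):
            for d in self.directories:
                for name in sorted(os.listdir(d)):
                    yield os.path.join(d, name)

        def run_all(self):
            for path in self.files():
                for test in self.tests:
                    status = test.run(path)
                    self.results[(path, type(test).__name__)] = status
                    if status != 'OK':
                        break  # don't use this file for the next pass
            return self.results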
Central GStreamer media testsuite
- We could host a test suite on a Fluendo machine, for all GStreamer developers.
- Have it run every x hours (2-4 times a day would be enough)
- Use the current testing files (http://gstreamer.freedesktop.org/media/)
- Also add the files from the mplayer repository (http://samples.mplayerhq.hu)
- Also add the files from the xine testsuite
- It would also be nice to spot differences between the different runs (see the sketch below)
- To see if a modification brings in a problem for a given file (it's always a pain to try a plugin with all the files it could support)
- maybe figure out from the cvs log who might be responsible for a given regression
- ... and the positive aspect is: "yeah, we can now handle completelyborked.avi!"
- Have different 'slaves' for testing with/without some plugins (like the fluendo plugins)
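Spotting differences between runs is then a plain diff over two result tables (a sketch over the hypothetical Manager.results mapping above):

    def diff_runs(old, new):
        regressions, fixes = [], []
        for key in old.keys() & new.keys():
            if old[key] == 'OK' and new[key] != 'OK':
                regressions.append(key)  # candidates for blame via the cvs log
            elif old[key] != 'OK' and new[key] == 'OK':
                fixes.append(key)        # "we can now handle completelyborked.avi!"
        return regressions, fixes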
Wishlist
- shade the background of every second test-result line (e.g. #f0f0f0) to enhance readability
- show total number of files on initial page
- can we track the gst-elements that are involved in failed tests?
- if a file fails/crashes, but other files of the same type succeed, add links/notes pointing to the other file. This is good to be able to track down the actual difference.