Being a Sporadic Overview Of Linux Distribution Release Validation Processes
Yup, that's what this is. It's kind of a work in progress; I'll probably add to it later, as I haven't yet looked into what Arch or Debian or a few other likely suspects do.
Fedora
Manual testing
Our glorious Fedora uses Mediawiki to manage both test cases and test results for manual release validation. This is clearly ludicrous, but works much better than it has any right to.
'Dress rehearsal' composes of the entire release media set are built and denoted as Test Composes or Release Candidates, which can be treated interchangeably as 'composes' for our purposes here. Each compose represents a test event. In the 'TCMS' a test event is represented as a set of wiki pages; each wiki page can be referred to as a test type. Each wiki page must contain at least one wiki table, with the rows representing a concept I refer to as a unique test or a test instance. There may be multiple tables on a page; usually they will be in separate wiki page sections.
The unique, identifying attributes of a unique test are:
- The wiki page and page section it is in
- The test case
- The user-visible text of the link to the test case, which I refer to as the 'test name'
Unique tests may share up to two of those attributes: two tests may use the same test case and have the same test name but be in different page sections or pages, or they may be in the same page section and use the same test case but have a different test name, for instance.
The other attributes and properties of a unique test are:
- A milestone - Alpha, Beta or Final - indicating that the test must be run for that milestone and all later milestones
- The environments for the test, which are the column titles that appear after the test case / test name in its table; the environments for a given test can be reduced from the table's full set by greying out cells, but not extended beyond the columns that appear in the table
- The results that appear in the environment cells
Basically, Fedora uses mediawiki concepts - sections and tables - to structure storage of test results.
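To make that concrete, here's a rough Python sketch of the data model; all the class and field names are my own illustrative choices, not anything from wikitcms:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the class and field names here are my own,
# not wikitcms API names. They just model the structure described above.

@dataclass
class Result:
    status: str                       # e.g. "pass", "fail", "warn"
    username: str
    bugs: list[str] = field(default_factory=list)

@dataclass
class UniqueTest:
    # Identifying attributes: the page and section it lives in, the test
    # case it links to, and the user-visible 'test name' of that link.
    page: str                         # test type, e.g. "Installation"
    section: str                      # wiki page section containing the table
    testcase: str                     # the test case page
    testname: str                     # visible text of the test case link
    # Other attributes.
    milestone: str = "Final"          # Alpha, Beta or Final
    environments: list[str] = field(default_factory=list)   # column titles
    results: dict[str, list[Result]] = field(default_factory=dict)  # per env

    def identity(self):
        """The combination of attributes that makes a test 'unique'."""
        return (self.page, self.section, self.testcase, self.testname)
```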
The Summary page displays an overview of results for a given compose, by transcluding the individual result pages for that compose.
Results themselves are represented by a template, with the general format {{result|status|username|bugs}}.
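To give a feel for how simple that syntax is to parse, here's a rough regex-based sketch - the sample cell is made up, and this is an illustration rather than how wikitcms actually does it:

```python
import re

# Matches {{result|status|username|bug1 bug2 ...}}. This is a simplification:
# the real template has more optional parameters than shown here.
RESULT_RE = re.compile(r"\{\{\s*result\s*\|([^|}]*)\|([^|}]*)(?:\|([^}]*))?\}\}")

def parse_results(cell_text):
    """Yield (status, username, bugs) tuples from one table cell's wikitext."""
    for status, user, bugs in RESULT_RE.findall(cell_text):
        yield status.strip(), user.strip(), (bugs or "").split()

# A made-up example cell with two results:
cell = "{{result|pass|exampleuser}} {{result|fail|otheruser|1234567 1234568}}"
print(list(parse_results(cell)))
# -> [('pass', 'exampleuser', []), ('fail', 'otheruser', ['1234567', '1234568'])]
```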
Fedora also stores test cases in Mediawiki, for which it works rather well. The category system provides a fairly good capability to organize test cases, and templating allows various useful capabilities: it's trivial to keep boilerplate text that appears in many test cases unified and updated by using templates, and they can also be used for things like {{FedoraVersion}} to keep text and links that refer to version numbers up to date.
Obvious limitations of the system include:
- Result entry is awkward, involving entering a somewhat opaque syntax (the result template) into another complex syntax (a mediawiki table). The opportunity for user error here is high.
- Result storage and representation are tightly coupled: the display format is the storage format, more or less. Alternative views of the data require complex parsing of the wiki text.
- The nature of mediawiki is such that there is little enforcement of the data structures; it's easy for someone to invent a complex table or enter data 'wrongly' such that any attempt to parse the data may break or require complex logic to cope.
- A mediawiki instance is certainly not a very efficient form of data storage.
Programmatic access
My own wikitcms/relval provides a Python library for accessing this 'TCMS'. It treats the conventions/assumptions about how pages are named and laid out, the format of the result template and so on as an 'API' (and uses the Mediawiki API to actually interact with the wiki instance itself, via mwclient). This allows relval to handle the creation of result pages (which sort of 'enforces' the API, as it obviously obeys its own rules/assumptions about page naming and so forth) and also to provide a TUI for reporting results. As with the overall system itself this is prima facie ridiculous, but actually seems to work fairly well.
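For a flavour of the mwclient side of this, here's a minimal sketch, assuming a reasonably recent mwclient; the page name is just an illustration of the naming convention, and wikitcms wraps this kind of thing in higher-level objects:

```python
import mwclient

# Anonymous read access is enough to fetch page text from the Fedora wiki.
site = mwclient.Site('fedoraproject.org', path='/w/')

# Page name shown purely as an illustration of the naming convention the
# 'API' relies on; substitute a real result page.
page = site.pages['Test Results:Fedora 21 Final RC1 Installation']
wikitext = page.text()

# wikitcms-style code then parses sections, tables and {{result}} templates
# out of this raw wikitext.
print(wikitext[:200])
```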
relval can produce a longitudinal view of results for a given set of composes with its testcase-stats sub-command. I provide this view here for most Fedora releases, with the results for the current pre-release updated hourly or daily. This view provides information for each test type on when each of its unique tests was last run, and a detailed page for each unique test detailing its results throughout the current release.
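The computation behind that view is simple enough; here's a hedged sketch of the idea (the input format is invented for illustration - this isn't relval's actual code):

```python
# Sketch of a testcase-stats style 'last run' summary. The input format is
# invented for illustration: composes in order (oldest first), each mapped to
# the set of unique tests that got at least one result in that compose.
def last_run(composes):
    seen = {}
    for compose, tests in composes.items():
        for test in tests:
            seen[test] = compose          # later composes overwrite earlier ones
    return seen

composes = {
    "21 Final TC1": {"Install default", "Upgrade server"},
    "21 Final TC2": {"Install default"},
    "21 Final RC1": {"Upgrade server"},
}
for test, compose in sorted(last_run(composes).items()):
    print(f"{test}: last run in {compose}")
```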
Automated testing
Fedora does not currently perform any significant automated release validation testing. Taskotron currently only runs a couple of tests that catch packaging errors.
Examples
- Fedora result page
- Fedora test case: the page source demonstrates the use of templates for boilerplate text
Ubuntu
Manual testing
The puppy killers over at Ubuntu use a system called QATracker for manual testing. Here is the front end for manual release validation.
QATracker stores test cases and products (like Kubuntu Desktop amd64, Ubuntu Core i386). These are kind of 'static' data. Test events are grouped as builds of products for milestones, which form part of series. A series is something like an Ubuntu release - say, Utopic. A milestone roughly corresponds to a Fedora milestone - say, Utopic Final - though there are also nightly milestones which seem to fuzz the concept a bit. Within each milestone is a bunch of builds, of any number of products. There may be (and often is) more than one build for any given product within a single milestone.
So, for instance, in the Utopic Final milestone we can click See removed and superseded builds too and see that there were many builds of each product for that milestone.
Products and test cases are defined for each series. That is, for the whole Utopic series, the set of products and the set of test cases for each product is a property of the series, and cannot be varied between milestones or between builds. Every build of a given product within a given series will have the same test cases.
Test cases don't seem to have any capability to be instantiated (as in Moztrap) - it's more like Fedora: a single test case is a single test case. I have not seen any capacity for 'templating', but I may just have missed it.
Results are stored per build (as we've seen, a build is a member of a milestone, which is a member of a series). There is no concept of environments (which is why Ubuntu encodes the environments into the products) - all the results for a single test case within a single build are pooled together.
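Here's a rough Python sketch of that hierarchy as I understand it; the class and field names are mine, not QATracker's schema:

```python
from dataclasses import dataclass, field

# Illustrative model only; the class and field names are mine, not
# QATracker's schema.

@dataclass
class Result:
    testcase: str
    status: str                       # "pass" or "fail"; no 'warn' analog
    bugs: list[str] = field(default_factory=list)
    note: str = ""

@dataclass
class Build:                          # one build of one product
    product: str                      # e.g. "Kubuntu Desktop amd64"
    results: list[Result] = field(default_factory=list)  # pooled, no environments

@dataclass
class Milestone:                      # e.g. "Utopic Final"
    name: str
    builds: list[Build] = field(default_factory=list)

@dataclass
class Series:                         # e.g. "Utopic"; products and test cases
    name: str                         # are fixed for the whole series
    products: list[str] = field(default_factory=list)
    testcases: dict[str, list[str]] = field(default_factory=dict)  # per product
    milestones: list[Milestone] = field(default_factory=list)
```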
The web UI provides a fairly nice interface for result reporting, much nicer than Fedora's 'edit some wikitext and hope you got it right'. Results have a status of pass or fail - there does not appear to be any warn analog. Bug reports can be associated with results, as in Fedora, as can free text notes, and hardware information if desired.
QATracker provides some basic reporting capabilities, but doesn't have much in the way of flexible data representation - it presumably stores the data fairly sensibly and separately from its representation, but doesn't really provide different ways to view the data beyond the default web UI and the limited reporting capabilities.
The web UI works by drilling down through the layers. The front page shows a list of the most recent series with the milestones for each within them; you can click directly into a milestone. The milestone page lists only active builds by default (but can be made to show superseded ones, as seen above). You can click into a build, and from the build page you see a table-ish representation of the test cases for that build, with the results (including bug links) listed alongside the test cases. You have to click on a test case to report a result for it. The current results for that test case are shown by default; the test case text is hidden behind an expander.
Limitations of the system seem to include:
- There's no alternative/subsidiary/superior grouping of tests besides grouping by product, and no concept of environments. This seems to have resulted in the creation of a lot of products - each real Ubuntu product has multiple QATracker products, one per arch, for instance. It also seems to lead to duplication of test cases to cover things like UEFI vs. BIOS, which in Fedora's system or Moztrap can simply be environments.
- Test case representation seems inferior to Mediawiki - as noted, template functionality seems to be lacking.
- There seems to be a lack of options in terms of data representation - in particular, the system is short on overviews, forcing you to drill all the way down to a specific build to see its results. There appears to be no 'overview' of results for a group of associated builds, or longitudinal view across a series of builds for a given product.
Examples
Programmatic access
QATracker provides an XML-RPC API, for which python-qatracker is a Python client library. It provides access to milestone series, milestones, products, builds, results and various properties of each. I was able to re-implement relval's testcase-stats for QATracker in a few hours.
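I haven't dug into the exact calls here, so treat this only as a shape-of-the-thing sketch using Python's standard library XML-RPC client; the endpoint URL is a placeholder for illustration, and python-qatracker wraps the real API so you normally wouldn't talk XML-RPC by hand:

```python
import xmlrpc.client

# Placeholder endpoint for illustration only; check python-qatracker / the
# QATracker docs for the real URL and the methods the tracker exposes.
proxy = xmlrpc.client.ServerProxy("https://iso.qa.ubuntu.com/xmlrpc.php")

# system.listMethods is standard XML-RPC introspection; if the server
# supports it, it lists the series/milestone/product/build/result calls.
print(proxy.system.listMethods())
```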
Automated testing
Ubuntu has what appears to be a Jenkins instance for automated testing. This runs an apparently fairly small set of release validation tests.
OpenSUSE
Manual testing
Well...they've got a spreadsheet.
Automated testing
This is where OpenSUSE really shines - clearly most of their work goes into the OpenQA system.
The main front end to OpenQA provides a straightforward, fairly dense flat view of its results. It seems that test suites can be run against builds of distributions on machines (more or less), and the standard view can filter based on any of these.
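As a toy illustration of that kind of filtering - the job records below are made up, not openQA's actual data format:

```python
# Toy illustration: 'jobs' (runs of a test suite against a build on a machine)
# filtered by any combination of fields. The records are made up, not openQA's
# actual data format.
jobs = [
    {"distribution": "openSUSE 13.2", "build": "0042", "machine": "64bit",
     "suite": "kde", "result": "passed"},
    {"distribution": "openSUSE 13.2", "build": "0042", "machine": "uefi",
     "suite": "gnome", "result": "failed"},
]

def filter_jobs(jobs, **criteria):
    """Return the jobs matching every given field=value criterion."""
    return [j for j in jobs if all(j.get(k) == v for k, v in criteria.items())]

print(filter_jobs(jobs, build="0042", machine="uefi"))
```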
The test suites cover a fairly extensive range of installation scenarios and basic functionality checks, comparable to the extent of Fedora's and Ubuntu's manual validation processes (though perhaps not quite so comprehensive).
An obvious potential drawback of automated QA is that the tests may go 'stale' as the software changes its expected behaviour, but at a superficial evaluation SUSE folks seem to be staying on top of this - there are no obvious absurd 'failure' results from cases where a test has gone stale for years, and the test suites seem to be actively maintained and added to regularly.
The process by which OpenQA 'failures' are turned into bug reports with sufficient useful detail for developers to fix seems difficult to trace, at least from a quick scan of the documentation on the SUSE wiki.