Automated critical path update functional testing for Fedora

A little thing I've been working on lately finally went live today...this thing:

openQA test results in Bodhi

Several weeks ago, I adapted Fedora's openQA to run an appropriate subset of tests on critical path updates. We originally set up our openQA deployment strictly to run tests at the distribution compose level, but I noticed that most of the post-install tests would also be quite useful for critical path updates.

First, I set up a slightly different openQA workflow that starts from an existing disk image of a clean installed system, downloads the packages from a given update, sets up a local repository containing the packages, and runs dnf -y update before going ahead with the main part of the test.
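
Just to give a flavour of what that setup phase involves, here's a hypothetical sketch in Python; it is not the real openQA test code, and the exact commands and paths are simplifications:

import os
import subprocess

def prepare_update_repo(nvrs, repodir='/opt/update_repo'):
    """Download the update's packages and turn them into a dnf repository."""
    if not os.path.isdir(repodir):
        os.makedirs(repodir)
    for nvr in nvrs:
        # grab the built packages for each build in the update
        subprocess.check_call(['koji', 'download-build', '--arch=x86_64',
                               '--arch=noarch', nvr], cwd=repodir)
    # generate repository metadata for the directory
    subprocess.check_call(['createrepo_c', repodir])
    # point dnf at the local repository
    with open('/etc/yum.repos.d/advisory.repo', 'w') as repofh:
        repofh.write('[advisory]\nname=Update under test\n'
                     'baseurl=file://{0}\nenabled=1\ngpgcheck=0\n'.format(repodir))

# after that, the test just runs 'dnf -y update' and carries on as normal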

Then, I adapted our openQA scheduler to trigger this workflow whenever a critical path update is submitted or edited, and forward the results to ResultsDB.

All of this went into production a few weeks ago, and the tests have been run on every critical path update since then. But there was a big piece missing: making the information easily available to the update submitter (and anyone else interested). I wanted the results to be visible in Bodhi, alongside the Taskotron results, so I sent a patch for Bodhi, and the new Bodhi release including that change was deployed to production today.

The last two Bodhi releases also make some other great improvements to the display of automated test results, thanks to Ryan Lerch and Randy Barlow. The results are retrieved from ResultsDB by client-side JavaScript every time someone views an update. Previously this was done quite inefficiently, and the results were shown at the top of the main update page, so they would show up piecemeal for several seconds after the page had mostly loaded - rather annoying, especially for large updates.

Now the results are retrieved in a much more efficient manner and shown on a separate tab, where a count of the results is displayed once they've all been retrieved.
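
If you're curious what the retrieval looks like, here's roughly the kind of query involved, done with Python's requests rather than Bodhi's actual JavaScript; the URL and parameters are my assumptions about a ResultsDB 2.0 instance, not Bodhi's exact code:

import requests

RESULTSDB = 'https://resultsdb.example.org/api/v2.0'  # placeholder URL

def results_for_item(item):
    """Fetch all results whose 'item' matches, e.g. a build NVR from an update."""
    resp = requests.get('{0}/results'.format(RESULTSDB), params={'item': item})
    resp.raise_for_status()
    # the v2.0 API returns a paginated list under 'data'
    return resp.json().get('data', [])

for result in results_for_item('python-deap-1.0.1-2.fc26'):
    print(result['testcase']['name'], result['outcome'])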

So with Bodhi 2.6, you should have a much more pleasant experience viewing automated test results in Bodhi's web UI - and for critical path updates, you'll now see results from openQA functional testing as well as Taskotron tests!

At present, the tests openQA runs fall into three main groups:

  1. Some simple 'base' tests, which check that SELinux is enabled, service manipulation (enabling, disabling, starting and stopping services) works, no default-enabled services fail to start, and updating the system with dnf works.

  2. Some desktop tests (currently run only on GNOME): launching and using a graphical terminal works, launching Firefox and doing some basic tests in it works, and updating the system with the graphical updater (GNOME Software in GNOME's case) works.

  3. Some server tests: whether the firewall is configured and working as expected, whether Cockpit is enabled by default and basically works, and both server and client tests for the database server (PostgreSQL) and domain controller (FreeIPA) roles.

So if any of these fail for a critical path update, you should be able to see it. You can click any of the results to see the openQA webUI view of the test.

At present you cannot request a re-run of a single test; we're thinking about mechanisms for allowing that. You can cause the entire set of openQA tests to be run again by editing the update: you don't have to add or remove any builds, any kind of edit (just changing a character in the description) will do.

If you need help interpreting any openQA test results, please ask on the test@ mailing list or drop by #fedora-qa. Either garretraziel or I should be available there most of the time.

Please do send along any thoughts, questions, suggestions or complaints to test@ or as a comment on this blog post. We'll certainly be looking to extend and improve this system in future!

Fedora Media Writer Test Day (Fedora 26 edition) on Thursday 2017-04-20!

It's that time again: we have another test day coming up this week! Thursday 2017-04-20 will be another Fedora Media Writer Test Day. We've run these during the Fedora 24 and 25 cycles as well, but we want to make sure the tool is ready for the Fedora 26 release, and also test a major new feature it has this time around: support for writing ARM images. So please, if you have a bit of spare time and a system to test booting on - especially a Fedora-supported ARM device - come out and help us test!

The Test Day page contains all the instructions you need to run the tests and send along your results. As always, the event is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Fedora 26 Alpha released, and blivet-gui Anaconda Test Day on Thursday (2017-04-06)

Hi again folks! Two bits of Fedora 26 news today. First off, Fedora 26 Alpha has been released! It got delayed by a couple of weeks due to rather a grab-bag of issues - mainly problems with FreeIPA and several kernel bugs - but the delays did at least mean we wound up with a really pretty solid build, according to our testing so far. Please do grab the Alpha, play around with it, and see how it works for you. Remember to read the Common Bugs page, though I'm still working on it at the moment.

Secondly, we have another Test Day coming up this Thursday, 2017-04-06! Anaconda blivet-gui Test Day will be a pretty big one. In Fedora 26, an additional partitioning interface is added to Anaconda (the Fedora installer). As well as anaconda's own custom partitioning interface, there is now a choice to run the blivet-gui partitioning tool from anaconda. This tool is built on the same backend as anaconda itself, but provides an alternative user interface. It's been available as a standalone tool since Fedora 21, but Fedora 26 is the first time it can be run from the installer to do install-time partitioning. The Test Day will be all about testing out this new feature and making sure it integrates properly with anaconda and works properly in various situations. Please do come along and help out if you have time!

The Test Day page contains all the instructions you need to run the tests and send along your results. As always, the event is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Fedora 26 crypto policy Test Day today (2017-03-30)!

Sorry for the short notice, folks! Today is Fedora 26 crypto policy Test Day. This event is intended to test the Fedora 26 changes and updates to the ongoing crypto policy feature, which aims to provide a centralized and unified configuration system for the various cryptography libraries commonly used in Fedora.

The Test Day page contains all the instructions you need to run the tests and send along your results. As always, the event is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Getting started with Pagure CI

I spent a few hours today setting up a couple of the projects I look after, fedfind and resultsdb_conventions, to use Pagure CI. It was surprisingly easy! Many thanks to Pingou and Lubomir for working on this, and of course Kevin for helping me out with the Jenkins side.

You really do just have to request a Jenkins project and then follow the instructions. I followed them step by step, submitted a pull request, and everything worked first time. So the interesting part for me was figuring out exactly what to run in the Jenkins job.

The instructions get you to the point where you're in a checkout of the git repository with the pull request applied, and then you get to do...whatever you can given what you're allowed to do in the Jenkins builder environment. That doesn't include installing packages or running mock. So I figured what I'd do for my projects - which are both Python - is set up a good tox profile. With all the stuff discussed below, the actual test command in the Jenkins job - after the boilerplate from the guide that checks out and merges the pull request - is simply tox.

First things first, the infra Jenkins builders didn't have tox installed, so Kevin kindly fixed that for me. I also convinced him to install all the variant Python version packages - python26, and the non-native Python 3 packages - on each of the Fedora builders, so I can be confident I get pretty much the same tox run no matter which of the builders the job winds up on.

Of course, one thing worth noting at this point is that tox installs all dependencies from PyPI: if something your code depends on isn't in there (or installed on the Jenkins builders), you'll be stuck. So another thing I got to do was start publishing fedfind on PyPI! That was pretty easy, though I did wind up cribbing a neat trick from this PyPI issue so I can keep my README in Markdown format but have setup.py convert it to rst when using it as the long_description for PyPI, so it shows up properly formatted, as long as pypandoc is installed (but works even if it isn't, so you don't need pandoc just to install the project).
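
The trick itself is tiny; a minimal sketch of it looks like this (adapted from memory, so treat the exact pypandoc call as an assumption):

try:
    # convert README.md to rst so PyPI renders the long_description nicely
    import pypandoc
    LONGDESC = pypandoc.convert('README.md', 'rst')
except ImportError:
    # no pypandoc? fall back to the raw Markdown text
    with open('README.md') as readme:
        LONGDESC = readme.read()

# ...and then pass long_description=LONGDESC to setup()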

After playing with it for a bit, I figured out that what I really wanted was to have two workflows. One is to run just the core test suite, without any unnecessary dependencies, with python setup.py test - this is important when building RPM packages, to make sure the tests pass in the exact environment the package is built in (and for). And then I wanted to be able to run the tests across multiple environments, with coverage and linting, in the CI workflow. There's no point running code coverage or a linter while building RPMs, but you certainly want to do it for code changes.

So I put the install, test and CI requirements into three separate text files in each repo - install.requires, tests.requires and tox.requires - and adjusted the setup.py files to do this in their setup():

install_requires = open('install.requires').read().splitlines(),
tests_require = open('tests.requires').read().splitlines(),

In tox.ini I started with this:

deps=-r{toxinidir}/install.requires
     -r{toxinidir}/tests.requires
     -r{toxinidir}/tox.requires

so the tox runs get the extra dependencies. I usually write pytest tests, so to start with in tox.ini I just had this command:

commands=py.test

Pytest integration for setuptools can be done in various ways, but I use this one. Add a class to setup.py:

import sys
from setuptools import setup, find_packages
from setuptools.command.test import test as TestCommand

class PyTest(TestCommand):
    user_options = [('pytest-args=', 'a', "Arguments to pass to py.test")]

    def initialize_options(self):
        TestCommand.initialize_options(self)
        self.pytest_args = ''
        self.test_suite = 'tests'

    def run_tests(self):
        #import here, cause outside the eggs aren't loaded
        import pytest
        errno = pytest.main(self.pytest_args.split())
        sys.exit(errno)

and then this line in setup():

cmdclass = {'test': PyTest},

And that's about the basic shape of it. With an envlist, we get the core tests running both through tox and setup.py. But we can do better! Let's add some extra deps to tox.requires:

coverage
diff-cover
pylint
pytest-cov

and tweak the commands in tox.ini:

commands=py.test --cov-report term-missing --cov-report xml --cov fedfind
         diff-cover coverage.xml --fail-under=90
         diff-quality --violations=pylint --fail-under=90

By adding a few args to our py.test call we get a coverage report for our library with the pull request applied. The subsequent commands use the neat diff_cover tool to add some more information. diff-cover basically takes the full coverage report (coverage.xml is produced by --cov-report xml) and considers only the lines that are touched by the pull request; the --fail-under arg tells it to fail if there is less than 90% coverage of the modified lines. diff-quality runs a linter (in this case, pylint) on the code and, again, considers only the lines changed by the pull request. As you might expect, --fail-under=90 tells it to fail if the 'quality' of the changed code is below 90% (it normalizes all the linter scores to a percentage scale, so that really means a pylint score of less than 9.0).

So without messing around with shipping all our stuff off to hosted services, we get a pretty decent indicator of the test coverage and code quality of the pull request, and it shows up as failing tests if they're not good enough.

It's kind of overkill to run the coverage and linter on all the tested Python environments, but it is useful to do it at least on both Python 2 and 3, since the pylint results may differ, and the code might hit different paths. Running them on every minor version isn't really necessary, but it doesn't take that long so I'm not going to sweat it too much.

But that does bring me to the last refinement I made, because you can vary what tox does in different environments. One thing I wanted for fedfind was to run the tests not just on Python 2.6, but with the ancient versions of several dependencies that are found in RHEL / EPEL 6. And there's also an interesting bug in pylint which makes it crash when running on fedfind under Python 3.6. So my tox.ini really looks like this:

[tox]
envlist = py26,py27,py34,py35,py36,py37
skip_missing_interpreters=true
[testenv]
deps=py27,py34,py35,py36,py37: -r{toxinidir}/install.requires
     py26: -r{toxinidir}/install.requires.py26
     py27,py34,py35,py36,py37: -r{toxinidir}/tests.requires
     py26: -r{toxinidir}/tests.requires.py26
     py27,py34,py35,py36,py37: -r{toxinidir}/tox.requires
     py26: -r{toxinidir}/tox.requires.py26
commands=py27,py34,py35,py36,py37: py.test --cov-report term-missing --cov-report xml --cov fedfind
         py26: py.test
         py27,py34,py35,py36,py37: diff-cover coverage.xml --fail-under=90
         # pylint breaks on functools imports in python 3.6+
         # https://github.com/PyCQA/astroid/issues/362
         py27,py34,py35: diff-quality --violations=pylint --fail-under=90
setenv =
    PYTHONPATH = {toxinidir}

As you can probably guess, what's going on there is we're installing different dependencies and running different commands in different tox 'environments'. pip doesn't really have a proper dependency solver, which - among other things - unfortunately means tox barfs if you try and do something like listing the same dependency twice, the first time without any version restriction, the second time with a version restriction. So I had to do a bit more duplication than I really wanted, but never mind. What the files wind up doing is telling tox to install specific, old versions of some dependencies for the py26 environment:

[install.requires.py26]
cached-property
productmd
setuptools == 0.6.rc10
six == 1.7.3

[tests.requires.py26]
pytest==2.3.5
mock==1.0.1

tox.requires.py26 is just shorter, skipping the coverage and pylint bits, because it turns out to be a pain trying to provide old enough versions of various other things to run those checks with the older pytest, and there's no real need to run the coverage and linter on py26 as long as they run on py27 (see above). As you can see in the commands section, we just run plain py.test and skip the other two commands on py26; on py36 and py37 we skip the diff-quality run because of the pylint bug.

So now on every pull request, we check the code (and tests - it's usually the tests that break, because I use some pytest feature that didn't exist in 2.3.5...) still work with the ancient RHEL 6 Python, pytest, mock, setuptools and six, check it on various other Python interpreter versions, and enforce some requirements for test coverage and code quality. And the package builds can still just do python setup.py test and not require coverage or pylint. Who needs github and coveralls? ;)

Of course, after doing all this I needed a pull request to check it on. For resultsdb_conventions I just made a dumb fake one, but for fedfind, because I'm an idiot, I decided to write that better compose ID parser I've been meaning to do for the last week. So that took another hour and a half. And then I had to clean up the test suite...sigh.

Announcing the resultsdb-users mailing list

I've been floating an idea around recently to people who are currently using ResultsDB in some sense - either sending reports to it, or consuming reports from it - or plan to do so. The idea was to have a group where we can discuss (and hopefully co-ordinate) use of ResultsDB - a place to talk about result metadata conventions and so forth.

It seemed to get a bit of traction, so I've created a new mailing list: resultsdb-users. If you're interested, please do subscribe, through the web interface, or by sending a mail with 'subscribe' in the subject to this address.

If you're not familiar with ResultsDB - well, it's a generic storage engine for test results. It's more or less a database with a REST API and some very minimal rules for what constitutes a 'test result'. The only requirements really are some kind of test name plus a result, chosen from four options; results can include any other arbitrary key:value pairs you like, and a few have special meaning in the web UI, but that's about it. This is one of the reasons for the new list: because ResultsDB is so generic, if we want to make it easily and reliably possible to find related groups of results in any given ResultsDB, we need to come up with ways to ensure related results share common metadata values, and that's one of the things I expect we'll be talking about on the list.
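
To give a concrete sense of how minimal a 'result' is, here's a sketch of reporting one with Python's requests; the field names follow my reading of the ResultsDB 2.0 API and the URL is just a placeholder:

import requests

RESULTSDB = 'https://resultsdb.example.org/api/v2.0'  # placeholder URL

result = {
    # the required bits: a test case name and one of the four outcomes
    'testcase': {'name': 'mytests.some_check'},
    'outcome': 'PASSED',  # or FAILED, INFO, NEEDS_INSPECTION
    # everything else is freeform key:value metadata
    'data': {'item': 'Fedora-Rawhide-20170213.n.0', 'type': 'compose'},
}

resp = requests.post('{0}/results'.format(RESULTSDB), json=result)
resp.raise_for_status()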

It began life as Taskotron's result storage engine, but it's pretty independent, and you could certainly get value out of a ResultsDB instance without any of the other bits of Taskotron.

Right now ResultsDB is used in production in Fedora for storing results from Taskotron, openQA and Autocloud, and an instance is also used inside Red Hat for storing results from some RH test systems.

Please note: despite the list being a fedoraproject one, the intent is to co-ordinate with folks from CentOS, Red Hat and maybe even further afield as well; we're just using an fp.o list as it's a quick convenient way to get a nice mailman3/hyperkitty list without having to go set up a list server on taskotron.org or something.

The future of Fedora QA

Welcome to version 2.0 of this blog post! This space was previously occupied by a whole bunch of longwinded explanation about some changes that are going on in Fedoraland, and are going to be accelerating (I think) in the near future. But it was way too long. So here's the executive summary!

First of all: if you do nothing else to get up to speed on Stuff That's Going On, watch Ralph Bean's Factory 2.0 talk and Adam Samalik's Modularity talk from Devconf 2017. Stephen Gallagher's Fedora Server talk and Dennis Gilmore's 'moving everyone to Rawhide' talk are also valuable, but please at least watch Ralph's. It's a one-hour overview of all the big stuff that people really want to build for Fedora (and RH) soon.

To put it simply: Fedora (and RH) don't want to be only in the business of releasing a bunch of RPMs and operating system images every X months (or years) any more. And we're increasingly moving away from the traditional segmented development process where developers/package maintainers make the bits, then release engineering bundles them all up into 'things', and then QA looks at the 'things' and says "er, it doesn't boot, try again", and we do that for several months until QA is happy, then we release it and start over. There is a big project to completely overhaul the way we build and ship products, using a pipeline that involves true CI, where each proposed change to Fedora produces an immediate feedback loop of testing and the change is blocked if the testing fails. Again, watch Ralph's talk, because what he basically does is put up a big schematic of this entire system and go into a whole bunch of detail about his vision for how it's all going to work.

As part of this, some of the folks in RH's Fedora QA team whose job has been to work on 'automated testing' - a concept that is very tied to the traditional model for building and shipping a 'distribution', and just means taking some of the tasks assigned to QA/QE in that model and automating them - are now instead going to be part of a new team at Red Hat whose job is to work on the infrastructure that supports this CI pipeline. That doesn't mean they're leaving Fedora, or we're going to throw away all the work we've invested in the components of Taskotron and start all over again, but it does mean that some or all of the components of Taskotron are going to be re-envisaged as part of a modernized pipeline for building and shipping whatever it is we want to call Fedora in the future - and also, if things go according to plan, for building and shipping CentOS and Red Hat products, as part of the vision is that as many components of the pipeline as possible will be shared among many projects.

So that's one thing that's happening to Fedora QA: the RH team is going to get a bit smaller, but it's for good and sensible reasons. You're also not going to see those folks disappear into some kind of internal RH wormhole, they'll still be right here working on Fedora, just in a somewhat different context.

Of course, all of this change has other implications for Fedora QA as well, and I reckon this is a good time for those of us still wearing 'Fedora QA' hats - whether we're paid by Red Hat or not - to be reconsidering exactly what our goals and priorities ought to be. Much like with Taskotron, we really haven't sat down and done that for several years. I've been thinking about it myself for a while, and I wouldn't say I have it all figured out, but I do have some thoughts.

For a start I think we should be looking ahead to the time when we're no longer on what the anaconda team used to call 'the blocker treadmill', where a large portion of our working time is eaten up by a more or less constant cycle of waking up, finding out what broke in Rawhide or Branched today, and trying to get it fixed. If the plans above come about, that should happen a lot less for a couple of reasons: firstly Fedora won't just be a project which releases a bunch of OS images every six months any more, and secondly, distribution-level CI ought to mean that things aren't broken all the damn time any more. In an ideal scenario, a lot of the basic fundamental breakage that, right now, is still mostly caught by QA - and that we spend a lot of our cycles on dealing with - will just no longer be our problem. In a proper CI system, it becomes truly the developers' responsibility: developers don't get to throw in a change that breaks everything and then wait for QA to notice and tell them about it. If they try and send a change that breaks everything, it gets rejected, and hopefully, the breakage never really 'happens'.

Sadly (or happily, given I still have a mortgage to pay off) this probably doesn't mean Project Colada will finally be reality and we all get to sit on the beach drinking cocktails for the rest of our lives. CI is a great process for ensuring your project basically works all the time, but 'basically works' is a long way from 'perfect'. Software is still software, after all, and a CI process is never going to catch all of the bugs. Freeing QA from the blocker treadmill lets us look up and think, well, what else can we do?

To be clear, I think we're still going to need 'release validation'. In fact, if the bits of the plan about having more release streams than just 'all the bits, every six months' come off, we'll need more release validation. But hopefully there'll be a lot more "well, this doesn't quite work right in this quite involved real-world scenario" and less "it doesn't boot and I think it ate my cat" involved. For the near future, we're going to have to keep up the treadmill: bar a few proofs of concept and stuff, Fedora 26 is still an 'all the bits, every six months' release, and there's still an awful lot of "it doesn't boot" involved. (Right now, Rawhide doesn't even compose, let alone boot!) But it's not too early to start thinking about how we might want to revise the 'release validation' concept for a world where the wheels don't fall off the bus every five minutes. It might be a good idea to go back to the teams responsible for all the Fedora products - Server, Workstation, Atomic et al. - and see if we need to take another good look at the documents that define what those products should deliver, and the test processes we have in place to try and determine whether they deliver them.

We're also still going to be doing 'updates testing' and 'test days', I think. In fact, the biggest consequence of a world where the CI stuff works out might be that we are free to do more of those. There may be some change in what 'updates' are - it may not just be RPM packages any more - but whatever interesting forms of 'update' we wind up shipping out to people, we're still going to need to make sure they work properly, and manual testing is always going to be able to find things that automated tests miss there.

I think the question of to what extent we still have a role in 'automated testing' and what it should be is also a really interesting one. One of the angles of the 'more collaboration between RH and Fedora' bit here is that RH is now very interested in 'upstreaming' a bunch of its internal tests that it previously considered to be sort of 'RH secret sauce'. Specifically, there's a set of tests from RH's 'Platform QE' team which currently run through a pipeline using RH's Beaker test platform which we'd really like to have at least a subset of running on Fedora. So there's an open question about whether and to what extent Fedora QA would have a role in adapting those tests to Fedora and overseeing their operation. The nuts and bolts of 'make sure Fedora has the necessary systems in place to be able to run the tests at all' is going to be the job of the new 'infrastructure' team, but we may well wind up being involved in the work of adapting the tests themselves to Fedora and deciding which ones we want to run and for what purposes. In general, there is likely still going to be a requirement for 'automated testing' that isn't CI - it's still going to be necessary to test the things we build at a higher level. I don't think we can yet know exactly what requirements we'll have there, but it's something to think about and figure out as we move forward, and I think it's definitely going to be part of our job.

We may also need to reconsider how Fedora QA, and indeed Fedora as a whole, decides what is really important. Right now, there's a pretty solid process for this, but it's quite tied to the 'all the things, every six months' release cycle. For each release we decide which Fedora products are 'release blocking', and we care about those, and the bits that go into them and the tools for building them, an awful lot more than we care about anything else. This works pretty well to focus our limited resources on what's really important. But if we're going to be moving to having more and more varied 'Fedora' products with different release streams, the binary 'is it release blocking?' question doesn't really work any more. Fedora as a whole might need a better way of doing that, and QA should have a role to play in figuring that out and making sure we work out our priorities properly from it.

So there we go! I hope that was useful and thought-provoking. We've got a QA meeting coming up tomorrow (2017-02-13) at 1600 UTC where I'm hoping we can chew these topics over a bit, just to serve as an opportunity to get people thinking. Hope to see you there, or on the mailing list!

openQA and Autocloud result submission to ResultsDB

So I've just arrived back from a packed two weeks in Brno, and I'll probably have some more stuff to post soon. But let's lead with some big news!

One of the big topics at Devconf and around the RH offices was the ongoing effort to modernize both Fedora and RHEL's overall build processes to be more flexible and involve a lot more testing (or, as some people may have put it, "CI CI CI"). A lot of folks wearing a lot of hats are involved in different bits of this effort, but one thing that seems to stay constant is that ResultsDB will play a significant role.

ResultsDB started life as the result storage engine for AutoQA, and the concept and name were preserved as AutoQA was replaced by Taskotron. Its current version, however, is designed to be a scalable, capable and generic store for test results from any test system, not just Taskotron. Up until last week, though, we'd never quite got around to hooking up any other systems to it to demonstrate this.

Well, that's all changed now! In the course of three days, Jan Sedlak and I got both Fedora's openQA instance and Autocloud reporting to ResultsDB. As results come out of both those systems, fedmsg consumers take the results, process them into a common format, and forward them to ResultsDB. This means there are groups with results from both systems for the same compose together, and you'll find metadata in very similar format attached to the results from both systems. This is all deployed in production right now - the results from every daily compose from both openQA and Autocloud are being forwarded smoothly to ResultsDB.
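
The consumers themselves are not complicated. Here's roughly the shape of the glue involved; the class name, topic, message fields and URL are made up for illustration, this is not the real production code:

import fedmsg.consumers
import requests

RESULTSDB = 'https://resultsdb.example.org/api/v2.0'  # placeholder URL

class OpenQAResultForwarder(fedmsg.consumers.FedmsgConsumer):
    """Listen for openQA job completion messages and forward them to ResultsDB."""
    topic = 'org.fedoraproject.prod.openqa.job.done'
    config_key = 'openqa_resultsdb_forwarder'

    def consume(self, message):
        job = message['body']['msg']
        # translate the openQA-specific fields into the common result format
        result = {
            'testcase': {'name': 'compose.' + job.get('TEST', 'unknown')},
            'outcome': 'PASSED' if job.get('result') == 'passed' else 'FAILED',
            'data': {'item': job.get('BUILD'), 'type': 'compose'},
        }
        requests.post('{0}/results'.format(RESULTSDB), json=result)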

To aid in this effort I wrote a thing we're calling resultsdb_conventions for now. I think of it as being a code representation of some 'conventions' for formatting and organizing results in ResultsDB, as well as a tool for conveniently reporting results in line with those conventions. The attraction of ResultsDB is that it's very little more than a RESTful API for a database; it enforces a pretty bare minimum in terms of required data for each result. A result must provide only a test name, an 'item' that was tested, and a status ('outcome') from a choice of four. ResultsDB allows a result to include as much more data as it likes, in the form of a freeform key:value data store, but it does not require any extra data to be provided, or impose any policy on its form.

This makes ResultsDB flexible, but also means we will need to establish conventions where appropriate to ensure related results can be conveniently located and reasoned about. resultsdb_conventions is my initial contribution to this effort, originally written just to reduce duplication between the openQA and Autocloud result submitters and ensure they used a common layout, but intended to perhaps cover far more use cases in the future.

Having this data in ResultsDB is likely to be practically useful either immediately or in the very near future, but we're also hoping it acts as a demonstration that using ResultsDB to consolidate results from multiple test sources is not only possible but quite easy. And I'm hoping resultsdb_conventions can be a starting point for a discussion and some consensus around what metadata we provide, and in what format, for various types of result. If all goes well, we're hoping to hook up manual test result submission to ResultsDB next, via the relval-ng project that's had some discussion on the QA mailing lists. Stay tuned for more on that!

The Tale Of The Two-Day, One-Character Patch

I'm feeling like writing a very long explanation of a very small change again. Some folks have told me they enjoy my attempts to detail the entire step-by-step process of debugging some somewhat complex problem, so sit back, folks, and enjoy...The Tale Of The Two-Day, One-Character Patch!

Recently we landed Python 3.6 in Fedora Rawhide. A Python version bump like that requires all Python-dependent packages in the distribution to be rebuilt. As usually happens, several packages failed to rebuild successfully, so among other work, I've been helping work through the list of failed packages and fixing them up.

Two days ago, I reached python-deap. As usual, I first simply tried a mock build of the package: sometimes it turns out we already fixed whatever had previously caused the build to fail, and simply retrying will make it work. But that wasn't the case this time.

The build failed due to build dependencies not being installable - python2-pypandoc, in this case. It turned out that this depends on pandoc-citeproc, and that wasn't installable because a new ghc build had been done without rebuilds of the set of pandoc-related packages that must be rebuilt after a ghc bump. So I rebuilt pandoc, and ghc-aeson-pretty (an updated version was needed to build an updated pandoc-citeproc which had been committed but not built), and finally pandoc-citeproc.

With that done, I could do a successful scratch build of python-deap. I tweaked the package a bit to enable the test suites - another thing I'm doing for each package I'm fixing the build of, if possible - and fired off an official build.

Now you may notice that this looks a bit odd, because all the builds for the different arches succeeded (they're green), but the overall 'State' is "failed". What's going on there? Well, if you click "Show result", you'll see this:

BuildError: The following noarch package built differently on different architectures: python-deap-doc-1.0.1-2.20160624git232ed17.fc26.noarch.rpm
rpmdiff output was:
error: cannot open Packages index using db5 - Permission denied (13)
error: cannot open Packages database in /var/lib/rpm
error: cannot open Packages database in /var/lib/rpm
removed     /usr/share/doc/python-deap/html/_images/cma_plotting_01_00.png
removed     /usr/share/doc/python-deap/html/examples/es/cma_plotting_01_00.hires.png
removed     /usr/share/doc/python-deap/html/examples/es/cma_plotting_01_00.pdf
removed     /usr/share/doc/python-deap/html/examples/es/cma_plotting_01_00.png

So, this is a good example of where background knowledge is valuable. Getting from step to step in this kind of debugging/troubleshooting process is a sort of combination of logic, knowledge and perseverance. Always try to be logical and methodical. When you start out you won't have an awful lot of knowledge, so you'll need a lot of perseverance; hopefully, the longer you go on, the more knowledge you'll pick up, and thus the less perseverance you'll need!

In this case the error is actually fairly helpful, but I also know a bit about packages (which helps) and remembered a recent mailing list discussion. Fedora allows arched packages with noarch subpackages, and this is how python-deap is set up: the main packages are arched, but there is a python-deap-docs subpackage that is noarch. We're concerned with that package here. I recalled a recent mailing list discussion of this "built differently on different architectures" error.

As discussed in that thread, we're failing a Koji check specific to this kind of package. If all the per-arch builds succeed individually, Koji will take the noarch subpackage(s) from each arch and compare them; if they're not all the same, Koji will consider this an error and fail the build. After all, the point of a noarch package is that its contents are the same for all arches and so it shouldn't matter which arch build we take the noarch subpackage from. If it comes out different on different arches, something is clearly up.
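
Incidentally, if you grab the noarch subpackage from two different arch builds, it's easy enough to see how they differ yourself. This isn't Koji's actual check, just a quick local approximation (the paths are wherever you downloaded the two packages to):

import subprocess

def rpm_filelist(path):
    """Return the set of files contained in an RPM, using 'rpm -qlp'."""
    out = subprocess.check_output(['rpm', '-qlp', path])
    return set(out.decode('utf-8').splitlines())

x86 = rpm_filelist('x86_64/python-deap-doc-1.0.1-2.20160624git232ed17.fc26.noarch.rpm')
ppc = rpm_filelist('ppc64/python-deap-doc-1.0.1-2.20160624git232ed17.fc26.noarch.rpm')
print('only in the x86_64 build:', sorted(x86 - ppc))
print('only in the ppc64 build:', sorted(ppc - x86))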

So this left me with the problem of figuring out which arch was different (it'd be nice if the Koji message actually told us...) and why. I started out just looking at the build logs for each arch and searching for 'cma_plotting'. This is actually another important thing: one of the most important approaches to have in your toolbox for this kind of work is just 'searching for significant-looking text strings'. That might be a grep or it might be a web search, but you'll probably wind up doing a lot of both. Remember good searching technique: try to find the most 'unusual' strings you can to search for, ones for which the results will be strongly correlated with your problem. This quickly told me that the problematic arch was ppc64. The 'removed' files were not present in that build, but they were present in the builds for all other arches.

So I started looking more deeply into the ppc64 build log. If you search for 'cma_plotting' in that file, you'll see the very first result is "WARNING: Exception occurred in plotting cma_plotting". That sounds bad! Below it is a long Python traceback - the text starting "Traceback (most recent call last):".

So what we have here is some kind of Python thing crashing during the build. If we quickly compare with the build logs on other arches, we don't see the same thing at all - there is no traceback in those build logs. Especially since this shows up right when the build process should be generating the files we know are the problem (the cma_plotting files, remember), we can be pretty sure this is our culprit.

Now this is a pretty big scary traceback, but we can learn some things from it quite easily. One is very important: we can see quite easily what it is that's going wrong. If we look at the end of the traceback, we see that all the last calls involve files in /usr/lib64/python2.7/site-packages/matplotlib. This means we're dealing with a Python module called matplotlib. We can quite easily associate that with the package python-matplotlib, and now we have our next suspect.

If we look a bit before the traceback, we can get a bit more general context of what's going on, though it turns out not to be very important in this case. Sometimes it is, though. In this case we can see this:

+ sphinx-build-2 doc build/html
Running Sphinx v1.5.1

Again, background knowledge comes in handy here: I happen to know that Sphinx is a tool for generating documentation. But if you didn't already know that, you should quite easily be able to find it out, by good old web search. So what's going on is the package build process is trying to generate python-deap's documentation, and that process uses this matplotlib library, and something is going very wrong - but only on ppc64, remember - in matplotlib when we try to generate one particular set of doc files.

So next I start trying to figure out what's actually going wrong in matplotlib. As I mentioned, the traceback is pretty long. This is partly just because matplotlib is big and complex, but it's more because it's a fairly rare type of Python error - an infinite recursion. You'll see the traceback ends with many, many repetitions of this line:

  File "/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py", line 861, in _get_glyph
    return self._get_glyph('rm', font_class, sym, fontsize)

followed by:

  File "/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py", line 816, in _get_glyph
    uniindex = get_unicode_index(sym, math)
  File "/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py", line 87, in get_unicode_index
    if symbol == '-':
RuntimeError: maximum recursion depth exceeded in cmp

What 'recursion' means is pretty simple: it just means that a function can call itself. A common example of where you might want to do this is if you're trying to walk a directory tree. In Python it might look a bit like this:

import os

def read_directory(directory):
    # print this directory's name, then look at everything inside it
    print(directory)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            print(name)
        if os.path.isdir(path):
            # the function calls itself to descend into the subdirectory
            read_directory(path)

To deal with directories nested in other directories, the function just calls itself. The danger is if you somehow mess up when writing code like this, and it winds up in a loop, calling itself over and over and never escaping: this is 'infinite recursion'. Python, being a nice language, notices when this is going on, and bails after a certain number of recursions, which is what's happening here.
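
You can see Python's safety net in action with a deliberately silly example:

import sys

def loop_forever(n):
    # no base case, so this never stops calling itself
    return loop_forever(n + 1)

print(sys.getrecursionlimit())  # the bail-out point, usually 1000
try:
    loop_forever(0)
except RuntimeError as err:
    # on Python 3.5+ this is a RecursionError, a RuntimeError subclass
    print(err)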

So now we know where to look in matplotlib, and what to look for. Let's go take a look! matplotlib, like most everything else in the universe these days, is in github, which is bad for ecosystem health but handy just for finding stuff. Let's go look at the function from the backtrace.

Well, this is pretty long, and maybe a bit intimidating. But an interesting thing is, we don't really need to know what this function is for - I actually still don't know precisely (according to the name it should be returning a 'glyph' - a single visual representation for a specific character from a font - but it actually returns a font, the unicode index for the glyph, the name of the glyph, the font size, and whether the glyph is italicized, for some reason). What we need to concentrate on is the question of why this function is getting in a recursion loop on one arch (ppc64) but not any others.

First let's figure out how the recursion is actually triggered - that's vital to figuring out what the next step in our chain is. The line that triggers the loop is this one:

                return self._get_glyph('rm', font_class, sym, fontsize)

That's where it calls itself. It's kinda obvious that the authors expect that call to succeed - it shouldn't run down the same logical path, but instead get to the 'success' path (the return font, uniindex, symbol_name, fontsize, slanted line at the end of the function) and thus break the loop. But on ppc64, for some reason, it doesn't.

So what's the logic path that leads us to that call, both initially and when it recurses? Well, it's down three levels of conditionals:

    if not found_symbol:
        if self.cm_fallback:
            <other path>
        else:
            if fontname in ('it', 'regular') and isinstance(self, StixFonts):
                return self._get_glyph('rm', font_class, sym, fontsize)

So we only get to this path if found_symbol is not set by the time we reach that first if, then if self.cm_fallback is not set, then if the fontname given when the function was called was 'it' or 'regular' and if the class instance this function (actually method) is a part of is an instance of the StixFonts class (or a subclass). Don't worry if we're getting a bit too technical at this point, because I did spend a bit of time looking into those last two conditions, but ultimately they turned out not to be that significant. The important one is the first one: if not found_symbol.

By this point, I'm starting to wonder if the problem is that we're failing to 'find' the symbol - in the first half of the function - when we shouldn't be. Now there are a couple of handy logical shortcuts we can take here that turned out to be rather useful. First we look at the whole logic flow of the found_symbol variable and see that it's a bit convoluted. From the start of the function, there are two different ways it can be set True - the if self.use_cmex block and then the 'fallback' if not found_symbol block after that. Then there's another block that starts if found_symbol: where it gets set back to False again, and another lookup is done:

    if found_symbol:
    (...)
        found_symbol = False
        font = self._get_font(new_fontname)
        if font is not None:
            glyphindex = font.get_char_index(uniindex)
            if glyphindex != 0:
                found_symbol = True

At first, though, we don't know if we're even hitting that block, or if we're failing to 'find' the symbol earlier on. It turns out, though, that it's easy to tell - because of this earlier block:

    if not found_symbol:
        try:
            uniindex = get_unicode_index(sym, math)
            found_symbol = True
        except ValueError:
            uniindex = ord('?')
            warn("No TeX to unicode mapping for '%s'" %
                 sym.encode('ascii', 'backslashreplace'),
                 MathTextWarning)

Basically, if we don't find the symbol there, the code logs a warning. We can see from our build log that we don't see any such warning, so we know that the code does initially succeed in finding the symbol - that is, when we get to the if found_symbol: block, found_symbol is True. That logically means that it's that block where the problem occurs - we have found_symbol going in, but where that block sets it back to False then looks it up again (after doing some kind of font substitution, I don't know why, don't care), it fails.

The other thing I noticed while poking through this code is a later warning. Remember that the infinite recursion only happens if fontname in ('it', 'regular') and isinstance(self, StixFonts)? Well, what happens if that's not the case is interesting:

            if fontname in ('it', 'regular') and isinstance(self, StixFonts):
                return self._get_glyph('rm', font_class, sym, fontsize)
            warn("Font '%s' does not have a glyph for '%s' [U+%x]" %
                 (new_fontname,
                  sym.encode('ascii', 'backslashreplace').decode('ascii'),
                  uniindex),
                 MathTextWarning)

that is, if that condition isn't satisfied, instead of calling itself, the next thing the function does is log a warning. So it occurred to me to go and see if there are any of those warnings in the build logs. And, whaddayaknow, there are four such warnings in the ppc64 build log:

/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '1' [U+1d7e3]
  MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:867: MathTextWarning: Substituting with a dummy symbol.
  warn("Substituting with a dummy symbol.", MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '0' [U+1d7e2]
  MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '-' [U+2212]
  MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '2' [U+1d7e4]
  MathTextWarning)

but there are no such warnings in the logs for other arches. That's really rather interesting. It makes one possibility very unlikely: that we do reach the recursed call on all arches, but it fails on ppc64 and succeeds on the other arches. It's looking far more likely that the problem is the "re-discovery" bit of the function - the if found_symbol: block where it looks up the symbol again - is usually working on other arches, but failing on ppc64.

So just by looking at the logical flow of the function, particularly what happens in different conditional branches, we've actually been able to figure out quite a lot, without knowing or even caring what the function is really for. By this point, I was really focusing in on that if found_symbol: block. And that leads us to our next suspect. The most important bit in that block is where it actually decides whether to set found_symbol to True or not, here:

        font = self._get_font(new_fontname)
        if font is not None:
            glyphindex = font.get_char_index(uniindex)
            if glyphindex != 0:
                found_symbol = True

I didn't actually know whether it was failing because self._get_font didn't find anything, or because font.get_char_index returned 0. I think I just played a hunch that get_char_index was the problem, but it wouldn't be too difficult to find out by just editing the code a bit to log a message telling you whether or not font was None, and re-running the test suite.

Anyhow, I wound up looking at get_char_index, so we need to go find that. You could work backwards through the code and figure out what font is an instance of so you can find it, but that's boring: it's far quicker just to grep the damn code. If you do that, you get various results that are calls of it, then this:

src/ft2font_wrapper.cpp:const char *PyFT2Font_get_char_index__doc__ =
src/ft2font_wrapper.cpp:    "get_char_index()\n"
src/ft2font_wrapper.cpp:static PyObject *PyFT2Font_get_char_index(PyFT2Font *self, PyObject *args, PyObject *kwds)
src/ft2font_wrapper.cpp:    if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {
src/ft2font_wrapper.cpp:        {"get_char_index", (PyCFunction)PyFT2Font_get_char_index, METH_VARARGS, PyFT2Font_get_char_index__doc__},

Which is the point at which I started mentally buckling myself in, because now we're out of Python and into C++. Glorious C++! I should note at this point that, while I'm probably a half-decent Python coder at this point, I am still pretty awful at C(++). I may be somewhat or very wrong in anything I say about it. Corrections welcome.

So I buckled myself in and went for a look at this ft2font_wrapper.cpp thing. I've seen this kind of thing a couple of times before, so by squinting at it a bit sideways, I could more or less see that this is what Python calls an extension module: basically, it's a Python module written in C or C++. This gets done if you need to create a new built-in type, or for speed, or - as in this case - because the Python project wants to work directly with a system shared library (in this case, freetype), either because it doesn't have Python bindings or because the project doesn't want to use them for some reason.

This code pretty much provides a few classes for working with Freetype fonts. It defines a class called matplotlib.ft2font.FT2Font with a method get_char_index, and that's what the code back up in mathtext.py is dealing with: that font we were dealing with is an FT2Font instance, and we're using its get_char_index method to try and 'find' our 'symbol'.

Fortunately, this get_char_index method is actually simple enough that even I can figure out what it's doing:

static PyObject *PyFT2Font_get_char_index(PyFT2Font *self, PyObject *args, PyObject *kwds)
{
    FT_UInt index;
    FT_ULong ccode;

    if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {
        return NULL;
    }

    index = FT_Get_Char_Index(self->x->get_face(), ccode);

    return PyLong_FromLong(index);
}

(If you're playing along at home for MEGA BONUS POINTS, you now have all the necessary information and you can try to figure out what the bug is. If you just want me to explain it, keep reading!)

There's really not an awful lot there. It's calling FT_Get_Char_Index with a couple of args and returning the result. Not rocket science.

In fact, this seemed like a good point to start just doing a bit of experimenting to identify the precise problem, because we've reduced the problem to a very small area. So this is where I stopped just reading the code and started hacking it up to see what it did.

First I tweaked the relevant block in mathtext.py to just log the values it was feeding in and getting out:

        font = self._get_font(new_fontname)
        if font is not None:
            glyphindex = font.get_char_index(uniindex)
            warn("uniindex: %s, glyphindex: %s" % (uniindex, glyphindex))
            if glyphindex != 0:
                found_symbol = True

Sidenote: how exactly to just print something out to the console when you're building or running tests can vary quite a bit depending on the codebase in question. What I usually do is just look at how the project already does it - find some message that is being printed when you build or run the tests, and then copy that. Thus in this case we can see that the code is using this warn function (it's actually warnings.warn), and we know those messages are appearing in our build logs, so...let's just copy that.

Then I ran the test suite on both x86_64 and ppc64, and compared. This told me that the Python code was passing the same uniindex values to the C code on both x86_64 and ppc64, but getting different results back - that is, I got the same recorded uniindex values, but on x86_64 the resulting glyphindex value was always something larger than 0, but on ppc64, it was sometimes 0.

The next step should be pretty obvious: log the input and output values in the C code.

index = FT_Get_Char_Index(self->x->get_face(), ccode);
printf("ccode: %lu index: %u\n", ccode, index);

Another sidenote: one of the more annoying things with this particular issue was just being able to run the tests with modifications and see what happened. First, I needed an actual ppc64 environment to use. The awesome Patrick Uiterwijk of Fedora release engineering provided me with one. Then I built a .src.rpm of the python-matplotlib package, ran a mock build of it, and shelled into the mock environment. That gives you an environment with all the necessary build dependencies and the source and the tests all there and prepared already. Then I just copied the necessary build, install and test commands from the spec file. For a simple pure-Python module this is all usually pretty easy and you can just check the source out and do it right in your regular environment or in a virtualenv or something, but for something like matplotlib which has this C++ extension module too, it's more complex. The spec builds the code, then installs it, then runs the tests out of the source directory with PYTHONPATH=BUILDROOT/usr/lib64/python2.7/site-packages, so the code that was actually built and installed is used for the tests. When I wanted to modify the C part of matplotlib, I edited it in the source directory, then re-ran the 'build' and 'install' steps, then ran the tests; if I wanted to modify the Python part I just edited it directly in the BUILDROOT location and re-ran the tests. When I ran the tests on ppc64, I noticed that several hundred of them failed with exactly the bug we'd seen in the python-deap package build - this infinite recursion problem. Several others failed due to not being able to find the glyph, without hitting the recursion. (It turned out the package maintainer had disabled the tests on ppc64, and so Fedora 24+'s python-matplotlib has been broken on ppc64 since about April.)

So anyway, with that modified C code built and used to run the test suite, I finally had a smoking gun. Running this on x86_64 and ppc64, the logged ccode values were totally different. The values logged on ppc64 were huge. But as we know from the previous logging, there was no difference in the value when the Python code passed it to the C code (the uniindex value logged in the Python code).

So now I knew: the problem lay in how the C code took the value from the Python code. At this point I started figuring out how that worked. The key line is this one:

if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {

That PyArg_ParseTuple function is what the C code is using to read in the value that mathtext.py calls uniindex and it calls ccode, the one that's somehow being messed up on ppc64. So let's read the docs!

This is one unusual example where the Python docs, which are usually awesome, are a bit difficult, because that's a very thin description which doesn't provide the references you usually get. But all you really need to do is read up - go back to the top of the page, and you get a much more comprehensive explanation. Reading carefully through the whole page, we can see pretty much what's going on in this call. It basically means that args is expected to be a structure representing a single Python object, a number, which we will store into the C variable ccode. The tricky bit is that second arg, "I:get_char_index". This is the 'format string' that the Python page goes into a lot of helpful detail about.

As it tells us, PyArg_ParseTuple "use[s] format strings which are used to tell the function about the expected arguments...A format string consists of zero or more “format units.” A format unit describes one Python object; it is usually a single character or a parenthesized sequence of format units. With a few exceptions, a format unit that is not a parenthesized sequence normally corresponds to a single address argument to these functions." Next we get a list of the 'format units', and I is one of those:

 I (integer) [unsigned int]
    Convert a Python integer to a C unsigned int, without overflow checking.

You might also notice that the list of format units includes several for converting Python integers to other things, like i for 'signed int' and h for 'short int'. This will become significant soon!

The :get_char_index bit threw me for a minute, but it's explained further down:

"A few other characters have a meaning in a format string. These may not occur inside nested parentheses. They are: ... : The list of format units ends here; the string after the colon is used as the function name in error messages (the “associated value” of the exception that PyArg_ParseTuple() raises)." So in our case here, we have only a single 'format unit' - I - and get_char_index is just a name that'll be used in any error messages this call might produce.

So now we know what this call is doing. It's saying "when some Python code calls this function, take the args it was called with and parse them into C structures so we can do stuff with them. In this case, we expect there to be just a single arg, which will be a Python integer, and we want to convert it to a C unsigned integer, and store it in the C variable ccode."
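To make that concrete, here's a minimal, hypothetical extension module - my own sketch, not matplotlib's actual code, and the names demo and get_char_index_demo are made up - where the format unit and the C variable actually match: "I" fills an unsigned int, so the variable it writes into is declared as an unsigned int.

#include <Python.h>

static PyObject *
get_char_index_demo(PyObject *self, PyObject *args)
{
    unsigned int ccode;  /* matches the "I" format unit exactly */

    /* Expect exactly one Python integer; convert it to a C unsigned int
     * and store it in ccode. "get_char_index_demo" only shows up in the
     * error message if the conversion fails. */
    if (!PyArg_ParseTuple(args, "I:get_char_index_demo", &ccode)) {
        return NULL;
    }
    return PyLong_FromUnsignedLong(ccode);
}

static PyMethodDef demo_methods[] = {
    {"get_char_index_demo", get_char_index_demo, METH_VARARGS,
     "Parse one unsigned integer argument and hand it straight back."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef demo_module = {
    PyModuleDef_HEAD_INIT, "demo", NULL, -1, demo_methods
};

PyMODINIT_FUNC
PyInit_demo(void)
{
    return PyModule_Create(&demo_module);
}

Build that and demo.get_char_index_demo(0x221E) just hands you 8734 back, because the type the format unit fills and the type of the variable agree.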

(If you're playing along at home but you didn't get it earlier, you really should be able to get it now! Hint: read up just a few lines in the C code. If not, go refresh your memory about architectures...)

And once I understood that, I realized what the problem was. Let's read up just a few lines in the C code:

FT_ULong ccode;

Unlike Python, C and C++ are statically typed languages. That just means that every variable must be declared to be of a specific type; Python variables, by contrast, don't have to be declared explicitly and can refer to objects of different types at any time. This line is a variable declaration: it's simply saying "we want a variable called ccode, and it's of type FT_ULong".

If you know anything at all about C integer types, you should know what the problem is by now (you probably worked it out a few paragraphs back). But if you don't, now's a good time to learn!

There are several different types you can use for storing integers in C: short, int, long, and (since C99) long long. This is basically all about efficiency: you can only put a small number in a short, but if you only need to store small numbers, it might be more efficient to use a short than a long. In principle a short needs no more memory than an int, which needs no more than a long, which needs no more than a long long; practically speaking, several of them wind up being the same size on some platforms, but the basic idea's there.

All the types have signed and unsigned variants. The difference there is simple: signed numbers can be negative, unsigned ones can't. Say an int is big enough to let you store 101 different values: a signed int would let you store any number between -50 and +50, while an unsigned int would let you store any number between 0 and 100.
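If you want to see the sizes and ranges on the machine you're sitting at, a quick standalone sketch like this will do it (the 'usually' comments assume the 64-bit Linux convention Fedora's arches follow; other platforms can differ):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The standard only guarantees each type is at least as big as the
     * one before it; the comments show the usual 64-bit Linux sizes. */
    printf("short:     %zu bytes\n", sizeof(short));      /* usually 2 */
    printf("int:       %zu bytes\n", sizeof(int));        /* usually 4 */
    printf("long:      %zu bytes\n", sizeof(long));       /* usually 8 */
    printf("long long: %zu bytes\n", sizeof(long long));  /* usually 8 */

    /* Signed and unsigned variants are the same size; they just split
     * the available values differently. */
    printf("int range:          %d to %d\n", INT_MIN, INT_MAX);
    printf("unsigned int range: 0 to %u\n", UINT_MAX);
    return 0;
}

Compile and run it on whatever architecture you're curious about.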

Now look at that ccode declaration again. What is its type? FT_ULong. That ULong...sounds a lot like unsigned long, right?

Yes it does! Here, have a cookie. C code often declares its own aliases for standard C types like this; we can find FreeType's in its API documentation, which I found by the cunning technique of doing a web search for FT_ULong. That finds us this handy definition: "A typedef for unsigned long."

Aaaaaaand herein lies our bug! Whew, at last. As, hopefully, you can now see, this ccode variable is declared as an unsigned long, but we're telling PyArg_ParseTuple to convert the Python object such that we can store it as an unsigned int, not an unsigned long.

But wait, you think. Why does this seem to work OK on most arches, and only fail on ppc64? Again, some of you will already know the answer, good for you, now go read something else. ;) For the rest of you, it's all about this concept called 'endianness', which you might have come across and completely failed to understand, like I did many times! But it's really pretty simple, at least if we skate over it just a bit.

Consider the number "forty-two". Here is how we write it with numerals: 42. Right? At least, that's how most humans do it, these days, unless you're a particularly hardy survivor of the fall of Rome, or something. This means we humans are 'big-endian'. If we were 'little-endian', we'd write it like this: 24. 'Big-endian' just means the most significant element comes 'first' in the representation; 'little-endian' means the most significant element comes last.

All the arches Fedora supports except for ppc64 are little-endian. On little-endian arches, this error doesn't actually cause a problem: even though we used the wrong format unit, the value winds up being correct. On (64-bit) big-endian arches, however, it does cause a problem - when you tell PyArg_ParseTuple to convert to an unsigned int, but store the result into a variable that was declared as an unsigned long, you get a completely different value (it's multiplied by 2^32). The reasons for this involve getting into a more technical understanding of little-endian vs. big-endian (we actually have to get into the icky details of how values are really represented in memory), which I'm going to skip since this post is already long enough.

But you don't really need to understand it completely, certainly not to be able to spot problems like this. All you need to know is that there are little-endian and big-endian arches, and little-endian are far more prevalent these days, so it's not unusual for low-level code to have weird bugs on big-endian arches. If something works fine on most arches but not on one or two, check if the ones where it fails are big-endian. If so, then keep a careful eye out for this kind of integer type mismatch problem, because it's very, very likely to be the cause.
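If you'd like to see the effect for yourself without wading into the memory-layout details, here's a small standalone sketch - mine, not matplotlib's code - that fakes what the mismatched format unit does: it writes a 4-byte unsigned int into an 8-byte unsigned long variable, just like the "I" unit did to ccode.

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* ccode is an 8-byte unsigned long, like matplotlib's FT_ULong, but
     * the "I" format unit makes PyArg_ParseTuple write only a 4-byte
     * unsigned int through the pointer it's handed; the memcpy() fakes
     * that write. (Zero-initialised so the untouched bytes are known.) */
    unsigned long ccode = 0;
    unsigned int from_python = 0x221E;  /* an arbitrary example value */

    memcpy(&ccode, &from_python, sizeof(from_python));

    /* Little-endian: the low-order bytes of the long sit at its start,
     * so ccode comes out as 0x221e and everything appears to work.
     * Big-endian: the high-order bytes sit at the start, so ccode comes
     * out as 0x221e00000000 - the value multiplied by 2^32. */
    printf("ccode ended up as %#lx\n", ccode);
    return 0;
}

Run it on x86_64 and you get 0x221e; run it on ppc64 and you get 0x221e00000000, much like the huge ccode values that showed up in the ppc64 logging.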

So now all that remained to do was to fix the problem. And here we go, with our one character patch:

diff --git a/src/ft2font_wrapper.cpp b/src/ft2font_wrapper.cpp
index a97de68..c77dd83 100644
--- a/src/ft2font_wrapper.cpp
+++ b/src/ft2font_wrapper.cpp
@@ -971,7 +971,7 @@ static PyObject *PyFT2Font_get_char_index(PyFT2Font *self, PyObject *args, PyObj
     FT_UInt index;
     FT_ULong ccode;

-    if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {
+    if (!PyArg_ParseTuple(args, "k:get_char_index", &ccode)) {
         return NULL;
     }

There's something I just love about a one-character change that fixes several hundred test failures. :) As you can see, we simply change the I - the format unit for unsigned int - to k - the format unit for unsigned long. And with that, the bug is solved! I applied this change on both x86_64 and ppc64, re-built the code and re-ran the test suite, and observed that several hundred errors disappeared from the test suite on ppc64, while the x86_64 tests continued to pass.

So I was able to send that patch upstream, apply it to the Fedora package, and once the package build went through, I could finally build python-deap successfully, two days after I'd first tried it.

Bonus extra content: even though I'd fixed the python-deap problem, as I'm never able to leave well enough alone, it wound up bugging me that there were still several hundred other failures in the matplotlib test suite on ppc64. So I wound up looking into all the other failures, and finding several other similar issues, which got the failure count down to just two sets of problems that are too domain-specific for me to figure out, and actually also happen on aarch64 and ppc64le (they're not big-endian issues). So to both the people running matplotlib on ppc64...you're welcome ;)

Seriously, though, I suspect without these fixes, we might have had some odd cases where a noarch package's documentation would suddenly get messed up if the package happened to get built on a ppc64 builder.

QA protip of the day: make sure your test runner fails properly

Just when you thought you were safe...it's time for a blog post!

For the last few days I've been working on fixing Rawhide packages that failed to build as part of the Python 3.6 mass rebuild. In the course of this, I've been enabling test suites for packages where there is one, where we can plausibly run it, and where we weren't doing so before - because tests are great, and running them during package builds is great. (And it's in the guidelines.)

I've now come across two projects which have a unittest-based test script which does something like this:

#!/usr/bin/python3

import unittest

class SomeTests(unittest.TestCase):
    [tests here]

def main():
    suite = unittest.TestLoader().loadTestsFromTestCase(SomeTests)
    unittest.TextTestRunner(verbosity=3).run(suite)

if __name__ == '__main__':
    main()

Now if you just run this script manually all the time and inspect its output, you'll be fine, because it'll tell you whether the tests passed or not. However, if you try and use it in any kind of automated way you're going to have trouble, because this script will always exit 0, even if some or all the tests fail. This, of course, makes it rather useless for running during a package build, because the build will never fail even if all the tests do.

If you're going to write your own test script like this (which...seriously consider whether you should just rely on unittest's own test discovery instead, or use nose(2), or use pytest...), then it's really a good idea to make sure your test script actually fails if any of the tests fail. Thus:

#!/usr/bin/python3

import sys
import unittest

class SomeTests(unittest.TestCase):
    [tests here]

def main():
    suite = unittest.TestLoader().loadTestsFromTestCase(SomeTests)
    ret = unittest.TextTestRunner(verbosity=3).run(suite)
    if ret.wasSuccessful():
        sys.exit()
    else:
        sys.exit("Test(s) failed!")

if __name__ == '__main__':
    main()

(note: just doing sys.exit() will exit 0; doing sys.exit('any string') prints the string and exits 1).

Packagers, look out for this kind of bear trap when packaging...if the package doesn't use a common test pattern or system but has a custom script like this, check it and make sure it behaves sanely.