Flock 2015 report, and Fedora nightly compose testing

Adam Williamson

2015-08-21 16:44

Hi, folks! I've been waiting to write my post-Flock report until I had some fun stuff to show off, because that's more exciting than just a bunch of 'I went to this talk and then talked to this person', right?

Fedora nightly compose testing

So let me get to the shiny first! Without further ado:

Cool, right? That's what I've been working on this whole week since Flock. All the bits are now basically in place such that, each night, openQA will run on the Branched and Rawhide nightly composes when they're done, and when openQA is done, the compose reports will be mailed out.

Flock report

The details behind that get quite long, so before I hit that, here's a quick round-up of other stuff I did at Flock! I'm not going to cover the talks and sessions many others have already blogged about (the keynotes, etc.) as it seems redundant, but I'll mention some stuff that hasn't really been covered yet.

Josef ran a workshop on getting started with openQA. It was a bit tricky, though, due to poor networking on site; the people trying to follow along and deploy their own Docker-based openQA instances couldn't quite get all the way. So we turned the last bit of the talk into a live demo using my openQA instance instead, and created a new test case LIVE ON STAGE. We didn't quite get it all the way done before getting kicked out by a wedding party, but I finished it up shortly after the session. Josef did a great job of explaining the basics of setting up openQA and creating tests, and I hope we'll have a few more people following the openQA stuff now.

Mike McLean did a great talk on Koji 2.0, which has been kinda under the radar (at least for me) compared to Bodhi 2, but sounds like it'll come with a lot of really significant improvements and a better design. As someone who's spent a lot of time staring at kojihub.py lately, I can only say it'd be welcome...

Denise Dumas gave the now-traditional What Red Hat Wants talk, which I'm really glad is happening now. I'm totally behind the idea that we're up-front about the relationship between Red Hat and Fedora, instead of some silly arrangement where Red Hat pretends Fedora just 'happens' and is a totally community-based distro; it's much better for RH to be saying a couple of years in advance 'hey, this is where we'd like to see things going', rather than every so often a bunch of Features/Changes 'mysteriously' appearing four months out from a release and lots of people suddenly caring a lot about them (but it just all being a BIG COINCIDENCE!)

Paul Frields did a nice talk on working remotely, which had a lot of great ideas that I don't do at all (hi Paul, it's 4:30pm and I'm writing this in my dressing gown...) - but it was great to compare notes with a bunch of other folks and think about other ways of doing things.

I did a lightning talk on Fedlet, showing it off running and talking a bit about what Fedlet involves and how well (or not) it runs. Folks seemed interested, and a few people came by to play with my fedlet afterwards.

Stephen Gallagher ran a rolekit hackfest. I was hoping to use it to come up with an openQA role, but failed for a couple of reasons: Stephen doesn't recommend creating new roles right now as the format is likely to change a lot quite soon, and since I last worked on the package, openQA has added a few more dependencies which need packaging. But I did manage to move forward with work on the package a bit, which was useful. In the session Stephen explained the rolekit design and current state to people, and talked about various work that needs doing on it; hopefully he'll get some more help with it soon!

Of course, as always, there was lots of hallway track and social stuff. We had a couple of excellent poker games - good to see the FUDCon/Flock poker tradition continues strong - and played some Exploding Kittens, which is a lot of fun. My favourite bit is the NOPE cards. As many others have said, the Strong Museum was awesome - got to play a bunch of pinball, and see Will Wright's notebooks(!) and John Romero's Apple ][(!!!!).

Fedora compose testing: development details and The Future

So, back to the 'compose CI' stuff I spent a lot of time talking about/working on!

A lot of what I did at Flock centred around the big topic you can call 'CI for Fedora'. We still have lots of plans afoot for big, serious test and task automation based on Taskotron, which is now getting really close to the point where you'll see a lot more cool stuff using it. But in the meantime, the 'skunkworks' openQA project we spun up during the Fedora 22 cycle has grown quite a bit, and the fedfind project I mostly built to back openQA has grown quite a lot of interesting capabilities.

So while we were talking about properly engineered plans for the future, I realized I could probably hack together some stupidly-engineered stuff that would work right now! In Kevin Fenzi's Rawhide session I threw out a few ideas and then figured that, hell, I should just do them.

So I started out by teaching fedfind some new tricks. It can now 'diff' two releases: that is, it can tell you what images are in one, but not the other. It can also check a release for 'expected' images - basically it has some knowledge about what images we'd most want to be present all the time, and it can tell you if any are missing. (FIXME: I didn't know which of the Cloud images were the most important, so right now it has no 'expected' Cloud images: if some Cloud-y people want to tell me which images are most important, I can add them).
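To make the two new checks concrete, here's a rough sketch of the idea as plain set operations. This is not fedfind's actual code or API: the `Image` tuple, its fields, and the `EXPECTED` set are hypothetical stand-ins for fedfind's much more detailed knowledge of what a compose contains.

```python
from typing import Iterable, NamedTuple


class Image(NamedTuple):
    """Hypothetical minimal description of one image in a compose."""
    subvariant: str  # e.g. "Workstation", "Server"
    imagetype: str   # e.g. "live", "dvd", "netinst"
    arch: str        # e.g. "x86_64"


# Hypothetical set of images we'd always want present; the real
# 'expected images' knowledge in fedfind is more detailed than this.
EXPECTED = {
    Image("Workstation", "live", "x86_64"),
    Image("Server", "dvd", "x86_64"),
    Image("Server", "netinst", "x86_64"),
}


def diff_releases(first: Iterable[Image], second: Iterable[Image]):
    """Return (only_in_first, only_in_second) as sorted lists."""
    a, b = set(first), set(second)
    return sorted(a - b), sorted(b - a)


def missing_expected(images: Iterable[Image], expected=EXPECTED):
    """Return the expected images that are not present, sorted."""
    return sorted(set(expected) - set(images))
```

Treating images as hashable tuples means both 'diff' and 'expected' checks reduce to set differences, so the hard part is really just deciding how to describe an image.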

Then I wrote a little script called check-compose which produces a handy report from that information. It also looks for openQA tests for the compose it's checking, and includes a list of failures if it finds any. It can email the report and also write the results in JSON format (which seemed like a good idea in case we want to look back at them in some programmatic way later on). The 'compose check reports' that have been showing up this week (and that I linked above) are the output of the script.
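A report like that could be assembled roughly as below. This is a hypothetical sketch, not check-compose itself: the function names, section titles, compose IDs and email addresses are all illustrative. It builds the text, a JSON-serializable dict, and an `EmailMessage` (actually sending it via `smtplib` is left out).

```python
import json
from email.message import EmailMessage


def build_report(compose_id, missing, only_prev, only_this, failures):
    """Build a plain-text report plus a JSON-serializable dict.

    All arguments except compose_id are lists of strings; `failures`
    holds hypothetical openQA failure descriptions.
    """
    data = {
        "compose": compose_id,
        "missing_expected": list(missing),
        "removed": list(only_prev),
        "added": list(only_this),
        "openqa_failures": list(failures),
    }
    sections = [
        ("Missing expected images", data["missing_expected"]),
        ("Images in previous compose but not this one", data["removed"]),
        ("Images in this compose but not previous", data["added"]),
        ("Failed openQA tests", data["openqa_failures"]),
    ]
    lines = [f"Compose check report for {compose_id}", ""]
    for title, items in sections:
        if items:
            lines.append(f"{title}:")
            lines.extend(f"  {item}" for item in items)
            lines.append("")
    if not any(items for _, items in sections):
        lines.append("No problems found.")
    return "\n".join(lines), data


def write_json(data, path):
    """Save the results so they can be inspected programmatically later."""
    with open(path, "w") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)


def make_email(text, compose_id, sender, recipients):
    """Wrap the report in an email message (sending is not shown)."""
    msg = EmailMessage()
    msg["Subject"] = f"Compose check report: {compose_id}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(text)
    return msg
```

Keeping the dict as the single source for both the text and the JSON means the emailed report and the saved results can't drift apart.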

I had all of that basically done by Tuesday, so what have I been wasting the rest of my week on? Read on!

What was missing was the 'C' part of 'CI'. There was nothing that would actually run the compose report at appropriate times, and we weren't actually running openQA tests nightly. For the past few days I've been faking things up by manually kicking off openQA jobs and firing off the compose report when they finished. This kind of Mechanical Turk CI doesn't really work in the long run! So I spent the rest of the week on automating it.

We were not actually scheduling nightly openQA runs at all. The openQA trigger script has an 'all' mode which is intended to do that, but we weren't running it. I suggested we turn it back on, but I also wanted to fix one big problem it had: it didn't know whether the composes were actually done. It just took today's date and tried to run on that day's nightlies. If they weren't actually done whenever the script ran, you got no tests.

This definitely hooks in with one of the big topics at Flock: Pungi 4, which is the pending major revision of Pungi, the tool which runs Fedora composes. Well, that's not quite right: there are actually a couple of releng scripts which produce the composes (one for nightlies, one for TCs/RCs). They run Pungi and do lots of other stuff too, because currently Pungi only actually does some of the work involved in a compose (a lot of the images are just built by the trigger scripts firing off Koji tasks and other...stuff). The current compose process is something of a mess (it's grown chaotically as we added ARM images and Cloud images and Docker images and Atomic images and flavors and all the rest of it). With the Pungi 4 revision and associated changes to the releng process, it should be trivial to follow the compose process.

Right now, though, it isn't. Nightly composes and TC/RC composes are very different. TCs/RCs don't really emit any information on their progress. Nightlies emit some fedmsg signals, but crucially, there's no signal when the Koji builds complete: you get a signal when they start, but not when they're done.

So it was time to teach fedfind some more new tricks! I decided not to go the fedmsg route yet since it's not sufficient at present. Instead I taught it to tell if composes are complete in lower-tech ways. For the Pungi part of the process it looks for a file the compose script creates when it's done. For Koji tasks, it finds all the Koji tasks that look like they're part of this nightly, and only considers the nightly 'done' when there are at least some tasks (so it doesn't report 'done' before the process has started at all) and none of the tasks is 'open' (meaning running or not yet started).
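The completion rule above boils down to something like this sketch. It's an illustration, not fedfind's code: the marker filename is a hypothetical placeholder, and mapping the post's 'open' onto Koji's `FREE`, `OPEN` and `ASSIGNED` task states is an assumption. Finding which tasks belong to the nightly in the first place is left out.

```python
import os
from typing import Iterable

# Assumption: Koji task states that count as 'open' in the sense used
# above (not yet started, or still running).
UNFINISHED = {"FREE", "OPEN", "ASSIGNED"}


def pungi_done(compose_dir: str, marker: str = "finished") -> bool:
    """Pungi phase is done once the compose script has written its marker
    file. The marker name is a hypothetical placeholder."""
    return os.path.exists(os.path.join(compose_dir, marker))


def koji_done(task_states: Iterable[str]) -> bool:
    """Koji phase is done only if at least one task exists (so we don't
    report 'done' before anything started) and none is still unfinished."""
    states = list(task_states)
    return bool(states) and not any(s in UNFINISHED for s in states)


def compose_done(compose_dir: str, task_states: Iterable[str]) -> bool:
    return pungi_done(compose_dir) and koji_done(task_states)
```

Note that failed or cancelled tasks still count as 'finished' here: the goal is to know when to start testing, and a missing image is something the compose report should flag rather than wait on.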

So now we could make the openQA trigger script or the compose-check script wait for a compose to actually exist before running against it! Great. Only now I had a different problem: the openQA trigger script was set up to run for both nightlies. This is fine if it's not waiting - it just goes ahead and fires one, then the other. But how to make it work with waiting?

This one had to go through a couple of revisions. My first thought was "I have a problem. I know! I'll use threads", and we all know how that joke goes. Sure enough, all three of the revisions of this approach (using threading, multiprocessing and multiprocessing.dummy) turned out to have problems. I eventually decided it wasn't worth carrying on fighting with that, and came up with some different approaches. One is a low-tech round-robin waiting approach, where the trigger script alternates between checking for Branched and Rawhide. The other is even simpler: by just adding a few capabilities to the mode where the trigger runs on a single compose, we can simply schedule two separate runs of that mode each night, one for Rawhide, one for Branched. That keeps the code simple and means either one can get all the way through the 'find compose, schedule jobs, run jobs, run compose report' process without waiting for the other.
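The round-robin idea can be sketched without any threads at all. This is a hypothetical illustration, not the trigger script's code: `checks` maps a compose name to a zero-argument callable (something like the completion check above), and the interval and timeout defaults are made-up values. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict, List


def round_robin_wait(
    checks: Dict[str, Callable[[], bool]],
    interval: float = 60.0,
    timeout: float = 4 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll each pending compose in turn; return names in the order they
    finished. Composes still pending at the deadline are left out."""
    pending = list(checks)
    finished = []
    deadline = clock() + timeout
    while pending and clock() < deadline:
        for name in list(pending):
            if checks[name]():
                pending.remove(name)
                finished.append(name)
                # Hypothetical hook: schedule openQA jobs for `name` here.
        if pending:
            sleep(interval)
    return finished
```

The catch, and a reason the two-separate-runs option is attractive, is that any per-compose work done inside the loop delays polling for the other compose; separate scheduled runs avoid that coupling entirely.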

And that, finally, is about where we're at right now! I'm hoping one or the other of those openQA changes will be approved on Monday, and then we can have this whole process running unattended each night, which will finally implement a bit more of the near-legendary 'is Rawhide broken?' proposal. Until then I'll keep running the compose reports by hand.

Along the way I did some other messing around in fedfind, mostly to do with optimizing how it does Koji queries (and fixing some bugs). For all of a day or so, it used multiprocessing to run queries in parallel; I decided the parallelism just wasn't worth it for the moderate performance increase, though, so I switched to using a batched query mode provided by xmlrpclib, which speeds things up a little less but keeps the code simpler. I also implemented a query cache, and spent an entire goddamn afternoon coming up with a reasonable way to make it handle fuzzy matches (when e.g. we run a query for 'all open or successful tasks', then run a query for 'all successful live CD tasks', we can derive the results for the latter from the former and not waste time talking to the server again). But I got there in the end, I think.
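The 'fuzzy match' trick is the interesting part of that cache, so here's a rough sketch of the concept. It is not fedfind's implementation: `TaskQuery`, `QueryCache` and the task dict layout are hypothetical, `fetch` stands in for the real (slow) Koji query, and treating 'successful' as Koji's `CLOSED` state is an assumption. A cached query can answer a new one if it is at least as broad, in which case we just filter its results locally.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Optional


class TaskQuery(NamedTuple):
    """Hypothetical query: task states to include, optional method filter."""
    states: FrozenSet[str]
    method: Optional[str] = None  # e.g. "livecd"; None means any method

    def covers(self, other: "TaskQuery") -> bool:
        """True if every task matching `other` also matches this query."""
        return self.states >= other.states and self.method in (None, other.method)

    def matches(self, task: dict) -> bool:
        return task["state"] in self.states and self.method in (None, task["method"])


class QueryCache:
    def __init__(self, fetch: Callable[[TaskQuery], List[dict]]):
        self._fetch = fetch  # stand-in for the real server query
        self._results: Dict[TaskQuery, List[dict]] = {}
        self.server_calls = 0

    def query(self, q: TaskQuery) -> List[dict]:
        if q in self._results:  # exact hit
            return self._results[q]
        # Fuzzy hit: any cached broader query lets us filter locally.
        broader = next(
            (tasks for cq, tasks in self._results.items() if cq.covers(q)), None
        )
        if broader is not None:
            result = [t for t in broader if q.matches(t)]
        else:
            self.server_calls += 1
            result = self._fetch(q)
        self._results[q] = result
        return result
```

The tricky design question is the `covers` relation: it has to be conservative, because a false positive silently returns too few tasks, whereas a false negative only costs an extra server round trip.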

It was quite a lot of work, in the end, but I'm pretty happy with the result. I'm really, really looking forward to the releng improvements, though. fedfind is more or less just the things releng is aiming to do, only implemented (unavoidably) stupidly and from the wrong end. As I understand it, releng's medium-term goals are:

all composes to contain sufficient metadata on what's actually in them
compose processes for nightlies to be the same as that for TCs/RCs
compose process to notify properly at all stages via fedmsg
ComposeDB to track what composes actually exist and where they are

Right now we don't really have any of those things, and so fedfind exists to reconstruct all that information painfully, from the other end. It will definitely be a relief when we can get all that information out of sane systems, and I don't have to maintain a crazy ball of magic knowledge, Koji queries and rsync scrapes any longer. For now, though, the whole crazy ball of wax seems to actually work. I'm really glad that folks like Kevin, Dennis, Peter, Ralph, Adam and others are all thinking along the same general lines: I'm hopeful that with Pungi, ComposeDB (when it happens), and further work on Taskotron and openQA and even my stupid little scripts, we'll have continuously (see what I did there?!) better stories to tell as we move on for the next few releases.