Fedora 28, broken comments, QA goings on...

Long time no blog, once more!

Fedora 28

So of course, the big news of the week is that Fedora 28 was signed off yesterday and will be coming out on 2018-05-01. If you examine the Fedora 28 schedule, you will observe that this was in fact the originally targeted date for the release - the earliest targeted date.

Yes. It's a Fedora release. Coming out on time. That noise you hear is the approaching meteor that will wipe out all life on Earth. You're welcome. ;)

We've always said the schedules for Fedora are really estimates and we don't consider it a problem if there's a week or two delay to fix up bugs, and that's still the case. We may well wind up slipping again for F29. But hey, it's nice to get it done "on time" just once. I did, in fact, check, and this really is the very first time a Fedora release has ever been entirely on time. Fedora 8 was close - it was only a day late, if you discount a very early draft schedule - but still, a day's a day!

There are, as always, a few bugs I really wish we'd been able to fix for the release. But that's pretty much always the case, and these are no worse than ones we've shipped before. We have to draw a line somewhere, for a distro that releases as often as Fedora. This should be another pretty solid release. My desktop and main laptop are running it already, and it's pretty fine.

Comments: Yes, They're Broken

Quick note for people who keep emailing me: yes, posting comments on this blog appears to be broken. No, I'm not particularly bothered. I've actually been meaning to convert this into an entirely static blog with no commenting for years - I just don't want to deal with WordPress, or really any dynamic blog framework, any more. But I never have time to do it, as I want to include existing comments in the conversion, which isn't straightforward. I'm gonna get it done one of these days, though.

openQA news: upgrade tests for updates, aarch64 testing...

I've been doing a lot of miscellaneous stuff I haven't blogged about lately, but here's one thing I'm pretty proud of: Simo and Rob from the FreeIPA team asked if it would be possible to test whether Fedora package updates broke FreeIPA upgrades, as Simo had noticed a case where upgrading a server to Fedora 27 didn't work. We already had tests that deploy a FreeIPA server and client on one Fedora release, then upgrade both to the next Fedora release and check things still work - but we weren't running them on updates, only on nightly composes of Branched and Rawhide. So effectively we knew, all the way up until a given release came out, whether upgrading to it worked - but once it was out, we had no way of knowing whether a later update had suddenly broken upgrading.

These tests are some of the longest-running we have, so I was a bit worried about whether we'd have the resources to run them on updates, but I figured I'd go ahead and try it, and after a day or two of bashing, I was able to get it running in staging. After a week, staging seemed to be keeping up with the load, so I've pushed this out into production today. If you look at recent openQA update tests, like this one, you'll see an updates-server-upgrade flavor with a couple of tests in it: these test that installing the previous Fedora release, deploying a FreeIPA server and client, then upgrading them to the release the update is for (with the update included) all works OK. I'm quite happy with that! I may extend this basic mechanism to also run the Workstation upgrade test. Note that these tests don't run for updates to the oldest current stable Fedora, as we don't support upgrades from EOL releases (and openQA doesn't keep the necessary base disk images for EOL releases around, so we couldn't run the tests even if we wanted to).

Aside from that, the biggest openQA news lately is that we got the staging instance testing on aarch64. Here are the aarch64 tests for Fedora 28 Final, for instance. This isn't perfect yet - there are several spurious failures each time the tests run. I think this is because the workers are somewhat overloaded: they're a bit short on RAM and especially on storage bandwidth (each has just a single consumer-grade 7200RPM hard disk). I'm working with infra to try and improve that situation before we consider pushing this into production.

Other QA goings on

One thing that's been quite pleasant for me lately is I'm no longer trying to do quite so much of...everything (and inevitably missing some things). Sumantro and coremodule have done a great job of taking over Test Day co-ordination and some other community-ish tasks, so I don't have to worry about trying to keep up with those any more. Sumantro has been bringing a whole bundle of energy to organizing Test Days and onboarding events, so we've had lots more Test Days these last two cycles, and more people to take part in them, which is great. We've also had more folks taking part in validation testing. It's made life a lot less stressful around here!

I've been mostly concentrating on co-ordinating things like release validation testing, doing a bit of mentoring for the newer team members, and keeping openQA ticking over. It's nice to be able to focus a bit more.

Linux kernel 4.13 and SMB protocol version fun

There's been a rather interesting change in the Linux kernel recently, which may affect you if you're mounting network drives using SMB (the Windows native protocol, occasionally also called CIFS).

There have been several versions of the protocol - Wikipedia has a good writeup. Both servers and clients may support different versions; when accessing a shared resource, the client tells the server which protocol version it wants to use, and if the server supports that version then everyone's happy and the access goes ahead; if the server doesn't support that version, you get an error and no-one's happy.

Up until kernel 4.13, the kernel's default SMB protocol version was 1.0. So with kernel 4.12 or earlier, when you mount an SMB share without explicitly specifying a protocol version with the vers= mount option, SMB 1.0 will be used.

With kernel 4.13, the default protocol version is changed to 3.0. So now, when mounting SMB mounts that don't explicitly specify a version, your system will request 3.0.
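
To make the change concrete (//server/share, /mnt/share and myuser here are placeholders for your own server, share, mount point and user): with kernel 4.12 or earlier, these two commands behave identically:

# mount -t cifs //server/share /mnt/share -o username=myuser
# mount -t cifs //server/share /mnt/share -o username=myuser,vers=1.0

while with kernel 4.13, the first command instead behaves like:

# mount -t cifs //server/share /mnt/share -o username=myuser,vers=3.0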

As I understand it, the main reason for this is security: SMB 3.0 is considerably more secure as a protocol than 1.0. Microsoft has been gradually trying to push Windows users towards later versions of the protocol over the last few releases.

Kernel 4.13 has been released as an update for Fedora 25 and Fedora 26, so users of those Fedora releases will hit this change when updating the kernel. Fedora 27 comes with kernel 4.13 out of the box.

Obviously, this comes with some compatibility consequences. If the server providing the share is running Windows 8 or later, you should be fine. However, older versions of Windows do not support SMB 3.0, so in other cases you may find your SMB mount suddenly fails after the kernel update.

Samba added SMB 3.0 support in version 4.2, at least according to this page, so mounts provided by earlier Samba versions similarly will not work.

If your server is a NAS, it may or may not support SMB 3.0. My NAS is a Thecus N5550, so I know that for ThecusOS 5-based NASes, firmware version 2.06.02.10 added SMB 3.0 support. However, it's not enabled by default; you have to log into the admin UI, go to Network Service, select Samba/CIFS, and set 'SMB Max Protocol' to 3. Note that with this update, the default SMB minimum version is set to 2, so the NAS will no longer support 1.0 - you can change the minimum version to 'NT1' if you have a client which cannot do 2 or 3, though.

If you know information about SMB protocol support for any other NAS brand or other common SMB server of any kind, please post a comment and I'll add it to this post.

If you get caught out by this, the best solution is to somehow update the server end of your setup so that it supports SMB 3.0. However, if you can't do that, you can use the vers= mount option. Use the highest version that works - 2.x isn't as good as 3.0, but it's better than 1.0. The available choices are documented in man mount.cifs; at present they are 1.0, 2.0, 2.1 and 3.0.
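
For instance, to force SMB 2.1 for a server which can't do 3.0 (again, the server, share, mount point and username here are placeholders):

# mount -t cifs //oldserver/share /mnt/share -o username=myuser,vers=2.1

or, for a permanent mount, the equivalent /etc/fstab line:

//oldserver/share /mnt/share cifs username=myuser,vers=2.1 0 0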

Flock 2017: trip report

Better late than never, here's my report from Flock 2017!

Thanks to my excellent foresight in the areas of 'being white' and 'being Canadian' I had no particular trouble getting through security / immigration, which was nice. The venue was kinda interesting - the whole town had this very specific flavor that seems to be shared among slightly second-class seaside towns the world over. Blackpool, White Rock or Hyannis, there's something about them all...but the rooms were fairly clean, the hot water worked, the power worked, and the wifi worked fairly well for a conference, so all the important stuff was OK. Hyannis seriously needs to discover the crosswalk, though - I nearly got killed four times travelling about 100 yards from the hotel to a Subway right across the street and back. Unfortunately the 'street' was a large rotary with exactly zero accommodations for pedestrians...

Attendance seemed a bit thinner than usual, and quite heavily Red Hat-y; I've heard different reasons for this, from budget issues to Trump-related visa / immigration issues. It was a shame. There were definitely still enough people to make the event worthwhile, but it felt like some groups who would normally be there just weren't.

From the QA team we had myself, Tim Flink, Sumantro Mukherjee and Lukas Brabec. We got some in-person planning / discussion done, of course, and had a team dinner. It was particularly nice to be in the same place as Sumantro for a while - usually our time zones are awful (he gets to the office right when I'm going to bed) - so we were able to talk over a lot of stuff and agree on quite a list of future projects.

The talks, as usual, were generally very practical, focused and useful - one of the nicest things about Flock is it's a very low-BS conference. I managed to do some catch-up on modularity plans and status by following the track of modularity talks on Thursday. Aside from that, some of the talks I saw included the Hubs status update, Stef's dist-git tests talk, the Greenwave session, the Bodhi hackfest, Sumantro's kernel testing session, and a few others.

I gave a talk on how packagers can work with our automated test systems. As always seems to be the case, I got scheduled very early in the conference, and - again as always seems to be the case - I wound up writing my talk about an hour before giving it. Which was especially fun because, while I still had about ten slides to write, my laptop started suffering from a rather odd firmware bug which caused it to get stuck at the lowest possible CPU speed. Pro tip: LibreOffice does not like running at 400MHz. So I wasn't entirely as prepared as I could have been, but I think it went OK. I had the usual thing where, once I reached the end of the talk, I realized how I should have started it, but never mind. If I ever get to give the talk again, I'll tweak it. As a footnote, Peter Jones - being Peter Jones - naturally had all the tools and the know-how necessary to take my laptop apart and disconnect the battery, which turned out to be the only possible way to clear the CPU-throttling firmware state, so thanks very much to him for that!

As usual, though, the most productive thing about the conference was just being in the same place at the same time as lots of the folks who really make stuff happen in Fedora, and being able to work on things in real time, make plans, and pick brains. So I spent quite a lot of time bouncing around between Kevin Fenzi, Dennis Gilmore, and Peter Jones, trying to fix up Fedora 27 and Rawhide composes; we got an awful lot of bugs solved during the week. I got to talk to Ralph Bean, Pingou, Randy Barlow, Pengfei Jia, Dan Callaghan, Ryan Lerch, Jeremy Cline and various others about Bodhi, Pagure, Greenwave and various other key bits of current and future infrastructure; this was very useful in planning how we're going to move forward with compose gating and a few other things. In the kernel testing session, Sumantro, Laura Abbott and myself came up with a plan to run regular Test Days around kernel rebases for stable releases, which should help reduce the amount of issues caused by those rebases.

We started working on a 'rerun test' button for automated tests in Bodhi during the Bodhi hackfest; this is still a work in progress but it's going in interesting directions.

PSA: If you had dnf-automatic enabled and updated to Fedora 26, it probably stopped working

So the other day I noticed this rather unfortunate bug on one of my servers.

Fedora 26 included a jump from DNF 1.x to DNF 2.x. It seems that DNF 2.x came with a poorly-documented change to the implementation of dnf-automatic, the tool it provides for automatically notifying of, downloading and/or installing updates.

Simply put: if you had enabled dnf-automatic in Fedora 25 or earlier, using the standard mechanism it provided (edit /etc/dnf/automatic.conf to configure the behaviour you want, then run systemctl enable dnf-automatic.timer), and then upgraded to Fedora 26, it probably just stopped working entirely. If you were relying on it to install updates for you...it probably hasn't been. You can read the full details of why this is the case in the bug report.
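
For reference, that pre-Fedora 26 setup looked something like this (the exact values are just examples - see the comments in the file itself for all the options). In /etc/dnf/automatic.conf:

[commands]
download_updates = yes
apply_updates = yes

then:

# systemctl enable dnf-automatic.timer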

We've now fixed this by sending out an update to dnf which should restore compatibility with the DNF 1.x implementation of dnf-automatic, by restoring dnf-automatic.service and dnf-automatic.timer (which function just as they did before) while preserving the new mechanisms introduced in DNF 2.x (the function-specific timers and services). But of course, you'll have to install this update manually on any systems which need it. So if you do have any F26 systems where you're expecting dnf-automatic to work...you probably want to log into them and run 'dnf update' manually to get the fixed dnf.
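
Concretely, on each affected system, something like this should do it (re-enabling the timer should be unnecessary if it was enabled before the upgrade, but is harmless):

# dnf update dnf
# systemctl enable dnf-automatic.timer
# systemctl list-timers 'dnf-automatic*'

The last command should show the timer scheduled to fire again.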

PSA ends!

A modest proposal

                                                       PROPOSED STANDARD
                                                            Errata Exist

Internet Engineering Task Force (IETF)                   Adam Williamson
Request for Comments: 9999                                       Red Hat
Updates: 7159                                             September 2017
Category: Standards Track
ISSN: 9999-9999


     Let Me Put a Fucking Comma There, Goddamnit, JSON

Abstract

   Seriously, JSON, for the love of all that is fucking holy, let me
   end a series of items with a fucking comma.
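
For the avoidance of doubt, here is the offending construct, fed to a conveniently strict JSON parser (python3's bundled json.tool), which will dutifully reject it:

$ echo '[1, 2, 3,]' | python3 -m json.tool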

Fedora 26 Upgrade Test Day tomorrow (2017-06-30)!

It's that time again: we have another test day coming up! Tomorrow (Friday 2017-06-30) will be Fedora 26 Upgrade Test Day. As the name might suggest, we'll be testing upgrades to Fedora 26. It'd be great to have coverage of as many configurations and architectures as possible, so please, if you have a bit of spare time and some kind of environment to which you can install Fedora 24 or 25 and test upgrading to Fedora 26, come out and help test!

The Test Day page contains all the instructions you need to run the tests and send along your results. As always, the event is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

PSA: RPM database issues after update to libdb-5.3.28-21 on Fedora 24 and Fedora 25

Hi there, folks!

This is an important PSA for Fedora 24, 25 and 26 (pre-release) users. tl;dr version: if you recently updated and got some kind of error or crash and now you're getting RPM database errors, you need to do the old reliable RPM database fix dance:

# rm -f /var/lib/rpm/__db*
# rpm --rebuilddb

and all should be well again. We do apologize for this.

Longer version: there's a rather subtle and tricky bug in libdb (the database that RPM uses) which has been causing problems with upgrades from Fedora 24/25 to Fedora 26. The developers have made a few attempts to fix this, and testing this week had indicated that the most recent attempt - libdb-5.3.28-21 - was working well. We believed the fix needed to be applied both on the 'from' and the 'to' end of any affected transaction, so we went ahead and sent the -21 update out to Fedora 24, 25 and 26.

Unfortunately it now seems like -21 may still have bugs that were not found in the testing; in the last few hours several people have reported that they hit some kind of crash during an update involving libdb -21, and subsequently there was a problem with their RPM database.

While we investigate and figure out what to do about fixing this properly, in the short term, if you're affected, just doing the old "rebuild the RPM database" trick seems to resolve the problem:

# rm -f /var/lib/rpm/__db*
# rpm --rebuilddb

EDIT 2017-06-13: We briefly sent a -22 build to updates-testing for 24, 25 and 26 with the fixes reverted. It turns out that updating from -21 to -22 can, again, cause the same kind of problem, so we've removed the -22 update and are sticking with -21; we'll just be advising affected people to rebuild their databases if they hit issues. Note that if you updated from -21 to -22 while it was in updates-testing, that update may also have caused database issues (resolvable the same way), and when you later update from -22 to -23 or newer, the same may happen again.

It's unfortunate that we have to break that one out of cold storage (I hadn't had to do it for so long I'd almost forgotten it...), but it should at least get you back up and working for now.

We do apologize sincerely for this mess, and we'll try and do all we can to fix it up ASAP.

Downtime

Hi folks! Happyassassin Towers is being physically relocated tomorrow, so there's gonna be some downtime (for this site, for email, and...for me). Hang tight!

LinuxFest Northwest report

EDIT: recording link added!

Hi folks!

This weekend was LinuxFest Northwest 2017, and as usual I was down in Bellingham to attend it. Had a good time, again as usual. Luckily I got to do my talk first thing and get it out of the way. Here's a recording, and here's the slide deck. It was a general talk on Fedora's past, present and future.

I saw several other good talks, including Bryan Lunduke's 'Lunduke Hour Live' featuring a great discussion with John Sullivan of the Free Software Foundation. I also saw the openSUSE 101 talk he did with James Mason - it was quite interesting to compare and contrast the openSUSE organization with Fedora's. Together with James and an Ubuntu developer, I formed a heckler's row at Kevin Burkeland's Linux 102 talk on choosing a distribution; it was actually a great talk that was pretty well thought-through and had nice things to say about Fedora and openSUSE, so our heckling was sadly pre-empted.

I spent a few hours working on the booth too, but as usual the Jeffs - Sandys and Fitzmaurice - were the real booth heroes, so thanks once more to them.

The trivia event on Saturday night was pretty fun (and our team, The Unholy Alliance (of SUSE and Fedora folks), won with only minor cheating!). My now-traditional Sunday afternoon board gaming with Jakob Perry and co. was also fun (and I managed not to come last...).

Got to chat with Jesse Keating, Brian Lane, Laura Abbott (briefly - hope your voice is recovered by now!) and many other fine folks too. It was also really nice to hear from a whole bunch of different people that they tried out a recent Fedora release and really liked it - almost feels like we're doing something right!

If I promised you something at the conference and I don't get in touch by the end of this week, please do give me a poke and remind me, I probably forgot...

Automated *non*-critical path update functional testing for Fedora

Yep, this here is a sequel to my most recent best-seller, Automated critical path update functional testing for Fedora :)

When I first thought about running update tests with openQA, I wasn't actually thinking about testing critical path packages. I just made that the first implementation because it was easy. The idea actually first came up when we added the FreeIPA tests to openQA - it seemed pretty obvious that it'd be handy to run the same tests on FreeIPA-related updates as well as on the nightly development release composes. So all along, I was planning to come up with a way to do that too.

Funnily enough, right after I pushed out the critpath update testing stuff, a FreeIPA-related update that broke FreeIPA showed up, and Stephen Gallagher poked me on IRC and said "hey, it sure would be nice if we could run the openQA tests on FreeIPA-related updates!", so I said "funny you should ask..."

I bumped the topic up my todo list a bit, and wrote it that afternoon, and now it's deployed in production. For now, it's pretty simple: we just have a hand-written list of packages that we want to run some of the update tests for, whenever an update shows up with one of those packages in it. Simple enough, but it works: whenever an update containing one of those packages is submitted or edited, the server update tests (including the FreeIPA tests) will get run, and the results will be visible in Bodhi.
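
The logic amounts to something like this sketch (all names and list contents here are made up for illustration; the real list and the scheduling logic live in our openQA scheduling tooling):

#!/bin/sh
# Hand-written list of 'interesting' packages (hypothetical contents)
WANTED="freeipa postgresql"
# Packages in an incoming update (hypothetical; really parsed from Bodhi)
update_pkgs="postgresql some-other-package"
for pkg in $update_pkgs; do
    case " $WANTED " in
        *" $pkg "*)
            echo "update touches $pkg: schedule the server update tests"
            break ;;
    esac
done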

Here's a run on the staging instance that was triggered using the new code; since I deployed it to the production instance, no relevant updates have been submitted or edited, but it should work just the same there. So from now on, whenever our FreeIPA-ish overlords submit an update, we'll get an idea of whether it breaks everything right away.

We can extend this system to other packages, but I couldn't think of any (besides postgresql, which I threw in there) which would really benefit from the current update tests but aren't already in the critical path (all the important bits of GNOME are in the critical path, for example, so all the desktop update tests already run on all GNOME updates). If you can think of any, go ahead and let us know.