Converting fedmsg consumers to fedora-messaging

So in case you hadn't heard, the Fedora infrastructure team is currently trying to nudge people in the direction of moving from fedmsg to fedora-messaging.

Fedmsg is the Fedora project-wide messaging bus we've had since 2012. It backs FMN / Fedora Notifications and Badges, and is used extensively within Fedora infrastructure for the general purpose of "have this one system do something whenever this other system does something else". For instance, openQA job scheduling and result reporting are both powered by fedmsg.

Over time, though, there have turned out to be a few issues with fedmsg. It has a few awkward design quirks, but most significantly, it's designed such that message delivery can never be guaranteed. In practice it's very reliable and messages almost always are delivered, but for building critical systems like Rawhide package gating, the infrastructure team decided we really needed a system where message delivery can be formally guaranteed.

There was initially an idea to build a sort of extension to fedmsg allowing for message delivery to be guaranteed, but in the end it was decided instead to replace fedmsg with a new AMQP-based system called fedora-messaging. At present both fedmsg and fedora-messaging are live and there are bridges in both directions: all messages published as fedmsgs are republished as fedora-messaging messages by a 0MQ->AMQP bridge, and all messages published as fedora-messaging messages are republished as fedmsgs by an AMQP->0MQ bridge. This is intended to ease the migration process by letting you migrate a publisher or consumer of fedmsgs to fedora-messaging at any time without worrying about whether the corresponding consumers and/or publishers have also been migrated.

This is just the sort of project I usually work on in the 'quiet time' after one release comes out and before the next one really kicks into high gear, so since Fedora 30 just came out, last week I started converting the openQA fedmsg consumers to fedora-messaging. Here's a quick write-up of the process and some of the issues I found along the way!

I found these three pages in the fedora-messaging docs to be the most useful:

  1. Consumers
  2. Messages
  3. Configuration (especially the 'consumer-config' part)

Another important bit you might need are the sample config files for the production broker and stable broker.

All the fedmsg consumers I wrote followed this approach, where you essentially write consumer classes and register them as entry points in the project's setup.py. Once the project is installed, the fedmsg-hub service provided by fedmsg runs all these registered consumers (as long as a configuration setting is set to turn them on).

This exact pattern does not exist in fedora-messaging - there is no hub service. But fedora-messaging does provide a somewhat-similar pattern which is the natural migration path for this type of consumer. In this approach you still have consumer classes, but instead of registering them as entry points, you write configuration files for them and place them in /etc/fedora-messaging. You can then run an instantiated systemd service that runs fedora-messaging consume with the configuration file you created.

So to put it all together with a specific example: to schedule openQA jobs, we had a fedmsg consumer class called OpenQAScheduler which was registered as a moksha.consumer called fedora_openqa.scheduler.prod in setup.py, and had a config_key named "fedora_openqa.scheduler.prod.enabled". As long as a config file in /etc/fedmsg.d contained 'fedora_openqa.scheduler.prod.enabled': True, the fedmsg-hub service then ran this consumer. The consumer class itself defined what messages it would subscribe to, using its topic attribute.

In a fedora-messaging world, the OpenQAScheduler class is tweaked a bit to handle an AMQP-style message, and the entrypoint in setup.py and the config_key in the class are removed. Instead, we create a configuration file /etc/fedora-messaging/fedora_openqa_scheduler.toml and enable and start the fm-consumer@fedora_openqa_scheduler.service systemd service. Note that all the necessary bits for this are shipped in the fedora-messaging package, so you need that package installed on the system where the consumer will run.

That configuration file looks pretty much like the sample I put in the repository. This is based on the sample files I mentioned above.

The amqp_url specifies which AMQP broker to connect to and what username to use: in this sample we're connecting to the production Fedora broker and using the public 'fedora' identity. The callback specifies the Python path to the consumer callback class (our OpenQAScheduler class). The [tls] section points to the CA certificate, certificate and private key to be used for authenticating with the broker: since we're using the public 'fedora' identity, these are the files shipped in the fedora-messaging package itself which let you authenticate as that identity. For production use, I think the intent is that you request a separate identity from Fedora infra (who will generate certs and keys for it) and use that instead - so you'd change the amqp_url and the paths in the [tls] section appropriately.

The other key things you have to set are the queue name - which appears twice in the sample file as 00000000-0000-0000-0000-000000000000, for each consumer you are supposed to generate a random UUID with uuidgen and use that as the queue name, each consumer should have its own queue - and the routing_keys in the [[bindings]] section. Those are the topics the consumer will subscribe to - unlike in the fedmsg system, this is set in configuration rather than in the consumer class itself. Another thing you may wish to take advantage of is the consumer_config section: this is basically a freeform configuration store that the consumer class can read settings from. So you can have multiple configuration files that run the same consumer class but with different settings - you might well have different 'production' and 'staging' configurations. We do indeed use this for the openQA job scheduler consumer: we use a setting in this consumer_config section to specify the hostname of the openQA instance to connect to.

So, what needs changing in the actual consumer class itself? For me, there wasn't a lot. For a start, the class should now just inherit from object - there is no base class for consumers in the fedora-messaging world, there's no equivalent to fedmsg.consumers.FedmsgConsumer. You can remove things like the topic attribute (that's now set in configuration) and validate_signatures. You may want to set up a init, which is a good place to read in settings from consumer_config and set up a logger (more on logging in a bit). The method for actually reading a message should be named call() (so yes, fedora-messaging just calls the consumer instance itself on the message, rather than explicitly calling one of its methods). And the message object itself the method receives is slightly different: it will be an instance of fedora_messaging.api.Message or a subclass of it, not just a dict. The topic, body and other bits of the message are available as attributes, not dict items. So instead of message['topic'], you'd use message.topic. The message body is message.body.

Here I ran into a significant wrinkle. If you're consuming a native fedora-messaging message, the message.body will be the actual body of the message. However, if you're consuming a message that was published as a fedmsg and has been republished by the fedmsg->fedora-messaging bridge, message.body won't be what you'd probably expect. Looking at an example fedmsg, we'd probably expect the message.body of the converted fedora-messaging message to be just the msg dict, right? Just a dict with keys repo and agent. However, at present, the bridge actually publishes the entire fedmsg as the message.body - what you get as message.body is that whole dict. To get to the 'true' body, you have to take message.body['msg']. This is a problem because whenever the publisher is converted to fedora-messaging, there won't be a message.body['msg'] any more, and your consumer will likely break. It seems that the bridge's behavior here will likely be changed soon, but for now, this is a bit of a problem.

Once I figured this out, I wrote a little helper function called _find_true_body to fudge around this issue. You are welcome to steal it for your own use if you like. It should always find the 'true' body of any message your consumer receives, whether it's native or converted, and it will work when the bridge is fixed in future too so you won't need to update your consumer when that happens (though later on down the road it'll be safe to just get rid of the function and use message.body directly).

Those things, plus rejigging the logging a bit, were all I needed to do to convert my consumers - it wasn't really that much work in the end.

To dig into logging a bit more: fedmsg consumer class instances had a log() method you could use to send log messages, you didn't have to set up your own logging infrastructure. (Although a problem of this system was that it gave no indication which consumer a log message came from). fedora-messaging does not have this. If you want a consumer to log, you have to set up the logging infrastructure within the consumer, and tweak the configuration file a bit.

The pattern I chose was to import logging and then init a logger instance for each consumer class in its init(), like this:

self.logger = logging.getLogger(self.__class__.__name__)

Then you can log messages with self.logger.info("message") or whatever. I thought that would be all I'd need, but actually, if you just do that, there's nothing set up to actually receive the messages and log them anywhere. So you have to add a bit to the TOML config file that looks like this:

[log_config.loggers.OpenQAScheduler]
level = "INFO"
propagate = false
handlers = ["console"]

the OpenQAScheduler there is the class name; change it to the actual name of the consumer class. That will have the messages logged to the console, which - when you run the consumer as a systemd service - means they wind up in the system journal, which was enough for me. You can also configure a handler to send email alerts, for instance, if you like - you can see an example of this in Bodhi's config file.

One other wrinkle I ran into was with authenticating to the staging broker. The sample configuration file has the right URL and [tls] section for this, but the files referenced in the [tls] section aren't actually in the fedora-messaging package. To successfully connect to the staging broker, as fedora.stg, you need to grab the necessary files from the fedora-messaging git repo and place them into /etc/fedora-messaging.

To see the whole of the changes I had to make to the openQA consumers, you can look at the commits on the fedora-messaging branch of the repo and also this set of commits to the Fedora infra ansible repo.

Comments

No comments.