Release early, release often, release every 4h!

14 August 2013 · 813 words · 4 minute read

devel · libre · PU · quality · ubuntu

It has been a long time since I last talked about our daily release process on this blog.

For those of you who are not aware of it, this is the process that lets us continuously release most of the components that we, as the Ubuntu community, are upstream for, so that they land in the baseline.

Sometimes, releases can feel like pushing huge rocks

History

Some quick stats since the system has been in place (in full production since roughly last December):

we currently release 245 components to the distro every day (meaning that every time a meaningful change lands in any of those 245 components, we try to release it). They are grouped into 22 stacks.
this resulted in 309 uploads in raring
we are now at nearly 800 uploads in saucy (and the exact number changes hour after hour)

These numbers do not even include feature branches (temporary forks of a set of branches) and SRUs.

Getting further

However, given the number of transitions we have every day, it seemed that we needed to be even faster. Rick and Alexander challenged me to speed up the process, even though it was always possible to run parts of it manually^[1]. I'm happy to announce that we are now able to release to the distro every 4h! The time it takes for a branch proposed to trunk to reach the distro is drastically reduced thanks to this. We still took great care to keep the same safety net, though: tests still run, ABI changes are still handled, and we still don't push every commit to the distro.

This new process has been in beta since last Thursday, and now that everyone has been briefed about it, it's time for it to go into official production!

Modification to the process

For this to succeed, some modifications to the whole process have been put in place:

now that we release every 4 hours, we need a production-like team always looking at the results. This was put in place with the ~ubuntu-unity team, and the schedule is publicly available here.
as a consequence, there is no more "stack ownership": everyone in the team is now responsible for the whole set.
it also means that upstreams now have a window of 4 hours before the "tick" (look at the cross on the schedule) to push changes to the different trunks as one coherent piece, rather than once a day before 00 UTC. This is the natural tension between speed and safety for big transitions.
better communication and tracking were needed (especially as production is looked after by different people over the course of the day). We also needed to make sure everything is available for upstreams to know where their code is, which bugs affect them and so on… So now, every time there is an issue, a bug is opened, upstream is pinged about it and we write about it on that tab. We escalate after 3 days if nothing is fixed by then.
we will reduce "manual on-demand rebuilds" as much as possible, as the next run will fix the issue.
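To make the timing rules above concrete, here is a minimal Python sketch of the 4-hour tick and the 3-day escalation rule. It is only an illustration, not the actual daily release tooling: the function names are hypothetical, and it assumes that ticks are aligned on 00:00 UTC (the post only says that runs happen every 4h and that the old daily deadline was 00 UTC) and that "after 3 days" means three full days since the bug was opened.

```python
from datetime import datetime, timedelta, timezone

TICK_INTERVAL = timedelta(hours=4)    # from the post: one release run every 4h
ESCALATION_DELAY = timedelta(days=3)  # from the post: escalate after 3 days


def next_tick(now: datetime) -> datetime:
    """Return the first tick strictly after `now`.

    Assumption (not stated in the post): ticks are aligned on 00:00 UTC.
    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Number of complete 4h slots already elapsed today.
    ticks_done = (now - midnight) // TICK_INTERVAL
    return midnight + (ticks_done + 1) * TICK_INTERVAL


def needs_escalation(opened: datetime, now: datetime) -> bool:
    """True once an unfixed issue has been open for at least 3 days."""
    return now - opened >= ESCALATION_DELAY
```

For example, an upstream pushing at 09:30 UTC would, under these assumptions, be targeting the 12:00 UTC tick, so the remaining window is `next_tick(now) - now`.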

Also, I didn't want this to become a burden for our team, so some technical changes have been put in place:

we no longer rely on the jenkins lock system, as the workflow is far more complex than what jenkins can handle by itself.
it is possible to publish the previous run until the new run of the stack starts to build (and it is still possible while the stack is only waiting).
if a stack B is blocked on a stack A because A is in manual publishing mode (packaging changes for instance), forcing the publication of A will try to republish B if it was tested against this version of A (this also scales up and cascades as needed). So less manual publication is needed, and it becomes push-button work \o/
an additional "force rebuild mode" can automatically retrigger a rebuild of some components when we know they build against a component that doesn't guarantee any ABI stability (but only if that component was renewed).
we also ensure that we can't deadlock in jenkins (with hundreds of jobs running every 4h).
the dependencies and ordering between stacks no longer rely on any scheduling; it's all computed and ordered properly now (thanks to Jean-Baptiste for his help on the last two items).
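Two of the ideas above, computing the order between stacks and cascading a forced publication, can be sketched in a few lines of Python. This is a rough illustration and not the code that runs in production: the post does not say which algorithm is used, so the ordering below is a plain topological sort (Kahn's algorithm), and the function names, the stack names A, B, C, D and the three dictionaries are all hypothetical.

```python
from collections import deque


def build_order(deps):
    """Order stacks so that each one comes after the stacks it depends on.

    `deps` maps a stack name to the set of stacks it depends on.
    Uses Kahn's algorithm; raises ValueError on a dependency cycle.
    """
    stacks = set(deps)
    for required in deps.values():
        stacks |= set(required)
    remaining = {s: set(deps.get(s, ())) for s in stacks}
    dependents = {s: [] for s in stacks}
    for stack, required in remaining.items():
        for r in required:
            dependents[r].append(stack)

    # Stacks with no dependency can be built first (sorted for determinism).
    ready = deque(sorted(s for s, r in remaining.items() if not r))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for d in sorted(dependents[current]):
            remaining[d].discard(current)
            if not remaining[d]:
                ready.append(d)
    if len(order) != len(stacks):
        raise ValueError("dependency cycle between stacks")
    return order


def cascade_publish(forced, blocked, tested_against, versions):
    """List the stacks published when `forced` is manually published.

    blocked:        stack -> the stack it is waiting on
    tested_against: stack -> version of the blocking stack it was tested with
    versions:       stack -> version currently awaiting publication
    A blocked stack follows only if it was tested against exactly the
    version being published; the check then cascades to stacks blocked on it.
    """
    published = [forced]
    queue = deque([forced])
    while queue:
        current = queue.popleft()
        for stack in sorted(blocked):
            if (blocked[stack] == current
                    and stack not in published
                    and tested_against.get(stack) == versions[current]):
                published.append(stack)
                queue.append(stack)
    return published
```

With B blocked on A and C blocked on B, forcing A publishes all three in order, provided each was tested against the version being published; a stack tested against an older version of A stays blocked until its next run.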

Final words

Since the beginning of the week, in addition to delivering work to the baseline much faster, we have also seen a side benefit: when everybody looks at the issues more regularly, there is less coordination work to do and each tick is less work in the end. Now, I'm counting on the ~ubuntu-unity team to keep that up and keep looking at production throughout the day. Thanks guys! :)

I always keep in mind the motto “when something is hard, let’s do it more often”. I think we apply that one quite well. :)

Note

[1] which is what some upstreams used to ask us to do