The SLO2 error budget dropped from 85% to zero within a single day.
The likely cause was a stuck fedora-messaging consumer, which meant we stopped receiving messages from Copr about finished SRPM builds. Fortunately, the babysit tasks proved their worth and picked up those builds later.
We’ve added a liveness probe to the messaging consumer to prevent this from happening again.
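A common way to build a liveness probe for a message consumer is a heartbeat: the consumer records when it last made progress, and the probe fails once that record is older than some threshold, so the orchestrator restarts the container. The Python sketch below illustrates that general pattern only; it is not the probe we actually deployed. The heartbeat file path, the 600-second threshold, and the function names `touch_heartbeat`, `is_alive`, and `main` are all hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical values: any path shared by the consumer and the probe works,
# and the threshold must exceed the normal gap between incoming messages.
HEARTBEAT_FILE = Path("/tmp/consumer-heartbeat")
MAX_AGE_SECONDS = 600.0


def touch_heartbeat(path: Path = HEARTBEAT_FILE) -> None:
    """Record progress; call this from the consumer's message callback."""
    # Path.touch() creates the file or updates its modification time.
    path.touch()


def is_alive(
    path: Path = HEARTBEAT_FILE,
    max_age: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> bool:
    """Return True if the heartbeat was updated within `max_age` seconds."""
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        # No heartbeat yet: treat the consumer as not alive.
        return False
    current = time.time() if now is None else now
    return current - mtime <= max_age


def main() -> int:
    """Exit code for an exec-style probe: 0 means healthy, 1 means restart."""
    return 0 if is_alive() else 1
```

An exec-style liveness probe could run a small script that calls `sys.exit(main())`. The main design trade-off is the threshold: if the consumer only touches the heartbeat when a message arrives, a quiet topic can look like a stuck consumer, so the threshold has to be generous, or the heartbeat has to be driven by something other than message arrival.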
Last updated: November 14, 2024 at 3:06 PM UTC