
3/4 of SLO1 Error Budget consumed in 6 weeks

September 1, 2022 at 1:00 AM UTC

Resolved after 984h 0m of downtime. October 12, 2022 at 1:00 AM UTC

(Mostly for our internal use since the source data are not publicly available.)

In the 6 weeks from the beginning of September, the SLO1 error budget dropped to 25%. In the middle of October the trend reversed, and now, at the end of October, the budget is back at 50%.

The metrics in our (non-public) Grafana (Packit boards -> (Prod/Stg) Accepted status time) show that the average “accepted status time” did increase by approximately 1 second at the beginning of September. Cases taking longer than 15 s also started to appear, so the error budget started to be consumed.
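
To illustrate how cases above 15 s eat into an error budget, here is a minimal sketch. The 15 s threshold comes from the text above. The 99% target, the function name and the sample data are assumptions made up for illustration; this is not the actual SLO1 definition or our Grafana query.

```python
from typing import Sequence

THRESHOLD_SECONDS = 15.0  # from the text: cases slower than 15 s consume the budget
SLO_TARGET = 0.99         # ASSUMPTION: hypothetical target, not stated in this report


def error_budget_remaining(
    accepted_times: Sequence[float],
    threshold: float = THRESHOLD_SECONDS,
    target: float = SLO_TARGET,
) -> float:
    """Return the fraction of the error budget still left.

    1.0 means the budget is untouched, 0.0 means it is fully spent,
    and a negative value means the objective has been missed.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if not accepted_times:
        return 1.0  # no events, so nothing has been consumed

    total = len(accepted_times)
    # Events that took longer than the threshold count against the budget.
    bad = sum(1 for t in accepted_times if t > threshold)
    # The budget is the number of slow events the target still tolerates.
    allowed_bad = (1.0 - target) * total
    return 1.0 - bad / allowed_bad


# Made-up sample: 400 events, 3 of them slower than 15 s.
sample = [2.0] * 397 + [20.0] * 3
remaining = error_budget_remaining(sample)
```

With these made-up numbers the assumed 99% target tolerates about 4 slow events out of 400, so 3 slow events leave roughly a quarter of the budget, the same proportion as in the incident title.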

The cause is not yet known, but the changes we made in the last week of August, which could therefore have contributed to the problem, were:

We need to keep watching the metrics, experiment with the workers (their numbers and types), and give the changes more time (at least 2 weeks) before we can tell whether they change the trend.

Related:

Last updated: November 29, 2022 at 11:18 AM UTC