Experiencing disruptions

Staging is unstable

Since the last cluster update, we’ve been running our staging instance on a separate node pool that uses AWS Spot instances. This makes the staging instance quite unreliable. We’re testing this setup to find a balance between price and usefulness.


Core - Systems we have complete control over
API (prod.packit.dev/api) - If the API is down, we're not able to receive your requests. Operational
Workers - If workers are down, received requests wait in a queue to be processed. Operational
Dashboard (dashboard.packit.dev) - Operational
Others - Systems we use, but have no direct control over
Copr (copr.fedorainfracloud.org) - If it's down, we can't build your packages. Operational
Testing Farm (api.dev.testing-farm.io) - If it's down, we can't run tests. Operational
Koji (koji.fedoraproject.org) - Operational

2022 (18)

November 28, 2022 at 4:00 PM UTC

Upgrade of Copr Servers

Resolved after 5h 0m of downtime
November 25, 2022 at 8:35 AM UTC

Staging is unstable

▲ This issue is not resolved yet
November 24, 2022 at 11:24 PM UTC

Outage

Resolved after 9h 11m of downtime
November 24, 2022 at 7:00 PM UTC

System upgrade scheduled

Resolved after 4h 0m of downtime
November 1, 2022 at 3:00 AM UTC

Copr builds and tests on commits and releases ignored

Resolved after 31h 30m of downtime
October 28, 2022 at 9:00 PM UTC

Copr storage move

Resolved after 35h 0m of downtime
October 13, 2022 at 7:00 PM UTC

System upgrade scheduled

Resolved after 1h 37m of downtime
October 11, 2022 at 1:57 PM UTC

Issues with the task execution

Resolved after 1h 15m of downtime
October 10, 2022 at 3:14 PM UTC

Internal Testing Farm infrastructure is having an outage

Resolved after 30h 20m of downtime
October 5, 2022 at 4:00 PM UTC

GitHub Webhooks not working

Resolved after 2h 0m of downtime
September 22, 2022 at 8:00 AM UTC

September 22nd outage

Resolved after 24h 0m of downtime
September 1, 2022 at 1:00 AM UTC

3/4 of SLO1 Error Budget consumed in 6 weeks

Resolved after 984h 0m of downtime
August 23, 2022 at 4:00 AM UTC

Failing SRPM builds done in Copr

Resolved after 5h 0m of downtime
July 4, 2022 at 8:00 AM UTC

Summer is here ☀️

Resolved after 106h 0m of downtime
June 22, 2022 at 2:00 PM UTC

Upgrade of Copr Servers

Resolved after 2h 10m of downtime
June 21, 2022 at 10:18 AM UTC

SRPM builds done in Copr are failing

Resolved after 52m of downtime
February 3, 2022 at 8:37 AM UTC

Networking Issue

Resolved after 2h 40m of downtime
February 1, 2022 at 2:00 AM UTC

February 1st outage

Resolved after 2h 15m of downtime

2021 (10)

December 8, 2021 at 2:00 AM UTC

New production deployment

Resolved after 574h 0m of downtime
December 7, 2021 at 9:06 AM UTC

Longer response times

Resolved after 2h 43m of downtime
November 18, 2021 at 2:33 AM UTC

Problems with production database

Resolved after 21h 37m of downtime
November 18, 2021 at 1:00 AM UTC

Moving production Packit Service to a new cluster

Resolved after 1h 33m of downtime
September 7, 2021 at 8:30 PM UTC

Slow SRPM builds

Resolved after 211h 24m of downtime
August 27, 2021 at 3:00 PM UTC

August 27th outage

Resolved after 19h 0m of downtime
July 1, 2021 at 10:55 AM UTC

Summer is here ☀️ #2

Resolved after 130h 15m of downtime
June 25, 2021 at 10:55 AM UTC

Summer is here ☀️

Resolved after 82h 15m of downtime
June 4, 2021 at 7:00 AM UTC

PostgreSQL db upgrade

Resolved after 2h 10m of downtime
June 2, 2021 at 11:16 AM UTC

Dashboard - unable to connect

Resolved after 24m of downtime