Azure Pipelines - From incident and extreme stress to cooldown
I’m currently working with Azure and honestly, I cry nearly every day. I have never seen such crazy infrastructure and support.
Today, I’ll tell you about my bad experience with Azure Pipelines and what I can’t tell the client about their stupid choice.
I’ll also share how I went from extreme stress to cooldown. It’s not for everyone, because you need to accept the situation and not go back on that choice.
1 major incident: 4 days in a week
I discovered the issue when a pipeline stayed queued for more than 40 minutes. You quickly understand that there is an incident, yet when you go to their Status page, nothing is listed.
In the end, I pushed to obtain 2 virtual machines (1 Windows and 1 Ubuntu) to install their agent on them and replace the hosted agent (the one affected by the incident).
The answer was: we pay for the hosted agent, so we continue with it.
Such a stupid approach: it isn’t working, all our pipelines are queued, and some are timing out because no hosted agent is available to handle them.
The total duration of this incident so far is: 54 hours, 48 minutes
- incident 1: 22 hours, 20 minutes
- incident 2: 10 hours, 53 minutes
- incident 3: 13 hours, 31 minutes
- incident 4: 8 hours, 4 minutes
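For anyone who wants to check the total, here is a minimal Python sketch that sums the four durations listed above. The variable and function names (`incidents`, `format_duration`) are mine, chosen only for illustration.

```python
from datetime import timedelta

# Durations copied from the list above (hours, minutes).
incidents = [
    timedelta(hours=22, minutes=20),  # incident 1
    timedelta(hours=10, minutes=53),  # incident 2
    timedelta(hours=13, minutes=31),  # incident 3
    timedelta(hours=8, minutes=4),    # incident 4
]


def format_duration(d: timedelta) -> str:
    """Render a timedelta as 'H hours, M minutes' without rolling hours into days."""
    total_minutes = int(d.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours, {minutes} minutes"


# sum() needs a timedelta start value, otherwise it starts from the int 0.
total = sum(incidents, timedelta())
print(format_duration(total))  # 54 hours, 48 minutes
```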
I was under extreme stress because I still think the way I did in my former position (a much bigger role than my current one), where this kind of incident could be a real disaster.
Quick details about the former position
A periodic pipeline (every 5 minutes) acquires a data source (a flat file) and injects the new version into the streaming solution (Kafka-like); this data then replaces some existing values (or adds/removes them if needed) in databases.
These values are used to evaluate risk levels and other parameters: worker nodes consume specific topics/queues and send results to the databases, and also to the alerting system when needed.
I’ll let you imagine: if those evaluation criteria aren’t updated for more than 8 hours, you may put information in the wrong category and fail to raise an alert when one was needed.
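To put a number on it, here is a rough Python sketch that counts how many 5-minute refresh cycles fit inside an outage. It only illustrates the arithmetic: `missed_refreshes` and `REFRESH_PERIOD` are hypothetical names, not anything from the real system.

```python
from datetime import timedelta

# The pipeline described above ran every 5 minutes.
REFRESH_PERIOD = timedelta(minutes=5)


def missed_refreshes(outage: timedelta, period: timedelta = REFRESH_PERIOD) -> int:
    """Number of full refresh cycles that fit inside an outage of the given length."""
    # Floor-dividing one timedelta by another returns an int.
    return outage // period


print(missed_refreshes(timedelta(hours=8)))  # 96
```

So an 8-hour outage would mean 96 consecutive updates that never reached the databases.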
The sub-incident
In addition, they broke the trigger…
You’re living one of the worst moments of your life with CI/CD pipelines, and they break the mechanism that automatically triggers a pipeline after a merge/push/PR…
You can’t stay relaxed; you’re fully stressed because you need to keep an eye on the git history and trigger some pipelines manually.
This incident is still ongoing, the support ticket is still open and nothing is improving… so Microsoft.
Cooldown
I tried to obtain my 2 virtual machines, but the client refused to provide them to resolve this recurring incident. Finally, I put my mind in cooldown.
I went back to my foundations in meditation, which come from Tai Chi and Shaolin Kung Fu.
In addition, I accepted the reality: this client won’t improve its infrastructure, so I won’t waste my energy.
Yes, it’s a dangerous cooldown because I’m becoming indifferent to this client, but I’m not paid to destroy my health (physical and/or mental).
So my complete approach is:
- Meditation before, during (lunch time) and after the workday
- Never wasting my energy: do the job and no more, just the normal hours without overtime
- After the workday, go back to my certification training: COBIT and CISSP
If it’s really hard to handle, just keep in mind that this current job is just a journey, not a destination: mercenary!
I know that I’ll never use my certification/training (COBIT and CISSP) at my current job, so I’m actively monitoring my emails and LinkedIn messages. For now, that’s not a reason to actively ping recruiters with my CV.
Take care of your mental health; it’s very important for your physical health.
If you work at a good company, maybe they have a solution like Yogist, and in that case you’re very lucky.