Azure Pipelines – From incidents and extreme stress to cooldown

I work on Azure and, honestly, I cry nearly every day. I have never seen infrastructure and support this unreliable.

Today, I’ll tell you about my bad experience with Azure Pipelines and what I can’t tell the client about their stupid choice.

I’ll also share how I went from extreme stress to cooldown. This approach isn’t for everyone: you need to accept the situation and accept that the client won’t go back on this choice.

One major incident: 4 days in a single week

I discovered this issue when a pipeline had been queued for more than 40 minutes. You quickly understand that there’s an incident, yet when you check their status page, nothing is listed.

Eventually, I asked for 2 virtual machines (1 Windows and 1 Ubuntu) to install the self-hosted agent on, replacing the Microsoft-hosted agents affected by the incident.

The answer: we pay for the hosted agents, so we keep using them.

What a stupid approach: it isn’t working. All our pipelines are queued, and some time out because no hosted agent is available to pick them up.

So far, the total duration of this incident is 54 hours and 48 minutes:

Incident 1: 22 hours, 20 minutes
Incident 2: 10 hours, 53 minutes
Incident 3: 13 hours, 31 minutes
Incident 4: 8 hours, 4 minutes
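As a quick sanity check, the four durations do add up to the 54 hours and 48 minutes total. Here is a minimal Python sketch using the standard `datetime.timedelta` type; the only inputs are the four durations from the list.

```python
from datetime import timedelta

# The four incident durations from the list, as (hours, minutes)
incidents = [(22, 20), (10, 53), (13, 31), (8, 4)]

# Sum them as timedelta objects, starting from an empty duration
total = sum((timedelta(hours=h, minutes=m) for h, m in incidents), timedelta())

# timedelta stores days + seconds internally, so convert back to hours/minutes
total_minutes = int(total.total_seconds()) // 60
hours, minutes = divmod(total_minutes, 60)
print(f"{hours} hours, {minutes} minutes")  # 54 hours, 48 minutes
```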

I was under extreme stress because I was still in the mindset of my former position (a higher-level role than my current one), where this kind of incident could have been a real disaster.

Quick details about the former position

A periodic pipeline (running every 5 minutes) acquired a data source (a flat file) and injected the new version into the streaming platform (Kafka-like). This data then replaced existing values in databases (or added/removed them as needed).

Those values were used to evaluate risk levels and other parameters: worker nodes consumed specific topics/queues, wrote results to databases, and also triggered alerts when needed.

Imagine not updating those evaluation criteria for more than 8 hours: you might put information in the wrong category and fail to raise an alert when one is needed.
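To make the refresh step of that former pipeline more concrete, here is a minimal Python sketch of how a new flat-file version can be compared with the previous one to decide which values to add, update, or remove. It assumes a simple `key,value` CSV format, and the function names `load_snapshot` and `diff_snapshots` are hypothetical; the real pipeline’s file format and publishing code are not described here, so the step that pushes each change to the streaming platform is left out.

```python
import csv
import io


def load_snapshot(flat_file_text: str) -> dict[str, str]:
    """Parse a hypothetical 'key,value' flat file into a dict."""
    reader = csv.reader(io.StringIO(flat_file_text))
    # Skip blank rows so stray empty lines don't break the parse
    return {row[0]: row[1] for row in reader if row}


def diff_snapshots(old: dict[str, str], new: dict[str, str]):
    """Return (added, updated, removed) between two snapshots.

    In a setup like the one described, each change would then be published
    to the streaming platform so consumers can apply it to their databases.
    """
    added = {k: v for k, v in new.items() if k not in old}
    updated = {k: v for k, v in new.items() if k in old and old[k] != v}
    removed = sorted(k for k in old if k not in new)
    return added, updated, removed


# Example: the previous file version vs. the one fetched 5 minutes later
previous = load_snapshot("a,1\nb,2\nc,3\n")
current = load_snapshot("a,1\nb,5\nd,7\n")
added, updated, removed = diff_snapshots(previous, current)
print(added, updated, removed)  # {'d': '7'} {'b': '5'} ['c']
```

When this step stops running, nothing downstream fails loudly: consumers keep evaluating against the last values they received, which is exactly why a long outage is dangerous in this kind of design.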

The sub-incident

On top of that, they broke the triggers…

You’re living through one of the worst moments of your life with CI/CD pipelines, and they broke the mechanism that automatically triggers a pipeline after a merge, push, or PR…

You can’t relax: you’re fully stressed because you need to keep an eye on the Git history and trigger some pipelines manually.

This incident is still ongoing, the support ticket is still open, and nothing is improving… so Microsoft.

Cooldown

I tried to get my 2 virtual machines to work around this recurring incident, but the client refused. In the end, I put my mind into cooldown.

I went back to my foundations in meditation, drawn from Tai Chi and Shaolin Kung Fu.

I also accepted reality: this client won’t improve its infrastructure, so I won’t waste my energy.

Yes, it’s a dangerous cooldown because I’m becoming indifferent to this client, but I’m not paid to destroy my health (physical or mental).

So here is my full routine:

Meditate before, during (at lunchtime), and after the workday
Never waste my energy: do the job and nothing more, with no extra hours
After the workday, go back to my certification training: COBIT and CISSP

If it’s really hard to handle, keep in mind that this job is a journey, not a destination: mercenary!

I know I’ll never use my COBIT and CISSP training at my current job, so I’m actively monitoring my emails and LinkedIn messages.
That’s no reason to start actively pinging recruiters with my CV, though.

Take care of your mental health; it matters a great deal for your physical health too.

If you work at a good company, it may offer something like Yogist; if so, you’re very lucky.