{"id":241,"date":"2022-02-12T20:30:00","date_gmt":"2022-02-13T01:30:00","guid":{"rendered":"http:\/\/sycured.127.0.0.1.sslip.io\/?p=241"},"modified":"2024-01-14T11:45:49","modified_gmt":"2024-01-14T16:45:49","slug":"azure-pipelines-incident-extreme-stress-cooldown","status":"publish","type":"post","link":"http:\/\/10.42.0.68:8080\/blog\/azure-pipelines-incident-extreme-stress-cooldown","title":{"rendered":"Azure Pipelines – From incident and extreme stress to cool down"},"content":{"rendered":"\n
I’m working on Azure and honestly, I cry nearly every day. I have never seen such crazy infrastructure and support.<\/p>\n\n\n\n
Today, I’ll tell you about my bad experience with Azure Pipelines and what I can’t tell the client about their stupid choice.<\/p>\n\n\n\n
In addition, I’ll share how I went from extreme stress to cooldown. It’s not for everyone, because you need to accept this choice and not go back on it.<\/p>\n\n\n\n\n\n\n\n
I discovered this issue when a pipeline stayed queued<\/strong> for more than 40 minutes<\/strong>. You quickly understand that you’ve got an incident, yet when you go to their Status page<\/a>, nothing is listed.<\/p>\n\n\n\n In the end, I tried to obtain 2 virtual machines (1 Windows and 1 Ubuntu) to install their agent on, to replace the hosted agent (the one affected by the incident).<\/p>\n\n\n\n The answer was that we pay for the hosted agent, so we continue with it.<\/em><\/p>\n\n\n\n Such a stupid approach: it’s not working, all our pipelines are queued, and some are going to time out because no hosted agent is available to handle them.<\/p>\n\n\n\n The total duration of this incident so far is: 54 hours, 48 minutes<\/p>\n\n\n\n I was under extreme stress because I kept the mindset of my former position (a higher position than my current one), where this kind of incident could have been a real disaster.<\/p>\n\n\n\n A periodic pipeline (every 5 minutes) acquires a data source (flat file) and injects the new version into the streaming solution (Kafka-like); this data then replaces some existing values (or adds\/removes them if needed) in databases.<\/p>\n\n\n\n These values are used to evaluate risk level and other parameters using worker nodes that consume specific topics\/queues and send results to databases, but also trigger alerts when needed.<\/p>\n\n\n\n I’ll let you imagine: if you don’t update those evaluation criteria for more than 8 hours, you may put information in the wrong category and not raise an alert when one is needed.<\/p>\n\n\n\n In addition, they broke the trigger\u2026<\/p>\n\n\n\n You’re living one of the worst moments of your life with CI\/CD pipelines, and they broke the mechanism that automatically triggers a pipeline post-merge\/push\/PR\u2026<\/p>\n\n\n\n You can’t stay relaxed; you’re fully stressed because you need to keep an eye on the git history and manually trigger some pipelines.<\/p>\n\n\n\n This incident is still ongoing, the support ticket is still open, and nothing is
improving\u2026 so Microsoft.<\/p>\n\n\n\n I tried to obtain my 2 virtual machines, but the client refused to provide them to resolve this repeated incident. Finally, I put my mind on cooldown.<\/p>\n\n\n\n\n
Quick details about the former position<\/h3>\n\n\n\n
The sub-incident<\/h2>\n\n\n\n
Cooldown<\/h2>\n\n\n\n