My AZ-305 Designing Microsoft Azure Infrastructure Solutions Study Path

I’ve been talking for years about getting the Solution Architect credential, but I’ve never set aside the time needed. For the latter half of this year I decided to take 20% of the time I usually spend on clients and spend it on myself instead, and my first goal was to take the AZ-305 exam.

Note: I can’t say anything about the exam itself, as you have to sign an NDA, but I can tell you about my study path and how I first failed and then succeeded.

First Try

I failed my first try at this exam, and from what I’ve gathered, that’s not uncommon. I spent about 36 hours studying in the first round, and I focused on the study path that Microsoft supplies on their certification page.

This study path does not reflect the knowledge you’re being tested on. I failed because I studied the wrong things. I got 634 points out of 1000, where 700 is the passing score.

After failing, I did a short retrospective with myself on what went wrong, found new resources to study, and set to it again for another 3 weeks of intensive studying. I can be quite stubborn when my mind is set on something.

Second Try

I spent about 40 hours on my second round of studies. First, I bought the MeasureUp AZ-305 Practice Test and did all 168 questions in 4 sittings. For every question, I pasted it into ChatGPT and then we discussed every possible answer and why it was right or wrong. This way I used the test to find my knowledge gaps. It was also a great way to discover and remember the things I got wrong, instead of just skipping to the next question, and it helped me get a better understanding of topics I’m not familiar with.

The practice questions can be questionable, but the act of going through and discussing them was most useful to me.

This was a great use case for AI. Even if ChatGPT wasn’t always right, it helped me remember, because I had to reason about the knowledge. I find that much better than just reading.

I should say, the MeasureUp test has questions that are close to the real exam, but some of the questions are infuriating, and I found some that were plain wrong. While this sounds bad, getting angry is also a good way of remembering what you’re trying to study.

After identifying my knowledge gaps, I did a couple of labs in Azure. I set up scenarios in my own Azure tenant, created resources and tried different things. This was very useful for resources and features that I don’t use in my day-to-day work.

  • Availability sets, creating virtual machines in sets, setting up Azure Load Balancer and testing failover
  • Availability zones, creating virtual machines in different zones
  • Virtual machine scale sets, setting up an autoscaling cluster of machines
  • Azure Site Recovery, setting up replication of a machine in a different region
  • Azure Backup, playing around with the different backup options
  • Azure SQL, where I set up different configurations: single Azure SQL database, DTU tier, vCore tier, Elastic Pool and Managed Instance
  • Azure Policy and Initiatives, creating policies and applying them to my subscriptions

I wanted to play around more with Microsoft Entra ID, but most of the things I wanted to lab with require a premium license: conditional access needs P1, and access reviews, PIM and ID Protection need P2.

Another thing I did was watch John Savill’s study cram on YouTube. While it’s very high level and not detailed enough to pass the exam, sometimes he said things I didn’t know about, so I went ahead and looked them up. I watched it during my commute over a span of 3 weeks.

John Savill is the GOAT for making these study cram videos. I think it was good repetition of the basics before the exam.

The last thing I did was get the AZ-305 Exam Ref from Amazon. At first I thought it was a waste of money, because it wasn’t supposed to be delivered until after my exam, but it arrived early and I spent a couple of evenings reading it through.

While it doesn’t contain all the details you need to know, it’s still a very good and dense walkthrough of everything on a high level, and sometimes very detailed as well. I can recommend getting it if you’re struggling with the exam.

The Exam Ref has all the bullet points of what you need to know. Maybe not all the details, but it’s a good starting point.

With all this studying I was much more confident on my second try and I finished with 844 points out of 1000 where 700 is the passing score.

Summary

I think this certification was quite hard, the hardest yet. The reason I say so is that with my previous certifications, Administrator and Developer, I felt quite at home because I use the technology in my daily job. This certification tests that you know a lot about all of Azure, not only the parts you are comfortable with.

It took me about 80 hours of effective study time to learn everything I needed, and I don’t think it’s something anyone could pass without studying. Everyone has their part of Azure they’re comfortable with, and this exam tests the whole platform.

Now I have the Administrator, the Developer and the Solution Architect certifications. The only one left that I’m interested in is the DevOps certification, so I guess I’ll do that next.

App Service Plan Random Restarts

I’m hosting a real-time system that depends heavily on low-latency throughput, and I’m doing it on Azure. In hindsight, this might not have been the best choice, as you have no control over Azure’s PaaS services and only shallow insight into its IaaS services. If you’re writing a real-time system, deploy it on an environment where you control everything.

Last week we started having problems with interruptions in the system. Seemingly at random, the system would stop working for 1-2 minutes and then be back to normal. First we thought it was the network, but after diagnosing the whole system, we found that the App Service Plan was restarting and this was causing the interruptions.

The memory graph shows that when an instance drops, a new one boots up.

There is no log of this, but you can see it if you watch the App Service Plan metrics and split Memory Percentage by instance: new instances start up when old ones are killed. While the new instance is starting up, we drop connections and the real-time system stops working for 1-2 minutes.
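The same reasoning can be written down as a small script: compare which instance IDs report the metric in consecutive time buckets, and flag the buckets where an instance disappears or a new one appears. This is a minimal Python sketch, not an Azure API. It assumes you have already exported the per-instance Memory Percentage samples into a dict of timestamp to instance IDs; the function name `instance_changes` and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def instance_changes(
    samples: Dict[str, Set[str]],
) -> List[Tuple[str, Set[str], Set[str]]]:
    """Report instances that stopped or started reporting between buckets.

    `samples` maps a sortable timestamp string (e.g. zero-padded or ISO 8601)
    to the set of instance IDs that reported the metric in that bucket.
    Returns (timestamp, disappeared, appeared) for every bucket that changed.
    """
    changes = []
    timestamps = sorted(samples)
    for prev, curr in zip(timestamps, timestamps[1:]):
        disappeared = samples[prev] - samples[curr]  # old instance killed
        appeared = samples[curr] - samples[prev]     # new instance booting
        if disappeared or appeared:
            changes.append((curr, disappeared, appeared))
    return changes


# Hypothetical data: instance "b" is replaced by instance "c" at 10:02.
samples = {
    "10:00": {"a", "b"},
    "10:01": {"a", "b"},
    "10:02": {"a", "c"},
    "10:03": {"a", "c"},
}
for ts, gone, new in instance_changes(samples):
    print(f"{ts}: gone={sorted(gone)} new={sorted(new)}")
# prints: 10:02: gone=['b'] new=['c']
```

If the old instance stops reporting a bucket or two before the replacement appears, the sketch reports the disappearance and the appearance as two separate changes, which is still enough to spot a restart.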

In a normal system this wouldn’t be a problem, because all requests would move over to the instances that are still live, and the users wouldn’t be affected. But we’re working with WebSockets, and they cannot be load balanced like that: once a connection is established, it has to be reconnected if its instance goes down.

So this was bad for us!

These kinds of issues are hard to troubleshoot because Azure App Service Plan is PaaS, and you don’t have access to all the logs you need. However, I found a tool for this: go into the Azure App Service, select Resource Health / Diagnose and solve problems, and search for Web App Restarted.

There are lots of diagnostic tools for Azure App Service if you know where to find them. This one shows web app restarts.

This confirms the issue but doesn’t tell us why the instances are restarting. When I asked ChatGPT for common reasons for App Service instance restarts, I got the following list:

  • App Crashes
  • Out of Memory
  • Application Initialization Failures
  • Scaling or App Service Plan Configuration
  • Health Check Failures
  • App Service Restarts (Scheduled or Manual)
  • Underlying Infrastructure Maintenance (by Azure)

The one that stood out to me was “Health Check Failures”, so I went into the Health Check feature on my App Service and used “Troubleshoot”, but it said everything was fine. So I checked the requests to my /health endpoint, and they told a different story.

The health check is failing a couple of times per day and this seems to be the cause of the App Service instance restarts.

The health checks pass almost every time, but those rare flukes cause the instance to be restarted: Azure App Service considers the instance unhealthy and restarts it.

To test my theory I turned off health checks on my Azure App Service, and the problem went away. After evaluating for 24 hours we had zero App Service Instance restarts.

When I turned off health checks on Azure App Service, to test my theory, the problems with the restarts disappeared.

The problem is confirmed, but why are the health checks failing? Digging a little deeper, I found the following error message:

Result: Health check failed: Key Vault
Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.

In my health checks, I verify that the service has all the dependencies it needs to work: it cannot be healthy if Azure Key Vault is inaccessible. In this case the Key Vault check failed 4 times in 24 hours, which caused the health check to fail and the instances to be restarted.
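A health check like this is an all-or-nothing aggregation over dependency probes: if any probe fails or doesn’t answer in time, the whole endpoint reports unhealthy. The post doesn’t show the service’s code (the `TaskCanceledException` points to .NET, where it typically signals a canceled or timed-out call), so this is a Python sketch of the pattern under assumptions: the probe functions, their names and the timeout value are hypothetical, and returning 503 for unhealthy is a common convention rather than something the post specifies.

```python
import concurrent.futures
import time
from typing import Callable, Dict, Tuple

# A probe is any callable that raises if its dependency is unavailable.
Probe = Callable[[], None]


def run_health_checks(
    probes: Dict[str, Probe], timeout_s: float = 2.0
) -> Tuple[int, Dict[str, str]]:
    """Run all dependency probes in parallel and return (HTTP status, details).

    The service counts as healthy (200) only if every probe finishes without
    raising before a shared deadline; otherwise the result is 503.
    """
    details: Dict[str, str] = {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(len(probes), 1))
    futures = {name: pool.submit(probe) for name, probe in probes.items()}
    deadline = time.monotonic() + timeout_s
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            future.result(timeout=remaining)
            details[name] = "Healthy"
        except concurrent.futures.TimeoutError:
            details[name] = "Unhealthy: timed out"
        except Exception as exc:  # the probe itself raised
            details[name] = f"Unhealthy: {exc}"
    # Don't block the response on probes that are still hanging.
    pool.shutdown(wait=False)
    healthy = all(d == "Healthy" for d in details.values())
    return (200 if healthy else 503), details


def database_probe() -> None:
    pass  # hypothetical: pretend the database answered immediately


def slow_key_vault_probe() -> None:
    time.sleep(0.5)  # hypothetical: simulate Key Vault not answering in time


status, details = run_health_checks(
    {"Database": database_probe, "Key Vault": slow_key_vault_probe},
    timeout_s=0.1,
)
print(status, details)
# prints: 503 {'Database': 'Healthy', 'Key Vault': 'Unhealthy: timed out'}
```

One slow Key Vault call is enough to turn the whole response into a 503, and that single unhealthy answer is what the App Service health check then acts on.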

Why would it fail? It could be anything. Maybe Microsoft was making updates to Azure Key Vault. Maybe there was a short interruption in the network. It doesn’t really matter. What matters is that this check should not restart the App Service instances, because the restart is a bigger problem than Key Vault failing 4 checks out of 2880.
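Working through the numbers makes the trade-off concrete. The sketch below uses only figures from the text (4 failed checks out of 2880 in 24 hours, and an interruption of 1-2 minutes per restart) and assumes, as a simplification, that each failed check leads to exactly one restart.

```python
# Figures from the post; one restart per failed check is an assumption.
checks_per_day = 2880
failures_per_day = 4
outage_minutes_per_restart = (1, 2)  # observed interruption range

seconds_per_day = 24 * 60 * 60
interval_s = seconds_per_day / checks_per_day     # average spacing between checks
failure_rate = failures_per_day / checks_per_day  # share of failed checks

# Daily interruption time if every failed check causes one restart.
downtime_minutes = tuple(failures_per_day * m for m in outage_minutes_per_restart)

print(f"one check every {interval_s:.0f} s on average")
print(f"failure rate {failure_rate:.2%}, success rate {1 - failure_rate:.2%}")
print(f"{downtime_minutes[0]}-{downtime_minutes[1]} minutes of interruptions per day")
```

In other words, the Key Vault check passed more than 99.8% of the time, yet acting on the remaining 0.14% cost several minutes of dropped WebSocket connections per day.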

Liveness and Readiness

Health checks are a good thing, and I wouldn’t want to run the service without them, but we cannot have them restarting the service several times a day. So we need to fix this.

I know the concept of liveness and readiness from working with Kubernetes. I don’t know if it originated there, but that is where I learned it.

  • Liveness means that the service is up. It has started and is responding to, essentially, a ping.
  • Readiness means that the service is ready to receive traffic.

What we can do is split the health checks into liveness checks and readiness checks. The liveness check would just return 200 OK, giving the Azure App Service health check an endpoint for evaluating the service.

The readiness checks would do what my health checks do today: verify that the service has all the dependencies required for it to work. I would connect my availability checks to the readiness endpoint, so I get an alert if the service is not ready.

The health check uses the new liveness endpoint, which doesn’t verify the dependencies.
The availability check uses the new readiness endpoint to verify that all dependencies are up and running.
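The split can be sketched as two endpoints on the same service. This is a minimal Python sketch using only the standard library, not the author’s implementation: the paths `/health/live` and `/health/ready`, the `DEPENDENCY_PROBES` table and its placeholder Key Vault probe are all hypothetical. Liveness answers 200 as long as the process can serve HTTP; readiness runs the dependency probes and returns 503 if any of them fails.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict, Tuple

# Hypothetical dependency probes: each returns True if the dependency answers.
# In a real service these would call Key Vault, the database, and so on.
DEPENDENCY_PROBES: Dict[str, Callable[[], bool]] = {
    "Key Vault": lambda: True,
}


def liveness() -> Tuple[int, dict]:
    """Liveness: the process is up and answering HTTP. No dependency checks."""
    return 200, {"status": "alive"}


def readiness() -> Tuple[int, dict]:
    """Readiness: every dependency the service needs is reachable."""
    results: Dict[str, bool] = {}
    for name, probe in DEPENDENCY_PROBES.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False  # a probe that raises counts as not ready
    ready = all(results.values())
    return (200 if ready else 503), {"ready": ready, "dependencies": results}


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical paths: App Service Health check -> /health/live,
    # availability test -> /health/ready.
    routes = {"/health/live": liveness, "/health/ready": readiness}

    def do_GET(self) -> None:
        handler = self.routes.get(self.path)
        if handler is None:
            self.send_error(404)
            return
        status, body = handler()
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging from frequent health pings


def serve(port: int = 8000) -> None:
    """Run the health endpoints until interrupted (not called in this sketch)."""
    ThreadingHTTPServer(("", port), HealthHandler).serve_forever()
```

In App Service you would then set the Health check path to the liveness endpoint and point the availability test at the readiness endpoint, so a dependency hiccup raises an alert instead of restarting the instance. A real readiness probe should also bound how long each check may take.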

Developing Solutions for Microsoft Azure

Today I passed my AZ-204: Developing Solutions for Microsoft Azure exam and became an Azure Developer Associate. I’ve done some certifications in my day, but this was by far the hardest. The breadth of knowledge required (Azure SDKs, data storage, data connections, APIs, authentication, authorisation, compute, containers, deployment, performance and monitoring), combined with the extreme detail in the questions, made this really hard. I didn’t think I had passed until I got my result.

These were the kinds of questions that were asked:

  • Case studies: Read up on a case study and answer questions on how to solve the client’s particular problems with Azure services. Questions like, what storage technology is appropriate, what service tier should you recommend, and such.
  • Many questions about the capabilities of different services, like which messaging service you should use if you need guaranteed FIFO (first-in, first-out).
  • How to set up a particular scenario, like in what order you should create services to solve the problem at hand. Some of these questions were down to CLI commands, so make sure that you’ve dipped your toes into Azure CLI.
  • Code questions where you need to fill in the blanks on how to connect and send messages on a service bus, or provision a set of services with an ARM template. You also get code questions where you should answer questions about the result of the code.

Because of the huge area of expertise and the extreme detail of the questions, I don’t think you could study and pass the exam without hands-on development experience. If I were to give advice on what to study, it would be:

  • Go through the Online – Free preparation material. Make sure you remember the capabilities of each service, how they differ, and what features higher pricing tiers enable. Those questions are guaranteed.
  • Do some exercises on connecting Azure Functions, blob storage, service bus, queue storage, event grid and event hub. These were central in the exam.
  • Make sure you know how to manage authorisation to services like blob storage and the benefits of the different ways to do it. Know your Azure Key Vault, as the security questions emphasise it.

Be prepared for it to be much harder than AZ-900: Microsoft Azure Fundamentals. Go slow and use all the time you get. Good luck!