Today I passed my AZ-204: Developing Solutions for Microsoft Azure exam and became an Azure Developer Associate. I’ve done some certifications in my days, but this was by far the hardest. The breadth of knowledge required (Azure SDKs, data storage, data connections, APIs, authentication, authorisation, compute, containers, deployment, performance and monitoring), combined with the extreme detail of the questions, made this really hard. I didn’t think I had passed until I got my result.
These were the kinds of questions that were asked:
Case studies: Read up on a case study and answer questions on how to solve the client’s particular problems with Azure services. Questions like, what storage technology is appropriate, what service tier should you recommend, and such.
Many questions about the capabilities of different services. Like, what event passing service should you use if you need guaranteed FIFO (first-in, first-out)?
How to set up a particular scenario. Like, in what order you should create services to solve the problem at hand. Some of these questions were down to CLI commands, so make sure that you’ve dipped your toes into Azure CLI.
Code questions where you need to fill in the blanks on how to connect and send messages on a service bus, or provision a set of services with an ARM template. You also get code questions where you should answer questions about the result of the code.
Because of the huge area of expertise and the extreme details of the questions, I don’t think you could study and pass the exam without hands-on development experience. If I were to give advice on what to study it would be
Go through the free online preparation material. Make sure you remember the capabilities of each service, how they differ, and what features the higher pricing tiers enable. Those questions are guaranteed.
Do some exercises on connecting Azure Functions, blob storage, service bus, queue storage, event grid and event hub. These were central in the exam.
Make sure you know how to manage authorisation to services like blob storage and the benefits of the different ways to do it. Know your Azure Key Vault, as the security questions emphasise it.
During January I spent a lot of time thinking about, reading about and setting up a Service Level Agreement. The purpose is to agree on measurable metrics like uptime, responsiveness and responsibilities with your paying clients.
If it’s done right, it will influence how those clients prefer to interface with you: whether they do it synchronously or asynchronously, put a cache in between, or have a failsafe.
Here I will write some general insights that I got from this process. If you want my complete SLA convention, you should check out my wiki. There I’ve also posted a sample SLA that you can reuse for your own purposes.
Always Start with Metrics
Before you dig into availability and 99.99999% you must start with metrics. What does availability mean to you? How do you measure it? What is an error? Is HTTP status 404 an error? Do errors during maintenance count towards your metric? How is request latency measured? Is it measured on the client or the server? Do you measure the average over all requests? How does cold start latency affect your metric?
There are a lot of things to unpack before you can start thinking about objectives.
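To make those questions concrete, here is a minimal sketch of an availability indicator computed from request records. The `Request` type and the two decisions it encodes (only 5xx responses are errors, requests during announced maintenance are excluded) are assumptions chosen for illustration, not the one right answer. Your own SLA has to make these choices explicitly.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int           # HTTP status code returned to the client
    latency_ms: float     # latency as measured on the server (an assumption)
    in_maintenance: bool  # True if it arrived during an announced maintenance window


def is_error(req: Request) -> bool:
    # Assumption: only 5xx responses count as errors, so a 404 is not an error.
    return req.status >= 500


def availability(requests: list[Request]) -> float:
    # Assumption: requests during announced maintenance do not count at all.
    counted = [r for r in requests if not r.in_maintenance]
    if not counted:
        return 1.0
    failed = sum(1 for r in counted if is_error(r))
    return 1 - failed / len(counted)


sample = [
    Request(200, 35.0, False),
    Request(404, 12.0, False),  # not an error under the assumption above
    Request(500, 80.0, False),  # counts as an error
    Request(503, 90.0, True),   # ignored: happened during maintenance
]
print(f"availability: {availability(sample):.2%}")
```

Changing either assumption changes the number, which is why the definition has to come before the objective.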
Not as Available as you Think
Everywhere you look, businesses offer 99.95% availability. Translated, it means 5 minutes and 2 seconds of downtime per week. A common misconception among developers is that it’s easy: all our deploys are automated anyway, and if one fails, we’ll just roll back.
Before you set that objective you should consider:
When the service goes down in the middle of the night, how much time does it take to wake somebody up to take a look at the problem?
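The weekly figure is easy to check. The helper below is a small sketch (the function name is made up for this example) that turns an availability percentage into an allowed downtime for a given period.

```python
def allowed_downtime_seconds(availability_percent: float, period_seconds: float) -> float:
    """Seconds of downtime permitted in a period at the given availability."""
    return period_seconds * (1 - availability_percent / 100)


WEEK_SECONDS = 7 * 24 * 60 * 60  # 604,800 seconds

downtime = allowed_downtime_seconds(99.95, WEEK_SECONDS)
minutes, seconds = divmod(round(downtime), 60)
print(f"99.95% per week allows about {minutes} min {seconds} s of downtime")
```

Run it with your own objective and period before you promise anything. A monthly or yearly budget often tells a different story than a weekly one.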
When the service goes down Saturday morning, do you have people working through the weekend to get the service up and running again?
Set an objective that promises availability within business hours, when you have developers awake that can work on the problem.
Pay people to be on-call when you need to offer availability off-hours.
Multiply the availability of the services you depend on with each other, and then with your own availability, to reach a reasonable number. And then give yourself some slack. An objective should not be impossible or even challenging.
Azure Kubernetes = 99.95%
Azure MySQL = 99.9%
Azure API Management = 99.95%
My availability = 99%
Total Availability = 99.95% * 99.9% * 99.95% * 99% = 98.8%
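The same calculation as a short sketch. It assumes that every dependency sits in series (if one is down, the service is down) and that failures are independent, which is what multiplying the numbers implies. The function name is hypothetical.

```python
from math import prod


def composite_availability(percentages: list[float]) -> float:
    """Combined availability (in percent) of services that all must be up."""
    return prod(p / 100 for p in percentages) * 100


dependencies = {
    "Azure Kubernetes": 99.95,
    "Azure MySQL": 99.9,
    "Azure API Management": 99.95,
    "My availability": 99.0,
}

total = composite_availability(list(dependencies.values()))
print(f"Total availability: {total:.1f}%")
```

Every dependency you add can only pull the total down, so the list of dependencies belongs in the SLA discussion too.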
Every Metric must be Measured
This sounds so obvious: how can you know that you meet the objective unless you measure the metric? Still, I rarely see anyone measuring their service level indicators. Maybe they don’t want to know.
If you are using a cloud provider like Microsoft Azure, you can set up workbooks to measure your metrics. I’m a proponent of giving my clients access to these workbooks so they can see that we live up to the SLA.
The Client Also has Responsibilities
An agreement goes both ways, and in order for you as a vendor to fulfil your part of the agreement you need to put some requirements on the client.
Define a reasonable workload that the client is allowed to put on your service for the objectives to be obtainable. You can set a limit of 100 requests/second and refuse excess requests. Those errors do not count towards your error budget.
The client should be responsible for adjusting their service clients to updates in your API. You don’t want to maintain a 5-year-old version of your system.
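The workload limit can be sketched as a simple fixed-window limiter. This is an illustration only, assuming one counter per whole second and a 429 response for refused requests. The class, the `handle` function and the `stats` dictionary are made up for the example, and in production you would more likely lean on a gateway policy than hand-rolled code.

```python
class FixedWindowLimiter:
    """Allows at most `limit` requests in each one-second window."""

    def __init__(self, limit_per_second: int = 100):
        self.limit = limit_per_second
        self.window = None  # the whole second currently being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now)
        if window != self.window:
            # A new second has started, so reset the counter.
            self.window = window
            self.count = 0
        self.count += 1
        return self.count <= self.limit


def handle(limiter: FixedWindowLimiter, now: float, stats: dict) -> int:
    if not limiter.allow(now):
        # Refused excess traffic is tracked separately and is
        # not counted against the error budget.
        stats["throttled"] += 1
        return 429
    stats["served"] += 1
    return 200


limiter = FixedWindowLimiter(limit_per_second=100)
stats = {"served": 0, "throttled": 0}
for _ in range(150):
    handle(limiter, 10.0, stats)  # 150 requests inside the same second
print(stats)
```

Keeping throttled requests in their own counter makes it easy to show the client why those responses are excluded from the agreed metric.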
Reparations should Repair not Bankrupt
I’ve seen so many service level agreements that include a fine if the objectives are not met, and often those fines are quite high. They seldom define how often a client can request a payout, and together with badly defined objectives, a client could drive a service provider into bankruptcy.
That is not beneficial to anyone, so please stop writing SLAs with harsh penalties. You should try to repair, not bankrupt. Ask instead:
How much damage was caused by the outage?
Can we update the service level objectives to become more reasonable?
Can the client adjust their use of our service to better fit our new objectives?
Is the client open to paying more so we can have a service technician on-call?
Writing an SLA is hard. It requires experience from both the legal team and IT operations. Availability is not an objective that a client can simply demand of your service. It must be negotiated and carefully weighed against the IT operations environment, the support organization and costs.
There’s nothing inherently wrong with handing out developer access to each resource and resource group they need. You will have a mess of access rights spread all over, but you can easily revoke access by removing developers from the subscription.
If you need to manage privileges in a structured way, it is less than ideal. That is why I have developed a convention for managing access in our Azure subscription. It’s quite easy.
For each resource group, create one user group with contributor role, and use the following name format.
Environment, I usually go with dev and prod. I have never come across a situation where I needed to hand out access specifically to test or stage. So dev means dev & test where prod means stage & prod.
Contributor is useful to have if you need to hand out access for more roles later. For me, the most common access role after contributor has been monitoring.
UG is the user group suffix, which helps you deal with these in Azure API scenarios.
There will be one user group for every resource group.
You can now assign access to the user groups instead of the Azure resources directly.
Managing access will become much easier.
Groups of Groups
Doing this will unlock the potential of combining user groups into larger user groups. If the project “Klabbet” has both a web and an API component, we can create a user group that will give developers access to both.
I’ve had the benefit of stepping into other developers’ Azure accounts, and it’s very much like opening up someone’s brain and taking a peek inside.
There are some bits ordered here and there, but it’s mostly chaos.
If it’s set up by someone with experience, you will notice an intention of following conventions, because they’ve seen first hand how quickly it gets out of hand. When there is more than one contributor, you will start noticing different patterns depending on which person set it up.
Most Azure resources are hard to move and rename. If you do something wrong, you will need to delete and recreate the resource, which often isn’t worth the trouble.
Here follows my convention. If you don’t like it, feel free to create your own. The most important thing is not how resources are named, but that the convention is followed so they’re named the same way.
I want you to think about the subscription as “who gets the invoice?”. Not which department is going to bear the cost, but the actual invoice. Depending on your organization structure you need 1 subscription, or you’ll need 10.
It is not uncommon to have one financial department that receives the invoice and splits the costs across different cost units. It is also not uncommon to have 4 different legal entities and invoice them separately.
If you as a consultant are doing work for a client that doesn’t have their own subscription, create a subscription for them. You will have all the costs neatly gathered, and you can easily hand it over when the project is done.
I structure my resource groups like this.
Let’s unpack this a bit
Project Name, is the name of the application, project, website. I place this first, because when I’m searching for a resource group I will always start looking for the name of the application.
Component Name, is optional. Sometimes you don’t need it, but most of the time there are several components in a project. A web, an API, a BI component.
Environment, will be dev, test, stage or prod. I place different environments in different resource groups to make it easier to tear down one environment and rebuild another. It helps with keeping environments isolated and access control on the environment level.
Rg, to make it easier to recognise a resource group in az cli and other API scenarios.
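A convention is easier to follow when a small helper builds the names. The sketch below keeps the order described above (project, optional component, environment, rg suffix). The lowercase, hyphen-separated format and the function name are assumptions made for this example, so adjust them to whatever your own convention says.

```python
from typing import Optional

ENVIRONMENTS = {"dev", "test", "stage", "prod"}


def resource_group_name(project: str, environment: str,
                        component: Optional[str] = None) -> str:
    """Build a resource group name: project, optional component, environment, rg."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    # Assumption: parts are lowercased and joined with hyphens.
    parts = [project, component, environment, "rg"]
    return "-".join(p.lower() for p in parts if p)


print(resource_group_name("Klabbet", "dev", component="web"))
print(resource_group_name("Klabbet", "prod"))
```

Rejecting unknown environments up front keeps one-off names like "staging2" from sneaking into the subscription.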
A note on isolating environments. It drives cost up when you don’t share app service plans between dev and test. In my experience dev/test resources are cheap and it doesn’t cost much to duplicate them. The benefit of completely isolated environments is greater than the cost of split resources.
In my small Klabbet subscription it looks like this.
This scales well with hundreds of resource groups thanks to the filter functionality. Writing vaccinated in the filter box will help me quickly identify all resource groups belonging to that project.
Now we’ve made sure each resource group only contains resources for a specific application, component and environment. That keeps the number of resources in each group small and makes each group easier to get an overview of.
I have a naming convention for Azure resources as well. (It looks very much like the previous one.)
Having the project name, component name and environment in the resource name is useful for working with az cli and Azure APIs. The resource name abbreviation should be picked from the Microsoft-recommended abbreviations. If your resource doesn’t have an official abbreviation, make your own and document it in your convention.