What’s a devContainer and what is it good for?

This is supposed to become a series of three parts, so I’m writing down the titles of the next parts here to incentivise myself to write them:

  1. What’s a devContainer and what is it good for?
  2. How to set up a devContainer with Visual Studio Code
  3. Remote development with devContainers

This first article is an introduction to devContainers.

What is a devContainer?

You’ve probably heard about Docker containers and how you can package an application with the operating system to make it run on any hardware.

A devContainer is exactly that, but for development environments. You write, run and debug your code inside a Docker container. The devContainer has all the tools you’ll need for development: Git, .NET, Node.js, you name it.

What problems does it solve?

Have you ever tried to onboard a new developer to the project and spent a day trying to get the development environment to run on their machine? Was the wrong version of Node.js installed, or did they miss a Windows update?

A devContainer solves this by installing the correct versions of all dependencies from the Dockerfile.

Have you developed an application using the latest technology, say .NET 5, and then a year later, when you just want to fix an issue, the application no longer builds because you now have .NET 6 installed on your machine and there were breaking changes between the versions?

With devContainers you will stay on .NET 5 until _you_ decide it is time to upgrade the code base. The application will not stop working because you switched machines or the tools got outdated.

Have you ever had your development environment stop working because you share a database with the team and someone else ran a database migration that you haven’t got yet?

With devContainers it is easy to set up dependencies like databases in the same Docker instance, so everyone on the team has their own local database without any messy installations.

What applications can be devContained?

All applications that are targets for Docker can use devContainers for development:

  • Webservices
  • APIs
  • Databases
  • Expo Apps

Applications that don’t work as well with devContainers are desktop, iOS and Android applications.

What tools are required?

The definition for a devContainer is written in a file called .devcontainer/devcontainer.json. This is usually accompanied by a Dockerfile or docker-compose.yml and various setup scripts.

In order to run the devContainer on your local machine you need to have Docker Desktop 2.0+ installed.

Visual Studio Code has the best integration with devContainers as of yet, and you’ll hardly notice that you’re working inside a Docker container.

Okay, but isn’t it weird?

No, you will hardly notice that you’re working inside a Docker container.

  • A Docker container is not a virtual machine. There are almost no performance penalties when working inside a Docker container.
  • The source tree is shared between the host and the container, so you can work with your code files just as normal.
  • Git credentials are automatically forwarded to the container so you do not need any extra authentication for your devContainer.
  • When starting your application inside a container, vscode will automatically forward the port to the host so you can see the result on your machine. Just open a web browser to localhost:5000 as you usually do and it works like magic.

This was a short introduction to devContainers. Setting one up for your project is very easy, and that is what we’re going to look at in part 2.

Refactor Your Wetware

I’m running a book club with a group of people, where we read one book every six months. The group is a bunch of people who all work with software in one way or another. The books that we’ve read have been very management oriented, but this time we got around to reading about self-improvement.

Book cover page, Pragmatic Thinking & Learning - Refactor Your Wetware by Andy Hunt

This book wants you to become aware of how you think, what you think and why you think the way that you do. It also provides a couple of tools to help you think deliberately.

Andy describes a model of thinking where he splits the brain into the L-mode and the R-mode, with a shared bus in between. The L-mode is the active thinking you do when you concentrate, and R-mode is the background thinking you do when you shut down L-mode. The shared bus means that you can only use L-mode or R-mode, but never both at the same time. Some problems, like pattern matching, are easier to solve with the R-mode, but in order to engage that line of thinking you need to stop focusing. This is why you solve problems while walking the dog, taking a shower or sleeping. You turn L-mode off and let R-mode do the pattern matching needed to solve a particular problem.

It is just a model, and I wouldn’t say that anyone knows if this is the way our brain works, but it does map onto my own experience of taking a walk at lunchtime to find new perspectives on what I’ve been working on up to that point.

The book continues to build on this model and introduces you to biases and bugs in your brain. It provides tools to alter your thinking and find new ways to think and to learn.

I thought this was a useful book and I would recommend it to you if you’re interested in thinking about thinking.

My takeaways from Øredev 2022

I went to Øredev this year and found old friends from before the pandemic, new insights from a bunch of talented speakers, and a newfound fear of what has happened to the world over the last three years and of what technology is threatening to make of this world.

This year’s Øredev had an Alice in Wonderland theme

An overall theme this year was IT security. I don’t know if this was intended or a side effect of many of the invited partners being in the cybersecurity space, but the keynotes also focused a lot on security. Maybe the organisers chose this path because of the instability in Europe and the Russian war in Ukraine.

Renata Salecl did a really scary session on how social media is being used by governments against us, to make us insecure in truths and facts, and to make us apathetic to who is in control and what the person in power is actually doing. Our new behaviour is that we simply don’t care anymore.

Jenny Radcliffe followed this up by talking about her experiences of breaking the security protocols of companies, and Emily Gorcenski showed us how technology is used to drive revolutions and how it’s being weaponised in the Russian war in Ukraine.

In the same vein, Runa Sandvik closed the conference by telling us about all the threats facing journalists today and how to protect them from everyone wanting to do them harm. It is not hard to draw parallels between the increasing distrust of facts, the threat situation for journalists and the dismantling of democracy.

It is a grim world that Øredev presents to us.

Melvin Kranzberg: “Technology is neither good nor bad; nor is it neutral.” From Cennydd Bowles’ talk on The Ethical Engineer

Between all the security sessions there were some developer-focused ones as well. One really interesting session was about using visualisation of state machines to make really complex app logic easier to understand. I’m sure we have all seen code that is really hard to get your head around because it jumps between different states. David Khourshid presented a tool called xstate, which not only helps you simplify the code around state machines but also allows for visualisation. Pretty neat!

Define your state machine in a simple graphical interface and then copy the code into your code editor to implement the individual steps. Really cool visualisation from David Khourshid’s talk, Coding Complex App Logic Visually

I went to a couple of data sessions where I found out that data application development is still several years behind backend development, but there is hope. Rob Richardson showed us how to build a database devops pipeline so we can actually version changes to the database, deploy continuously and test the deploys in an isolated container.

On the same theme, Kjetil Klaussen talked about how they built a data platform in 6 months to keep track of salmon “production”. Seeing the pens where they keep thousands of salmon makes me a bit nauseous, but the idea of building a team and a data platform in just 6 months is really cool.

Before I start describing every single talk that I attended, I will just give you the action points that I jotted down:

  • A public employee handbook like the one from GitLab is a really cool idea, and if I become the leader of a company with more employees than myself, that is something I would like to try
  • I need to improve my terms and conditions with ethical considerations so I can cancel a contract when a client asks me to do something I consider unethical
  • Everyone on a developer team should be considered a volunteer (even if we pay them to be there), and we need to make sure they are happy, stimulated and appreciated
  • I need to write a proof of concept application for gRPC, as it will become the standard for communication between microservices on the backend
  • Stop having hybrid retrospectives where some are remote and others are in the same room. It puts the participants on unequal footing. Also make sure you have thinking time in retrospectives so both active thinkers and reflective thinkers may contribute
  • Flutter is a new, hot and interesting technology, but there is not a big enough reason for me to invest in it. React Native, which I already know, is more mature, and I would get a higher return from learning native iOS or Android development
  • Create a personal user manual for what people need to know about me to support working together. Things like “I prefer to schedule a call instead of spontaneous calls” or “I do not like being praised in public” are things to go into that personal user manual
  • I need to learn more about Web 3.0 (not Web3). Web3 will probably not go away, even if I would prefer it to, but I think that the idea of owning your own data in Web 3.0 is a compelling one

Next week we’ll get access to the recordings. There are several sessions I know I should’ve prioritised instead of the ones that I chose.

Developing Solutions for Microsoft Azure

Today I passed my AZ-204: Developing Solutions for Microsoft Azure exam and became an Azure Developer Associate. I’ve done some certifications in my days, but this was by far the hardest. The breadth of knowledge required (Azure SDKs, data storage, data connections, APIs, authentication, authorisation, compute, containers, deployment, performance and monitoring), combined with the extreme detail of the questions, made this really hard. I didn’t think that I had passed until I got my result.

These were the kinds of questions that were asked:

  • Case studies: Read up on a case study and answer questions on how to solve the client’s particular problems with Azure services. Questions like, what storage technology is appropriate, what service tier should you recommend, and such.
  • Many questions about the capabilities of different services. Like, what event passing service should you use if you need guaranteed FIFO (first-in, first-out)?
  • How to set up a particular scenario, like in what order you should create services to solve the problem at hand. Some of these questions were down to CLI commands, so make sure that you’ve dipped your toes into the Azure CLI.
  • Code questions where you need to fill in the blanks on how to connect and send messages on a service bus, or provision a set of services with an ARM template. You also get code questions where you should answer questions about the result of the code.

Because of the huge area of expertise and the extreme detail of the questions, I don’t think you could study and pass the exam without hands-on development experience. If I were to give advice on what to study, it would be:

  • Go through the free online preparation material. Make sure you remember the capabilities of each service, how they differ, and what features the higher pricing tiers enable. Those questions are guaranteed.
  • Do some exercises on connecting Azure Functions, blob storage, service bus, queue storage, event grid and event hub. These were central in the exam.
  • Make sure you know how to manage authorisation to services like blob storage and the benefits of the different ways to do it. Know your Azure Key Vault, as the security questions emphasise it.

Be prepared for it to be much harder than AZ-900: Microsoft Azure Fundamentals. Go slow and use all the time that you get. Good luck!

Product Ownership

Any pair of programmers can write some code in a garage, but once that code ships to real users you have a product, and that’s a different thing entirely.

No matter if you’re a software vendor or a packaging manufacturer building software to support your business, that software needs support, change management, hosting, integrations and documentation. “Just build it!” is often too easily said. Once it is built, you will have that software in your IT landscape for years to come.

Hiring a product owner will help you with the following things:

  • Setting a vision your product should achieve
  • Driving change in the product with a team of developers
  • Collecting requirements from users and stakeholders
  • Helping users and stakeholders understand your product’s brilliance

Maybe you don’t need a product owner for every VBA script written in Excel, but any system with a sufficient number of users should have a product owner.

Here are some of the qualities I find important in a product owner:

  • An excellent communicator to gather requirements and communicate plans
  • An ambassador that will make people interested in your product
  • Comfortable with drawing up plans and executing on them
  • A source of great values from where the team can inherit their culture
  • An internal marketer to make sure the product has continued funding

The product owner doesn’t need to be a tech wizard. It’s much more important to get a good in-house marketer for your product.

Responding to Incidents

Shit happens, it is inevitable. We work so hard to keep things running, with redundancy, automatic fail-over, 99.999% availability, but most of the time outages happen because someone screwed up.

In an unhealthy organization you hang that person and move on. The organization learns nothing and is doomed to repeat the mistake.

In a healthy organization the system is at fault for allowing the person to make the mistake. The system needs to be fixed, and each outage is an excellent learning opportunity.

Incident Playbook

Having a playbook for what to do in the event of an outage is basic. You need to determine what kind of outage is considered an incident, how to discover an incident and how to assemble the response team. One thing most teams forget is that the playbook is useless if

  • Nobody knows it exists, or where to find it during an incident

This is why it’s imperative to have fire drills and to practice incidents. Some go as far as actually bringing down a system, to practice a live incident.

Here’s how I would plan a fire drill:

  1. Set a fixed time and date for the drill and inform the team so they can prepare
  2. Schedule a service window during the fire drill so the organization and its users can prepare
  3. Book a session with the team to present the incident playbook and make sure they know it
  4. Break the system at the start of the service window. Automatically restore the system at the end of the service window if the team has failed to find the fault
  5. Book a postmortem to evaluate the incident response

Postmortem

After an incident you should always conduct a postmortem. The point is to identify the root cause of the incident and to find new systems, solutions, processes or routines that make sure the incident doesn’t recur.

The purpose is to create a learning organization, where you set up safeguards against recurrence that will remain in place long after the people involved in the incident are gone.

Things to consider with a postmortem:

  • Putting blame on a person or a team doesn’t prevent the incident from recurring
  • Taking responsibility for the incident also won’t prevent it from happening again
  • The actions coming out of the retrospective meeting must prevent the incident from happening again, or you have failed to identify the root cause

Here’s my template for a postmortem retrospective to help you ask the right questions to identify the root causes.

Document your Code

I was told this week that the code doesn’t need documentation because the developers are good at naming things. So I thought it was time to revisit what kind of documentation should be included in code.

Code Comments

There are 2 common objections to code comments:

  1. They are not very useful because the code tells us what the program does
  2. They are often wrong because the code changes but not the comments

This is just the talk of lazy “low effort” developers. I think the Agile Manifesto’s “working software over comprehensive documentation” has done more harm than good.

Well written comments are invaluable. I’ve never come across an outdated comment that threw me off in a way that I couldn’t just delete it. 🤷‍♂️

Here are some examples of code comments I find useful

1. Adding context that is not in the code

This code was written because of a behaviour in macOS.

// On macOS it's common to re-create a window in the app when the
// dock icon is clicked and there are no other windows open.
if (BrowserWindow.getAllWindows().length === 0) createWindow();

2. Adding intention to the code

There are some things that only work in this order.

// This method will be called when Electron has finished
// initialization and is ready to create browser windows.
// Some APIs can only be used after this event occurs.
app.whenReady().then(() => {
  createWindow();
});

3. Rabbit holes you went down and want to warn others of

Warning, here be dragons. 🐉

// THE OBJECT POLYFILL WILL NOT WORK ON THE WEBKIT 1.0.3 PLATFORM
// import "core-js/es/object";

4. Explaining what is going on that the code doesn’t communicate clearly

Why must public url be the same as window location?

if (publicUrl.origin !== window.location.origin) {
  // Our service worker won't work if PUBLIC_URL is on a different origin
  // from what our page is served on. This might happen if a CDN is used to
  // serve assets; see https://github.com/facebook/create-react-app/issues/2374
  return;
}

5. Add a reference to the bug or issue that prompted the change

Go check the bug description to find more information on why the code looks like this.

Sentry.init({
  // BUG AB#3133 Decrease sample rate in production
  // Decreasing sample rate to keep costs down.
  tracesSampleRate: 0.1,
});

6. Description of public modules and functions

In order to get nice intellisense when using this module or function from elsewhere in the code.

/**
 * A button that lets you copy the current value to clipboard.
 *
 * @param {object} props
 * @param {string} props.text - The text to display on the button.
 * @param {string} props.value - The value to copy to clipboard.
 * @param {boolean} [props.isDisabled] - Whether the button should be disabled.
 */
function CopyButton({ text, value, isDisabled = false }) {
}

7. Source of Information

Not going to explain all this crap here. Go read up!

/**
 * The source for these abbreviations is here.
 * https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations
 */
let abbreviations = ["aks", "appcs", "ase", "plan", "appi", "apim" /* , ... */];

8. Source of Copy/Pasted Code

(we all do it sometimes)

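// source: https://stackoverflow.com/a/15289883
// Number of milliseconds in one day, used by dateDiffInDays below.
const _MS_PER_DAY = 1000 * 60 * 60 * 24;
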
function dateDiffInDays(a, b) {
  // Discard the time and time-zone information.
  const utc1 = Date.UTC(a.getFullYear(), a.getMonth(), a.getDate());
  const utc2 = Date.UTC(b.getFullYear(), b.getMonth(), b.getDate());

  return Math.floor((utc2 - utc1) / _MS_PER_DAY);
}

9. In order to understand this code you’ll need to know more about this special topic

We are not making up the rules, they are!

// Official DCC Schema documentation
// https://github.com/ehn-dcc-development/ehn-dcc-schema
function parseDccSchema(dcc) {
}

10. What kind of result you can expect from a module or function

/**
 * Calculator screen. It is divided into a left and right part, where the left part 
 * is the input form and the right part presents the result. If the screen width is
 * less than 768px the left part becomes top and the right part becomes the bottom.
 */
function Calculator() {
  /** implementation.. */
}

Summary

Anyone can write code that computers understand. The challenge is writing code that humans also understand.

If you want to know more about how I document code, check out the convention on my wiki.

Insights on setting up a Service Level Agreement

During January I spent a lot of time thinking about, reading about and setting up a Service Level Agreement. The purpose is to agree on measurable metrics like uptime, responsiveness and responsibilities with your paying clients.

If it’s done right, it will influence how those clients prefer to interface with you: whether they call you synchronously or asynchronously, put a cache in between, or have a failsafe.

Here I will write some general insights that I got from this process. If you want my complete SLA convention, you should check out my wiki. There I’ve also posted a sample SLA that you can reuse for your own purposes.

Always Start with Metrics

Before you dig into availability and 99.99999% you must start with metrics. What does availability mean to you? How do you measure it? What is an error? Is HTTP status 404 an error? Do errors during maintenance count towards your metric? How is request latency measured? Is it measured on the client or the server? Do you measure the average over all the requests? How does cold start latency affect your metric?

There are a lot of things to unpack before you can start thinking about objectives.

Should an 8 second cold start in the middle of the night affect you reaching your SLA objectives?
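
To make those questions concrete, here is a minimal sketch of an availability indicator computed from request records. The record shape (`status`, `duringMaintenance`) and the defaults are assumptions for illustration only: by default only 5xx responses count as errors and requests during maintenance are excluded. Your own answers to the questions above decide what goes in.

```javascript
// Hypothetical request record: { status: number, duringMaintenance?: boolean }

// Returns availability as a percentage of the requests that count.
function availabilityPercent(requests, options = {}) {
  const {
    // Assumption: only server errors (5xx) are errors, so a 404 is not.
    isError = (r) => r.status >= 500,
    // Assumption: requests during a maintenance window are left out.
    isExcluded = (r) => r.duringMaintenance === true,
  } = options;

  const counted = requests.filter((r) => !isExcluded(r));
  if (counted.length === 0) return 100; // nothing measured, nothing failed

  const failed = counted.filter((r) => isError(r)).length;
  return ((counted.length - failed) / counted.length) * 100;
}

const sampleRequests = [
  { status: 200 },
  { status: 404 },
  { status: 500 },
  { status: 503, duringMaintenance: true },
];

// Default definition: one failure out of three counted requests.
console.log(availabilityPercent(sampleRequests).toFixed(1)); // 66.7

// Stricter definition: every 4xx and 5xx is an error.
console.log(
  availabilityPercent(sampleRequests, { isError: (r) => r.status >= 400 }).toFixed(1)
); // 33.3
```

The same traffic gives two very different numbers depending on the definition, which is why the definition has to be agreed on before the objective.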

Not as Available as you Think

Everywhere you look, businesses offer 99.95% availability. Translated, it means 5 minutes and 2 seconds of downtime per week. A common misconception among developers is that it’s easy: all our deploys are automated anyway, and if one fails, we’ll just roll back.
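
The weekly downtime figure follows from simple arithmetic. Here is a small sketch that converts an availability objective into a downtime budget; the function names are made up for this example.

```javascript
const SECONDS_PER_WEEK = 7 * 24 * 60 * 60; // 604800

// Allowed downtime in seconds for a given availability objective.
function downtimeBudgetSeconds(availabilityPercentage, periodSeconds = SECONDS_PER_WEEK) {
  const unavailableFraction = 1 - availabilityPercentage / 100;
  return unavailableFraction * periodSeconds;
}

// Formats a number of seconds as whole minutes and seconds.
function formatMinutesSeconds(totalSeconds) {
  const rounded = Math.round(totalSeconds);
  const minutes = Math.floor(rounded / 60);
  const seconds = rounded % 60;
  return `${minutes} min ${seconds} s`;
}

console.log(formatMinutesSeconds(downtimeBudgetSeconds(99.95))); // 5 min 2 s
```

Pass a different `periodSeconds` to see the budget per day, month or year instead.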

Before you set that objective you should consider

  • When the service goes down in the middle of the night, how much time does it take to wake somebody up to take a look at the problem?
  • When the service goes down on Saturday morning, do you have people working through the weekend to get the service up and running again?
  • Your availability is dependent on the availability of all the services you depend on. If you host on Azure Kubernetes, which offers 99.95% availability, you cannot offer the same because Microsoft will eat up your whole failure budget.

Be kind to yourself. Don’t overpromise:

  • Set an objective that promises availability within business hours, when you have developers awake that can work on the problem.
  • Pay people to be on-call when you need to offer availability off-hours.
  • Multiply the availability of the services you depend on with each other, and then with your own availability, to reach a reasonable number. And then give yourself some slack. An objective should not be impossible or even challenging.

Azure Kubernetes = 99.95%
Azure MySQL = 99.9%
Azure API Management = 99.95%
My availability = 99%

Total Availability = 99.95% * 99.9% * 99.95% * 99% = 98.8%
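
The multiplication above can be sketched as a small function. The function name is made up for this example, and the numbers are the ones from the calculation above.

```javascript
// Multiplies the availability of services that all have to be up
// at the same time. Input and output are percentages.
function compositeAvailability(percentages) {
  const fraction = percentages.reduce((total, p) => total * (p / 100), 1);
  return fraction * 100;
}

const totalAvailability = compositeAvailability([
  99.95, // Azure Kubernetes
  99.9, // Azure MySQL
  99.95, // Azure API Management
  99, // my own availability
]);

console.log(`${totalAvailability.toFixed(1)}%`); // 98.8%
```

At 98.8% the budget is roughly two hours of downtime per week, which is a very different promise from the five minutes that 99.95% allows.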

Every Metric must be Measured

This sounds so obvious: how can you know that you meet the objective unless you measure the metric? Still, I rarely see anyone measuring their service level indicators. Maybe they don’t want to know.

If you are using a cloud provider like Microsoft Azure, you can set up workbooks to measure your metrics. I’m a proponent of giving my clients access to these workbooks so they can see that we live up to the SLA.

A dashboard that is automatically updated with the metrics from our service level agreement.

The Client Also Has Responsibilities

An agreement goes both ways, and in order for you as a vendor to fulfil your part of the agreement you need to put some requirements on the client.

  • Define a reasonable workload that the client is allowed to put on your service for the objectives to be attainable. You can set a limit of 100 requests/second and refuse excess requests. Those errors do not count towards your error budget.
  • The client should be responsible for adjusting their service clients to updates in your API. You don’t want to maintain a 5-year-old version of your system.
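
The request limit from the first point could be enforced with something like the fixed-window limiter below. This is a minimal sketch, not a production rate limiter: the class name and fields are made up, the clock is passed in to keep it testable, and how you surface a refusal to the client (HTTP 429 Too Many Requests is a common choice) is up to you. Refused requests are counted separately so they can be kept out of the error budget.

```javascript
// Fixed-window limiter: at most `limitPerSecond` requests per clock second.
class FixedWindowLimiter {
  constructor(limitPerSecond) {
    this.limitPerSecond = limitPerSecond;
    this.currentWindow = null; // the second we are currently counting in
    this.acceptedInWindow = 0;
    this.refusedTotal = 0; // tracked separately from real errors
  }

  // Returns true if the request may proceed, false if it must be refused.
  allow(nowMs = Date.now()) {
    const window = Math.floor(nowMs / 1000);
    if (window !== this.currentWindow) {
      this.currentWindow = window;
      this.acceptedInWindow = 0;
    }
    if (this.acceptedInWindow < this.limitPerSecond) {
      this.acceptedInWindow += 1;
      return true;
    }
    this.refusedTotal += 1;
    return false;
  }
}

const limiter = new FixedWindowLimiter(100);
let accepted = 0;
for (let i = 0; i < 150; i++) {
  if (limiter.allow(5000)) accepted += 1; // 150 requests in the same second
}
console.log(accepted, limiter.refusedTotal); // 100 50
```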

Reparations should Repair not Bankrupt

I’ve seen so many service level agreements that include a fine if the objectives are not met, and often those fines are quite high. They seldom define how often a client can request a payout, and together with badly defined objectives, a client could drive a service provider into bankruptcy.

That is not beneficial to anyone, so please stop writing SLAs with harsh penalties. You should try to repair and not bankrupt:

  • How much damage was caused by the outage?
  • Can we update the service level objectives to become more reasonable?
  • Can the client adjust their use of our service to better fit our new objectives?
  • Is the client open to paying more so we can have a service technician on-call?

Summary

Writing an SLA is hard. It requires experience from both the legal team and IT operations. Availability is not an objective that a client can demand of your service. It must be negotiated and carefully weighed between IT operations environment, support organization and costs.

Taking Control of Azure Access Control

This is another post in the unintended series about untangling your Azure account. My first post was about naming and grouping your Azure Resources. The second was about writing conventions and following them. This third post is about managing Azure Access Control.

Developers, Developers, Developers

There’s nothing inherently wrong with handing out access to developers for each resource and resource group they need. You will have a mess of access rights spread all over, but you can easily revoke access by removing developers from the subscription.

If you need to manage privileges in a structured way, it is less than ideal. That is why I have developed a convention for managing access in our Azure subscription. It’s quite easy.

For each resource group, create one user group with the contributor role, and use the following name format.

The name format of a user group.

Let’s break it down

  • Project Name and Component Name should be exactly the same as the resource group name.
  • Environment, I usually go with dev and prod. I have never come across a situation where I needed to hand out access specifically to test or stage. So dev means dev & test, while prod means stage & prod.
  • Contributor is useful to have if you need to hand out access for more roles later. For me, the most common access role after contributor has been monitoring.
  • UG is the user group suffix, which helps you deal with these in Azure API scenarios.

There will be one user group for every resource group.
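
As a sketch, the name can be built from its parts. The order and the dash separator are inferred from names like klabbet-web-dev-contributor-ug, and the function and parameter names are made up for this example.

```javascript
// Builds a user group name from its parts. The component is optional,
// so the same function also covers project-wide groups.
function userGroupName({ project, component, environment, role = "contributor" }) {
  const parts = [project, component, environment, role, "ug"];
  // Skip parts that were left out, for example a missing component.
  return parts.filter((part) => Boolean(part)).join("-");
}

console.log(userGroupName({ project: "klabbet", component: "web", environment: "dev" }));
// klabbet-web-dev-contributor-ug

console.log(
  userGroupName({ project: "klabbet", component: "api", environment: "prod", role: "monitoring" })
);
// klabbet-api-prod-monitoring-ug
```

Generating the names instead of typing them keeps them consistent with the resource group names, which matters once you start scripting against them.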

Assigning Access

You can now assign access to the user groups instead of the Azure resources directly.

Assigning user access to user groups instead of direct access to resource groups.

Managing access will become much easier.

Groups of Groups

Doing this will unlock the potential of combining user groups into larger user groups. If the project “Klabbet” has both a web and an API component, we can create a user group that will give developers access to both.

User Group                    Member Of                          Comment
klabbet-dev-contributor-ug    klabbet-web-dev-contributor-ug     API and Web dev access.
                              klabbet-api-dev-contributor-ug
klabbet-prod-contributor-ug   klabbet-web-prod-contributor-ug    API and Web prod access.
                              klabbet-api-prod-contributor-ug

We can combine user groups into more permissive user groups.

By combining user groups into larger user groups we will get better control of what kind of access a user has, without investing too much effort.

One user group assignment will give the user access to 4 resource groups.
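
The effect of nesting can be sketched by walking the “member of” relationships. This only models the group graph, not Azure itself, and the top-level group klabbet-contributor-ug is a hypothetical name added here to show one assignment reaching four component groups.

```javascript
// Which groups each group is a member of.
const memberOfGraph = {
  // Hypothetical top-level group, not taken from the table above.
  "klabbet-contributor-ug": ["klabbet-dev-contributor-ug", "klabbet-prod-contributor-ug"],
  "klabbet-dev-contributor-ug": [
    "klabbet-web-dev-contributor-ug",
    "klabbet-api-dev-contributor-ug",
  ],
  "klabbet-prod-contributor-ug": [
    "klabbet-web-prod-contributor-ug",
    "klabbet-api-prod-contributor-ug",
  ],
};

// Collects the group itself and every group reachable through membership.
function transitiveGroups(group, graph, seen = new Set()) {
  if (seen.has(group)) return seen; // guards against cycles
  seen.add(group);
  for (const parent of graph[group] ?? []) {
    transitiveGroups(parent, graph, seen);
  }
  return seen;
}

// Component-level groups are the ones that are not members of anything else;
// each of them maps to exactly one resource group.
function componentGroups(group, graph) {
  return [...transitiveGroups(group, graph)].filter((g) => !(g in graph)).sort();
}

console.log(componentGroups("klabbet-dev-contributor-ug", memberOfGraph).length); // 2
console.log(componentGroups("klabbet-contributor-ug", memberOfGraph).length); // 4
```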

Summary

I’ve presented a format for access control that does not require much effort to set up, but provides lots of flexibility to take control of your access control.

If you’re interested in my convention for access control you can find the specification here.

Project Conventions are Crucial

In my previous blog post I wrote about the importance of having a convention for grouping and naming resources in Azure. In this article I will explain how to set up conventions for your project. Writing a convention is easy. Making sure it is followed is much harder.

Purpose of Conventions

We write conventions for our projects for the following two reasons:

  1. If everyone does things their own way we’ll have a mess. Messes are hard to maintain.
  2. Jotting it down saves a lot of time when we introduce new developers to the team.

Don’t write a lot of conventions up front, but also don’t wait until the absence of conventions turns into a mess.

The Art of Writing Conventions

Writing conventions is the easy part. Here is a sample of how I do it.

My convention for naming resources on Azure.

Don’t feel obliged to add all the bells and whistles if you don’t need them. Here is what I do:

  • Versioning makes governance much easier. A simple document version table at the top will do.
  • Each statement is short and precise. Don’t give room for interpretations.
  • I use RFC 2119 to make it clear what’s a rule (MUST), a recommendation (SHOULD) or a suggestion (MAY).
  • One statement per line makes it easier to skim through.
  • Numbering each statement will make it easier to reference individual statements. Anchor links are nice.
  • Link to external resources for further reading.
  • You don’t need to justify a statement. Leave it for team discussion.

Keeping it Alive

Conventions that aren’t followed are useless.

Include project conventions in your peer review process.

Take 15 minutes each week (after the daily on Fridays) with the team to discuss one convention. Update the convention during the meeting. This makes the whole team aware of the conventions and keeps them up to date. If you have 26 conventions, you will go through them all every 6 months.

I find these sessions very valuable for the team, and if you replace people often in your team they are a necessity.

Moving on From Here

I’ve posted my conventions from Bring Order to your Azure Account publicly under a Creative Commons license. Go ahead and steal those conventions for your own project wiki and update them to fit your team.