This was originally written as part of one of my weeknotes (a way of working in the open where you document your week), but it isn’t really a weeknote. It’s more a set of thoughts about how to do support well and what it means, at a human level, to deliver on a Service Level Agreement (SLA).

At Delib, we’re a fairly small team (about 30 people across the whole company), and a smaller subset of us are responsible for delivering for and looking after our customers. Within that group, we take turns on support and on-call. Each account manager spends a week at a time on support and on-call, paired with one of our engineers, who rotate through support in fortnightly stints. This means that at any time there are two very knowledgeable people whose job it is to answer incoming support queries and/or respond to critical issues.

Being on support means answering incoming queries via Zendesk, the software we use to manage support tickets (or questions, as a normal person would call them), which also hosts our knowledge base of how-to articles for our products. It also means taking support questions that come in by phone. It’s very reactive work, and it can be difficult to get anything else done.

Being on-call means being the first person alerted (by text, email and push notification) if our site monitoring goes off, and the one who gets straight on with investigating and notifying our customer. This monitoring covers all customer sites and all our Delib-owned sites; we currently use a combination of Pingdom for site monitoring and VictorOps to manage follow-the-sun on-call scheduling across the global team. In the UK, on-call runs from 06:00 to 19:00 Monday to Thursday and 24 hours a day over the weekend; we currently rotate this among five of us in account management, taking a week each. For our colleagues in Australia and New Zealand it varies with the time of year (there are a lot more timezones to worry about!) but normally falls sometime between 06:00 and 19:00, Monday to Friday, local time.

In an on-call week you might not get any alerts, but it still has an impact on your life. For example, you might skip things you want to do if they would keep you away from your phone for a while during on-call hours. If it’s 24-hour weekend cover, you might be woken in the night, so you may want to find a way to stop a notification from waking anyone else you live with. The thing about being on-call is that you’re kind of ‘on’ regardless of whether anything happens.

With that in mind, this post is going to talk about how delivering on a Service Level Agreement has a human cost that I think may be invisible to most people. I’m also going to cover what I believe good support looks like and how we do it at Delib.

Service Level Agreements and the human cost of on-call

A Service Level Agreement (SLA) sets out in writing what you promise your service or product will deliver for the other party. Normally it includes metrics around response times and the handling of critical and non-critical issues. SLAs are pretty cold, contractual things, but the real-world delivery of them is very human.

For example, I wonder whether most people know what it means in practice when a line in a contract says “Delib will perform a fault diagnostic on any and all Critical Errors as soon as possible — 24 hours per day, seven days per week, within a maximum response time of two hours”. It means a person (or persons) might be woken by a notification at 3am on a weekend saying a site is offline. They shuffle into the bathroom with their work bag, sit on the side of a cold bath to check whether anything is wrong, find the site online and running fine, and then can’t get back to sleep for ages after they lie back down. Or it means a person might be sat alone on their stairs, face lit by the glow of a laptop, while their family and friends eat dinner without them, because they got a notification saying a site was offline that turned out to be a problem with an organisation’s Domain Name System (DNS) server.*

For a small team, you have to take some of the pain out of the human side of on-call by sharing the load as much as you can, reducing the stress through processes and automation where possible, and compensating people for their time. Still, in a small team there are only so many people to share that burden with. Making that human cost a little more bearable is an ongoing process.

Service Level Agreements and providing good support

You can call it customer service, customer support, technical support, user support — whatever. It’s wherever people interact with you to get help.

When it comes to customer support, the terms of an SLA are, in my view, absolutely the minimum viable deliverable. The only benefit of skidding in just under an SLA target is being able to point to a line in a contract and say “well, we met what we said we would, so you can’t tell us off”. That’s not really delivering for the users of a service. It’s just saving your bacon. Nobody wants to routinely nudge towards the maximum time an SLA allows; we want to be doing loads, loads better than that. And great support is fundamental to delivering and improving a service.

Shoot me if I ever send out a corporate ‘values’ questionnaire so that a committee can decide on our values and paste them up in the office for us all to slowly ignore. Instead, we have one aim at Delib: always provide service. It’s a nice, simple, memorable thing to deliver on. It means:

  • You don’t sell stuff people don’t need
  • You build and maintain things that people want and find useful
  • You answer questions quickly and thoroughly

In short, everything you do in some way provides a service outwards for your users rather than just benefiting your own organisation.

Good support done well:

  • Saves people time and stress
  • Builds trust
  • Leads to better, more mutually beneficial relationships
  • Gathers valuable information about how to improve your products or service

At Delib, support is not triaged. Users get straight through to members of the core team (as mentioned before: an account manager and an engineer working together on rotation) and not to a support function which is removed from the product, or a chat bot which can only suggest support articles or raise a ticket for other people to look at later.

‘Always provide service’ here means ‘get straight to a person who has the skills to genuinely help’. The people who build and manage the products are the same people who answer support questions, and the same people who write the support articles you can read. It’s a full circle. Getting support directly from people with skin in the game means not only getting proper, human help, but also that those people may have the power to improve the thing in the longer term by tackling the cause, not just the symptom. This can apply equally to customer service teams who are removed from the main operation of the product or service, provided they are trained well, given the tools and freedom to log trends in queries, and have useful ways of surfacing issues to the people able and willing to make longer-term changes.

Another thing I know from a long time doing this (not just at Delib) is that it’s very tempting to turn support into a game you try to win. For example, Zendesk prompts you to increase your ‘one-touch resolution’ rate, which means solving a query with just one response and presuming that’s nailed it. It even gives you nice charts showing how well you’re doing at it. But I disagree with this approach. How can you tell you’ve answered someone’s question thoroughly if your focus is on mashing the ‘Solved’ button? It’s not about how many queries we solve; it’s about how well we solve them. I’m more interested in how promptly and thoughtfully we get to the bottom of people’s issues, how easy it is for us to manage them, and how well we improve our processes off the back of them. That’s a longer game, but with better results for everyone.

An example of some support stats:

To give a bit of insight into what supporting 160 or so public sector organisations looks like our way, take a week back in March (before lockdown, and before things went a bit weird for us and our customers). That week we had 39 new support tickets. There is often a bit of back-and-forth on a ticket, especially if we need more information to get to the bottom of it, so the more interesting metric tends to be ‘touches’ (interactions) on tickets, of which there were 196. The quickest touches take between two and five minutes; the longest can take over an hour if there is investigating to do and a detailed answer is required. This means that somewhere between 16 and 30 hours can be spent on support each week.

I do irregular reports on trends and stats about our support tickets to help us pick off the longer-game stuff I mentioned earlier. From those reports, the average number of tickets per week is about 40, with about 150 touches. Around 90% of our tickets get a first response in under an hour, and often it’s closer to 15 minutes. Tickets first responded to outside the first hour are either:

a) because they came in outside of the working day, or
b) because they came in among a flurry of other queries, which we’ll need to work through systematically
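As a rough check on the 16 to 30 hour range, the weekly figure is just touches multiplied by an average time per touch. That average isn’t something we track directly, so the 5 and 9.2 minute values below are assumptions, back-calculated to land near the two ends of the range; `support_hours` is a hypothetical helper for illustration, not part of Zendesk or any other tool.

```python
# Rough estimate of weekly support time from the number of ticket "touches".
# ASSUMPTION: the average minutes per touch is not measured; individual
# touches range from 2-5 minutes up to over an hour.

def support_hours(touches: int, avg_minutes_per_touch: float) -> float:
    """Return total hours spent, given a touch count and an assumed average."""
    return touches * avg_minutes_per_touch / 60

touches_in_march_week = 196  # touches in the benchmark week in March

# Assumed averages: 5 min if nearly every touch is quick,
# 9.2 min if a fair share of touches need longer investigation.
low = support_hours(touches_in_march_week, 5)
high = support_hours(touches_in_march_week, 9.2)

print(f"{low:.0f} to {high:.0f} hours")  # prints "16 to 30 hours"
```

Plugging in the weekly average of about 150 touches instead gives roughly 12.5 to 23 hours under the same assumed averages.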

Six of the tickets in our benchmark week in March weren’t to do with us or our products, though the cause usually intersects with someone’s use of our software, so we always need to investigate. More often than not, issues like these turn out to be a problem with Microsoft Edge or Internet Explorer, or another product in Microsoft’s suite. Sometimes it’s an organisation’s own network connection or firewall, or somebody’s computer having a funny five minutes. These issues are not easy to get to the bottom of because we aren’t in the same office as the user. We have to take the information we receive and try to replicate the problem by mimicking the steps the user took, using the same browser version they did. We mostly use Chrome and Firefox in the office, so for Microsoft browsers this means running a virtual machine with them installed. The outcome is always a bit disappointing: if it turns out not to be an issue with their site, there’s not much more we can do to help, and that’s not as satisfying for us as being able to solve someone’s problem.

It is extremely important that we provide good support. We put a lot of thought and effort into getting this stuff right, but it isn’t easy. Working with software, no matter how good that software is, will always mean there are questions to answer and expectations to meet. And our products don’t exist in a bubble; they need to work alongside other tools and systems, too. We have spent time taking what we know about good and bad support (as service users of many products ourselves), thinking through the end-to-end process for our customers, and trying to build a system of support that delivers above what might be expected. We’d really like to hear whether we are getting it right.

*This person has been me, and no doubt all of my colleagues, and probably people sitting in bathrooms or on stairwells all over the world who do on-call, SLA-driven support.

For us at Delib, the most common causes of monitoring alerts are a false positive, a short connectivity blip on the network at our Infrastructure-as-a-Service (IaaS) provider, or a DNS server issue. However, as far as our SLA is concerned, a site monitoring notification always has the potential to be a Critical Error, so we always need to investigate and, unless it’s a false positive, we notify our customer regardless. In the case of DNS issues, we can’t do anything other than advise our contacts to speak with their IT team, as each organisation manages its own domain, and the site we provide generally sits on that domain.

For more information about Delib’s products visit our website or get in touch.