A post about the operational effort behind turning around a quick project
A few weeks ago, we were asked by our colleagues in the Scottish Government about using their Dialogue site for a big conversation with the people of Scotland.
The conversation was about what the next steps might be when navigating to a new normal in the face of the Covid-19 pandemic. It was a big discussion. Over 4,000 ideas were submitted, 18,000 comments made on those ideas, and site requests were into the millions.
This was a grown-up initiative to try and bring people into the process of solving this challenge collaboratively. It also became, maybe unexpectedly, a window into the lives of people living in Scotland in lockdown during a pandemic, showing the reality of that scenario for many people, the pressures being faced and human needs left unmet.
I’m extremely pleased that we were able to be a part of it and this blog post is going to attempt to explain some of the more operational detail around supporting something like this as a software provider.
I want to be very clear that this conversation was run by the Scottish Government. It was their challenge and their initiative to open up the conversation to people in Scotland. It was their efforts in running, moderating and analysing that conversation and taking it through all of the necessary governmental processes that made it happen. And it was the contributors to the conversation who submitted ideas and comments that gave such rich insight into this particular moment in time we are all living in. Our role was in providing and supporting the platform on which this exercise was run and our main job was to not have the software detract from the importance of the conversation.
Over the past few months, rapid changes have been made across government and in many industries. Large announcements and initiatives are going out—normally backed up by some kind of online element—to a more captive audience. There is pressure on those online systems to perform in unprecedented times (sorry, that phrase had to come in somewhere) and at short notice. Those platforms and websites often need rapid innovation to adapt to this surge in interest, which might not have been anticipated when they were originally designed. This isn’t always easy to turn around and the risk is that the narrative then becomes about a site being offline or a service unavailable. This can lead to distrust and the view that everything is poorly organised, rather than the purpose for which that service was delivered.
This post is not about Dialogue as a product or how to run a good exercise using our software (we have that covered elsewhere), though I’ve included a bit of context to give an idea of what the software is. Instead, I hope that by shedding light on the process we took behind supporting this particular and very big challenge it might do a few things:
- Show what happens way down in the long grass when a big public announcement is made and the efforts to support it at the human and software level
- Share some of the steps we took to try and help our customer make this happen, in case they may be useful for us or for other teams doing similar in the future
- Talk a bit about having and valuing strong working relationships
The context
Dialogue is a product we make. It’s a pretty simple platform that’s designed to involve people in policy decisions. Though that sentence makes it sound fairly dry.
What happens in the best Dialogue challenges is that people (you, me, regular folks who care about a thing) get to be involved in a constructive conversation about it online with whichever local or national government organisation is running it. We do that by submitting ideas to their challenge or commenting on those of others.
In the best of these challenges:
- Those ideas and conversations get listened to (properly listened to) by the organisation and can help to shape what happens next
- The policy and service teams running the challenge, sometimes Ministers and other elected representatives, get to hear real stories about people’s lives, and they get ideas for solutions they might never have considered without having that lived experience themselves
- In turn, we—the public—get to see differing points of view in a way that doesn’t invite immediate rage. We get to be included as adults in something that has an impact on our lives. And through seeing and appreciating the breadth of differing views we can even get a flavour of how hard it might be to make policy or change that pleases everyone.
It’s not a perfect thing, not by any means, though it can be a good and sometimes surprising thing.
I’m aware, though, that this seems like a misty-eyed utopian dreamland. Constructive conversations very rarely happen online. Online is where people with opinions shout at other people with opinions who look like eggs. It’s where flame wars and pile-ons happen. IT’S WHERE PEOPLE MAKE THEIR POINT IN ALL CAPS.
And it’s true that often a conversation like this doesn’t happen well. Mostly it doesn’t happen at all. Having an open, public, online dialogue with citizens is not a thing that government organisations are largely comfortable doing for all of the above reasons. It feels like too big a risk. The ghost of Boaty McBoatface haunts these shores. And nobody wants anything to escalate quickly into an uncontrollable bun fight. Though with something like this, if you get the framing and process right and ask a genuine question about people’s lives it will invite thoughtful and reasoned responses. A name of a boat isn’t a thing people need to (or will) take seriously. Reopening an economy, controlling the spread of a virus, letting people see their grandkids, is.
A further barrier to why this kind of online conversation doesn’t happen as often as it could is that an open type of dialogue isn’t a known statutory process like consultation – it’s something else, something which can be more difficult to pin down if you’re trying to make a case for it. It takes an investment of time and a well-designed process to hit the right note of humility and openness that a structured conversation like this needs. It’s certainly not a thing that social media tends to be good at with its instant notifications and lack of moderation or central control: that’s a recipe for a type of conversation perhaps, but not usually a constructive one.
I think that’s probably enough preamble – moving on to this project in particular:
1. What happens down in the long grass when your tool is going to be used for a big public dialogue
The Scottish Government has a Dialogue site already; the platform wasn’t bought specifically for this task, but is part of a suite of tools used for digital engagement. Dialogue is a platform that costs under £10k per year for as many exercises as an organisation wishes to run on it. I say this not to be like ‘buy it now!’ but to highlight that it is not a mega-massive bespoke project type-thing; it’s a fairly inexpensive, off-the-shelf, Software-as-a-Service product for running multiple challenges.
Dialogue is designed and mostly used for fairly concentrated, even niche, conversations like ‘We have a pot of money to spend on Example Park: how would you spend it?’. It is absolutely built for hundreds of ideas and comments and sustained, busy traffic.
Way back in the past when we were a project company, we built and scaled bespoke one-off versions of what we now call Dialogue for a couple of very big national or international conversations. The underlying infrastructure of the current standard Dialogue product is not naturally geared for a national conversation about a global pandemic with highly unpredictable and potentially huge traffic.
So when we got the call at the end of April asking about its capacity to handle a high volume of responses for something like this, we were honest – we didn’t know if it could do that. Even with the interventions we believed we could try and make in the few days between that phone call and the potential launch date, we still had no way of saying for sure if it would cope as it was untested (and had no way of being appropriately tested) in that kind of scenario.
Bearing this in mind, we had to first make the right call inside Delib. We are a small team with 160 or so other public sector customers to support – all doing vital work – so we had to decide if we could spare some of us for this, pause some of our other engineering projects for a short while, and make a plan for what we could do in the time available (a week).
With that as the starting point, as a supplier our best response can only be one of total honesty with our customer. Being honest about the risks that a site may struggle under intense traffic. Being honest about the work we can feasibly do in the time available to give it the best chance, and about all the unknowns in an untested situation like this.
In this case, we couldn’t comfortably back the plan for Dialogue to be used for this without acknowledging these risks openly and having them accepted, but we could promise that – risks accepted and understood – we would do everything we could to support it if the team chose to go ahead. These concerns were discussed openly and fully with our colleagues in the Scottish Government to allow them to make a properly informed decision about what they wished to do.
To be anything less than completely candid here would have run the risk of damage not only to those we immediately work with in the Scottish Government who needed to pass on a true picture to Ministers and other teams, but also to the reputation of the Scottish Government itself if we were to give a false impression.
And in a continuing spirit of honesty, there is absolutely a risk to us as well. Our reputation is at stake in this kind of situation regardless of how honest we are upfront: despite the site not being designed for this, what impression does it give of us and our work if our software does not hold up? Even with best intentions and risks outlined, might we be held responsible if the software runs into difficulty and causes our customer to look bad?
A person visiting the site who wants to have their say isn’t going to be bothered about all this background context – they just want to contribute on a working platform and will lose faith in the government and the process if that can’t happen.
2. Some of the steps we took to try and help make this work
To allow us to manage this properly, we had to set out the parameters of what we would be able to do. There are a few steps to this:
Step 1 – get the overall information:
- Establish the timescale available – in this case, a week
- Find out: when will the announcement be made, what will it be about, how will it be promoted, what will the challenge entail, and what is the desired outcome?
This gives an idea of the potential load on the site, who we might expect to turn up to it, and allows us to build those nuggets of information into the plan for the technical interventions we might be able to make in the time available.
Step 2 – what are we trying to achieve:
- Keep it online and secure: To get the platform as technically ready as we can for sustained load and peaks of concurrent requests, so that it can perform at its best
- Help it succeed: Establish a plan of helpful, open and regular communication between us and Scot Gov
- Keep Delib running: To ring-fence the project so that the rest of Delib can continue their work
Step 3 – get the plan in place and in action:
As anyone who manages projects will know, with a tight timescale, making the plan and getting some of it started are often things that happen almost simultaneously. For that reason, I’ve not put timescales or a linear flow of work as we had a week to get this turned around and a lot of these things happened in the space of a few hours or alongside one another with some of us on one part and the others managing the rest. What was important was to have excellent communication between those of us working on it, so we paid a lot of attention to getting that right.
To address aim 3: Keep Delib running, we put together a small team working solely on this.
Who
Alan and Michaël (engineers), Lauren (Scottish Government’s account manager), Stan (QA), and me. It worked well for us to have a clear split, with Lauren and me acting as the conduit for communication between Scot Gov and Delib: providing help with the process of running the challenge, answering questions, and making any site changes that were within our power. This allowed us to leave Alan, Mike and Stan free to focus on turning a lot of the server and platform-side technical work around quickly.
Communication inside Delib
We set out ways of communicating internally as the core team for the Dialogue, but also with the rest of Delib so everyone was kept up-to-date.
For the core team, within an hour of that first chat with Scot Gov we:
- Set up a separate IRC (internet relay chat) channel so we could share info regularly between us without disrupting the rest of the company’s chat channels and vice versa
- Started a collaborative planning document/checklist to establish what we could feasibly get done in the time available, to get all of the interventions noted down, to assign people tasks from that list and keep on top of it all. For this we use our own tool rather than a Google Doc. Though either would work, we consider ours to be more secure and it’s what we use for any firefighting so it’s a known process inside Delib
- We made a ‘Big Story’ plan.io ticket (our workflow tool) for capturing the various engineering, QA and other tickets for this project and added any updates from the team at Scot Gov which might impact the work. This also made that work and the process visible to the wider company in case others might need to step in or we’d ever need to refer back to it
- We arranged a daily video call so we could catch up on work done, what was coming up next and any questions we had
These communication and workflow methods remained in place for the whole project.
For those not working on the project, I sent out regular digest emails and gave updates at our stand-ups and other wider-Delib meetings. This kept everyone updated, but without needing to interrupt their work.
We also ran a retrospective after the project had finished and recorded this so that it could be watched by anyone in Delib.
Technical interventions we made in the week:
To address aim 1: Keep it online and secure, we:
Increased the hosting capacity of the site by migrating it to a much larger virtual machine.
Made caching changes to allow the public-side caches to operate at a high traffic frequency. We were aware that the downside of this would be some requests taking longer if the site was less busy as the caches would cool down. It also meant that big admin-side tasks would take longer. This intervention ended up being one of the real saviours, especially with the huge peak of requests within the first hour of launch as that’s the killer time when people can lose faith in something like this.
We had to address two sides to the site, as both needed bolstering. There is the public side of Dialogue which we expected to be accessed by thousands of people – both lurkers and active, logged-in submitters. There’s also the admin side which was going to be accessed by moderators and analysts. We weighted the site more to the public side as that was where visible issues might lie and we felt it would cause more burden on the Scottish Government team if the public struggled to access the site as it would mean they would be fielding more communication from frustrated site visitors.
However, we were then aware that larger admin-side tasks would take longer and that there may be an impact on moderation, which we knew needed to happen at speed. On balance, we felt that if errors (503s etc) were going to occur, it would be better for those to be seen by the dozen or so admins in the Scottish Government than by the thousands of people on the public part of the site. Not that we felt good about that (no errors would be the ultimate goal), but it was a trade-off that we needed to make.
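To make the cache trade-off a little more concrete, here is a minimal Python sketch of a time-based cache. It is not Dialogue's actual caching code: the `TTLCache` class, the `simulate` helper, the 30-second lifetime and the `/ideas` path are all made up for illustration. It only shows the general effect described above, where a busy site keeps the cache warm and a quiet one lets it cool down.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical time-based cache: an entry expires ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float]) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injected so the example can control time
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1  # warm cache: no work for the application
            return entry[1]
        self.misses += 1  # cold or expired: the slow path runs again
        value = render(key)
        self._store[key] = (now, value)
        return value


def simulate(gap_seconds: float, requests: int, ttl_seconds: float = 30.0) -> TTLCache:
    """Request the same page `requests` times, `gap_seconds` apart (assumed numbers)."""
    now = [0.0]
    cache = TTLCache(ttl_seconds, clock=lambda: now[0])
    for _ in range(requests):
        cache.get("/ideas", render=lambda path: f"<html>{path}</html>")
        now[0] += gap_seconds
    return cache


busy = simulate(gap_seconds=1.0, requests=120)   # one request per second
quiet = simulate(gap_seconds=60.0, requests=10)  # one request per minute
print(busy.hits, busy.misses)    # 116 4
print(quiet.hits, quiet.misses)  # 0 10
```

With these invented numbers, the busy site renders the page only four times in two minutes, while on the quiet site every visitor pays for a fresh render. That is the "cooling down" cost we accepted in exchange for protecting the public side at peak.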
We also noted any expected risks to this project and worked to intercept them:
As a product with a registration form that sends transactional emails, we figured there was a risk that with a higher than normal throughput of traffic our mailing software provider might mark our account as spam, so we intervened to ask them not to do that. As an example of splitting the work out efficiently, this was something Lauren and I could do, leaving the others to continue working on things that really do need engineering skills.
Another risk was that any of our fleet-wide internal processes for updating sites could overwrite the changes made to the Scot Gov Dialogue. We had to block off the site from the rest of the fleet and support it separately. How do we know this is a risk? We have learnt it the hard way in the past. We call sites on their own configuration a snowflake as they’re special and unique (it’s sad that snowflake has been co-opted as an insult on social media, it’s not used that way by us). The problem with snowflaking sites as a product company is providing ongoing support for them and bringing them back in line with the core product later. There’s a risk of forgetting they are on a special configuration and then not applying crucial updates like security patches, so we have a system for logging anything outside of the norm and getting it back to its usual product state as soon as possible.
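As an illustration only, the idea of logging snowflakes and keeping them out of fleet-wide updates could be sketched like this in Python. The `Snowflake` record, the site names, the dates and both helper functions are hypothetical and are not a description of our internal tooling.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Snowflake:
    """Hypothetical record of a site running on a non-standard configuration."""
    site: str
    reason: str
    since: date
    restore_by: date  # when it should be back on the standard product


def fleet_update_targets(all_sites: List[str], registry: Dict[str, Snowflake]) -> List[str]:
    """Sites that are safe to include in a fleet-wide update."""
    return [site for site in all_sites if site not in registry]


def overdue(registry: Dict[str, Snowflake], today: date) -> List[Snowflake]:
    """Snowflakes that should already have been brought back in line."""
    late = [s for s in registry.values() if s.restore_by < today]
    return sorted(late, key=lambda s: s.restore_by)


# Invented example data
registry = {
    "example-dialogue": Snowflake(
        site="example-dialogue",
        reason="larger VM and custom cache settings",
        since=date(2020, 5, 1),
        restore_by=date(2020, 6, 1),
    )
}
sites = ["site-a", "example-dialogue", "site-b"]
print(fleet_update_targets(sites, registry))  # ['site-a', 'site-b']
print([s.site for s in overdue(registry, date(2020, 6, 15))])  # ['example-dialogue']
```

The point of a record like this is the second function as much as the first: skipping a site during updates is easy, remembering to bring it back (and patch it in the meantime) is the part that gets forgotten.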
Addressing all three of the aims was underpinned by how we communicated with the team at Scot Gov. This was vital in making this project work, for us certainly, but I hope for Scot Gov as well:
Communication with Scot Gov
In order to provide clarity and to manage the workload of this project, it was so important to set out ways of working collaboratively upfront. It makes it much easier for everyone if parameters are laid out in advance so we can work together as a big, single team supporting each other.
We established a single point of contact using one Zendesk ticket to capture all questions from the team and any back and forth between us all. We use Zendesk at Delib to respond to support queries. Having a new ticket created each time there was a new question would have quickly become unsustainable and confusing for us all (both Scot Gov and Delib). It would also disrupt the rest of Delib as all account managers and engineers get an email if a new support query is submitted. A single ticket meant it was all in one place and we have the whole narrative in one chain of communication. Granted there are 167 updates on that ticket, but that’s much easier to manage than 167 individual tickets!
I also felt it was important to establish what we could cope with in terms of managing queries. For example, if members of the public had questions which were sent straight to us, we might quickly become overwhelmed. It was helpful to be clear about that and what we could/couldn’t do with the aim of allowing the Scot Gov team to plan as well.
We confirmed in writing how frequent communication would be and the things we would be keeping an eye on. We also set out what was outside of our control or would go outside of what we’d be able to do. For example, if the site was under sustained load and started to struggle we knew our only option would be to reboot the server as we’d already exhausted all the other potential interventions we could make in advance of the go-live date. Being upfront about this meant we all knew what would happen if there was a site outage.
I set out what we’d be reporting on and sent an update at the end of every day the challenge was running which included: if there were any outages and for how long, how many requests, peaks in requests, the time of day of those peaks, and changes we made along the way at the team’s request. It was important to be clear about what we could and would change and what we could not. We did make a number of changes as the challenge progressed, but we also had to refuse a few riskier ones as we wouldn’t have had the time to test them adequately or they could have risked site performance. At all times our main focus was on those aims of keeping the site safe and online and the lines of communication open and honest.
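For anyone wanting to produce a similar end-of-day update, the figures could be pulled together with something like the Python sketch below. It is an assumption-laden example rather than what we actually ran: the `DailySummary` fields, the `summarise_day` function and the sample timestamps are invented, and in practice the request times would come from server logs or monitoring.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class DailySummary:
    total_requests: int
    peak_requests: int        # requests in the busiest hour
    peak_hour: Optional[int]  # hour of day (0-23), None if there was no traffic
    outage_minutes: float


def summarise_day(
    request_times: List[datetime],
    outages: List[Tuple[datetime, datetime]],
) -> DailySummary:
    """Summarise one day of traffic from request timestamps and outage windows."""
    per_hour = Counter(ts.hour for ts in request_times)
    if per_hour:
        # Busiest hour; if two hours tie, report the earlier one
        peak_hour, peak_requests = min(per_hour.items(), key=lambda kv: (-kv[1], kv[0]))
    else:
        peak_hour, peak_requests = None, 0
    outage_seconds = sum((end - start).total_seconds() for start, end in outages)
    return DailySummary(
        total_requests=len(request_times),
        peak_requests=peak_requests,
        peak_hour=peak_hour,
        outage_minutes=outage_seconds / 60,
    )


# Invented sample data for a single day
day = datetime(2020, 5, 6)
requests = (
    [day.replace(hour=9, minute=m) for m in range(3)]
    + [day.replace(hour=12, minute=m) for m in range(5)]
    + [day.replace(hour=20)]
)
outages = [(day.replace(hour=12, minute=30), day.replace(hour=12, minute=34))]
print(summarise_day(requests, outages))
```

For the sample data this reports 9 requests in total, a peak of 5 in the 12:00 hour, and 4 minutes of outage.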
It was so important to us to be candid about what could go wrong and the most likely pain points. It is hard to have these conversations, but it’s much harder to deal with a situation that you might have foreseen but didn’t want to bring up in advance.
We also made suggestions based on assumptions about what might crop up as the challenge progressed. We were possibly overly cautious about the risk to the site if it was post-moderated – this means allowing submissions to go live before they have been read and approved – so we recommended pre-moderation. We think it made a positive difference, but it’s difficult to say whether the quality of the conversation or the performance of the site might have been affected had a different path been taken.
Another example is the exports of the submitted data. As the challenge progressed we realised that these might grow to the kind of size that would struggle to generate before hitting a timeout, so analysts or moderators trying to generate a report during a period of heavy load on the server might unwittingly take the site offline. Between us and the team, we made an intervention where we could get those exports for the analysts twice a day server-side instead.
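The general pattern here is to move a heavy job from "whenever someone clicks the button" to fixed, predictable slots. A minimal Python sketch of working out the next slot is below. The 06:00 and 18:00 times and the `next_export` function are assumptions for illustration; the actual export times and mechanism aren't covered in this post, and in real life a scheduler such as cron would normally do this job.

```python
from datetime import datetime, time, timedelta
from typing import Sequence

# Assumed export slots, chosen only for the example
EXPORT_TIMES = (time(6, 0), time(18, 0))


def next_export(now: datetime, slots: Sequence[time] = EXPORT_TIMES) -> datetime:
    """Return the first scheduled export strictly after `now`."""
    for day_offset in (0, 1):  # look at today, then tomorrow
        day = (now + timedelta(days=day_offset)).date()
        for slot in sorted(slots):
            candidate = datetime.combine(day, slot)
            if candidate > now:
                return candidate
    raise ValueError("slots must not be empty")


print(next_export(datetime(2020, 5, 6, 5, 0)))   # 2020-05-06 06:00:00
print(next_export(datetime(2020, 5, 6, 19, 0)))  # 2020-05-07 06:00:00
```

Running exports on a fixed schedule means the cost lands at known times rather than at whatever moment the site happens to be busiest.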
I think there’s also something here about being on it and alert to the project and the potential situations that may arise. Working on a project is often a bit fraught and all most of us want is reassurance. It helps enormously to be organised, thorough, and proactive in your communication because this is reassuring for you and for those you’re working with.
3. Having and valuing trusted working relationships
We have worked with the Scottish Government for a long time now. It is a collaborative partnership and a trusting one, and there’s no doubt that the good relationship we have was a big factor in how this went.
There needs to be a reliance on mutual respect and trust in a high-pressure situation like this: respect for each other’s time and efforts and trust that we all want the same thing.
It’s very easy in a time of stress for a large organisation to unwittingly trample on a much smaller one, especially in a buyer>supplier relationship because those are naturally weighted in one direction. I could easily envisage a very different working relationship where we could have taken the same boundary-setting, honest approach, made the same technical interventions, and provided the same guidance to reach the same good outcome, but where those boundaries would not have been respected and our guidance ignored in the process of getting there. In those cases, even if the project is ultimately a public success, it ends up being at a much higher cost behind the scenes to everyone involved in terms of time and emotional labour – that aim of protecting your team or the rest of your business could be obliterated. And that higher cost is not just to the supplier either. It’s also hard for a customer who doesn’t trust their supplier or work with them collaboratively because they end up spending a lot of time questioning, being frustrated, asking for things that may be undeliverable, and generally expending emotional energy, too. It doesn’t have to be that way, and it’s mostly a matter of believing that both parties want the thing to work and when a team says ‘we’ll be doing everything we can, we are supporting you’, they mean it.
4. The outcome
Well, friends, we’ve almost reached the end of this opus on trust and turning around a project in just over seven days. I’m pleased to report that in the time it has taken me to write this, Scotland is coming out of lockdown and the people of Scotland are able to begin seeing family and friends and participating in the sports they enjoy – all things mentioned many times over in the Dialogue challenge.
Did we get everything right? Who does? We learnt a lot and there’s not a lot I’d change in how we approached this. If we’d had more time we’d have loved to have delivered some more features like making it immediately visible if an idea is locked to new comments, and other tweaks to help the moderators with community management. What’s been really useful to us was having a proper catch up with the Scot Gov team after this finished so we could learn from their experience at the sharp end of it all. In time we can take those suggestions back into the core product, but for now we have some engineering catch-up to attend to.
Technically speaking, the site held up remarkably well. After all our doom-mongering (sorry, cautious concern) and fears that it may crash, there was only one episode of slowness and a few minutes of downtime in the week it ran. This was on the final day, when the site was at its most full of data, and coincided with a promotional push on both Twitter and Facebook simultaneously and an accidental request of an export on the admin side. It stayed online through briefings from the First Minister, being on the homepage of BBC News, and huge daily peaks of traffic.
I’d absolutely encourage anyone interested to read the excellent two mid-way analysis posts and the post-challenge analysis report from the Scot Gov team. These were all turned around within a few days of the challenge starting and closing, which is an absolute win for transparency and openness.
Most importantly, though, it’s clear from these reports that the ideas and comments from the people of Scotland who took part had a genuine impact on the route map out of lockdown. We’ll absolutely take a couple of weeks of hard graft to be able to play a small part in that.