@DanLebrero.

software, simply

CTO day 7: Lucky Lotto, chaos engineering but for teams

Building resilient teams with chaos engineering principles.

Last year’s team size reduction was the third in a row that we had to do.

One of the side effects of such a continuous reduction was that the knowledge about our systems was thinly spread: fewer people but the same number of systems.

As reducing the number of systems was not palatable to Business, we had to mitigate the risk of having a lot of knowledge held in just one person’s head.

Limiting our work in progress has somewhat helped spread some of that knowledge, but very slowly and not in all areas.

After watching a chaos engineering video, the idea of applying its principles to build more resilient teams was born in the shape of the Lucky Lotto.

Lucky Lotto!

Here is the email to introduce the Lucky Lotto initiative:

Sunday evening.

With very little faith, as every Sunday, you check the numbers of the Lucky Lotto.

Your heart starts to race, and you sweat like a little piggy.

You check the numbers not twice, but a dozen times.

It has happened. You won the Lucky Lotto!

As you start making plans for those 100.000.000 dollars, one thing is for sure: you will not wake up tomorrow morning to go to work, nor any other morning for the rest of your life.

Congratulations! A life of luxury and emptiness awaits you!

Wait, what happened to your Akvo team?

Welcome to Akvo’s Lucky Lotto!

Starting the last week of September, we are going to run our own Akvo Lucky Lotto.

All of you will have a chance to win, and your team will get to enjoy the results of your disappearance.

Rules:

  1. Every Monday a random person will win the Lucky Lotto.
  2. The winner will work on some side project.
  3. The winner will be completely unavailable to colleagues and to the rest of Akvo for the week.
  4. Everybody, including product managers, gets one ticket every week, even if you don’t want it.
  5. Every time that rule 3 must be broken, the winner must make a note (I will share some doc to do this).

This is nonsense. I would invest all the Lotto money in Akvo and come to work on Monday at 9am sharp.

So would I! But “You won the Lotto” sounds better than “You were run over by a bus”.

What are those side projects?

It will depend on the skills of the lucky winner, but they could include:

  1. Platforms work.
  2. Workflow Improvement work.
  3. Research.
  4. Learning.
  5. Brown-bag session preparation.
  6. Important but non-urgent work.
  7. Little side projects for other Akvo departments.
  8. Collaboration with Hubs.
  9. Do some “Week of little things” items.

Can I win twice in a row?

And thrice!

Do I have to isolate myself completely?

No. If you are the winner you can still socialise and attend meetings/stand-ups, but you are not allowed to provide input.

I have a very important thing to do that only I can do, and if it is not done the world will be destroyed.

This is basically the point of the exercise: to find those things and ensure that somebody else is able to handle them.

Obviously we are not going to jeopardise (completely) our work, but if you find yourself with one of these tasks:

  1. Make a note.
  2. Try to bring in one of your colleagues to do the task under your supervision.

This is going to force us to become more T-shaped, which in the long run should make the team run more smoothly and be more adaptable.

If the winner is announced on Monday morning, how can we plan around it?

MUAHAHAhahaha

muahahaha

Let me know if there are any questions, suggestions or comments.

Cheers,

Dan

PS: I wonder if this email will pass your email filters.

The Lotto will give two learning opportunities:

  1. The team will need to fill in for the skills and work of the Lucky Lotto winner.
  2. The winner will have a week to do something different.

The process

Every Monday at 9am we would roll a die and inform the lucky team.
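For the curious, the Monday draw can be sketched in a few lines of Python. This is only an illustration of rules 1, 4 and 5, not the tool we actually used (we rolled a die): the team names, the `draw_winner` and `record_rule3_break` helpers, and the shape of the notes are all made up for the example.

```python
import random
from collections import Counter


def draw_winner(team, rng=None):
    """Pick this week's winner.

    Everybody holds exactly one ticket (rule 4) and last week's winner
    is not excluded, so winning twice in a row is possible.
    """
    if not team:
        raise ValueError("the team needs at least one ticket holder")
    rng = rng or random.Random()
    return rng.choice(team)


def record_rule3_break(notes, winner, task):
    """Rule 5: every time the winner has to be disturbed, write it down."""
    notes.append({"winner": winner, "task": task})
    return notes


# Hypothetical team; product managers get a ticket too.
team = ["ana", "bo", "chen", "dee"]

# Three months of Mondays, seeded only so the example is repeatable.
rng = random.Random(2020)
winners = [draw_winner(team, rng) for _ in range(12)]

# The notes are the real output: tasks that keep showing up point at
# knowledge held by a single person.
notes = []
record_rule3_break(notes, "ana", "release signing")
record_rule3_break(notes, "ana", "release signing")
record_rule3_break(notes, "bo", "finance export")
recurring = Counter(note["task"] for note in notes)
```

Here `recurring.most_common()` lists the tasks that most often forced the winner back in, which is the list worth teaching to somebody else first.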

The winner and I would meet (everybody else was welcome to join) and agree on the objective of the week, which depended on my TODO list, the winner’s interests, and other teammates’ suggestions.

The teammates’ suggestions proved to be the most interesting.

Results

Three months running the Lucky Lotto showed several instances of a bus factor of one, and gave the teams the opportunity to step up, learn and cover for the missing person’s skills.

As an example, our one and only Android developer won the Lotto the same week that the team was going to fix a major performance issue in the communication between the app and the server. It was a great learning experience for the team.

For the Lucky Lotto winner, it was a very enjoyable week, to either learn something new (Kubernetes, backend development, our deployment pipeline, Cypress, Clojure, …), work on those long desired dev improvements that we never had time for, or to do something different from the usual churn.

“These days were a great mirror into where I actually spend my time and whether that is the best way to handle the tasks.” (One of our Product Managers)

In addition to the knowledge sharing, we got some cross-pollination and broader-team building as some winners decided to work with the other product team during their Lotto week.

As the ex-platform team lead, it gave me the opportunity to schedule some platform work that was no longer happening, giving the winner the “luxury” of learning something about our platform.

For the rest of the organization, we did some “Week of little things” items, and automated another Finance process (pro-tip: ensure you are on good terms with the people that have the money).

All looked amazing until the “proper week” …

A proper Lucky Lotto week

Panic ensued on the day that THE backend developer of one of the teams won the Lucky Lotto THE week of tight deadlines and unavoidable client promises.

The developer (and his product manager) asked to reroll the Lucky Lotto so that he could have a “proper Lucky Lotto” in a quieter week and another developer could enjoy it that week.

How many weeks happen to be THE week for THE developer?

A proper Lucky Lotto week?

I realized that the Lotto’s emphasis had been drifting in the wrong direction:

As I have seen some confusion about what the objective of the Lucky Lotto is and what priority it has, let’s review its objective:

Build resilient teams.

That is it.

Why?

So we are less stressed in the future, as we will have a more flexible team.

Working on the most important thing requires this flexibility. Consulting requires this flexibility.

As a generic statement, we are all specialists but not really generalists. We need specialised generalists.

Note that most of us have already done the difficult bit: to become a specialist. Now we just need to do the easy bit: to learn enough of other disciplines. Just enough.

How?

By applying eustress to the team’s week.

In our case, the antifragile practice that we are trying is making one team member “disappear” for a week and having the rest of the team pick up the missing person’s work.

“Proper” Lucky Lotto week

Reflecting on the past weeks of Lucky Lotto, I think I got wrong what a “proper” Lucky Lotto week would look like. I think I emphasised that the important thing was to not be bothered by anyone, for the team to really not need you at all.

This was wrong.

A proper and successful Lucky Lotto week is one that, in order of importance:

  1. You spent a significant amount of time teaching others some of your skills.
    1. Teaching is more efficient than the team self-learning.
  2. Some not-easy-to-transfer knowledge gaps are identified.
  3. Winner gets to do something different.

Again, point 1 is more important than point 3.

Rules (expanded):

  1. Every Monday a random person will win the Lucky Lotto.
  2. The winner will work on some side project. Still work.
  3. The winner will be completely unavailable to colleagues and to the rest of Akvo for the week.
  4. Everybody, including product managers, gets one ticket every week, even if you don’t want it.
  5. Every time that rule 3 must be broken, the winner must make a note (I will share some doc to do this):
    1. Note that this means that rule 3 is a soft-rule.
    2. There is no “punishment” for breaking rule 3. It is neither good nor bad. It is a fact, and we want to know about it.
    3. Try to bring in one of your colleagues to do the task with you or under your supervision.
  6. The team should avoid delaying the work for a week.
    1. Is the winner not working on the most important thing?
  7. This is a “best effort” initiative:
    1. Real deadlines have more priority.
  8. Be practical. It is better to break rule 3 than:
    1. To waste 5 days of your team’s work.
    2. To waste 5 days of one team member.
    3. To miss a partnership-related deadline.
    4. To let one teammate struggle with your task for a week.

The winner’s objective should be to help the team learn and to transfer her knowledge.

I actually don't think specialization is more difficult than generalization, but I thought that saying so would encourage the team.

Next

Lucky Lotto worked pretty well as a fun way to create safe opportunities for the team to learn, spread the knowledge and increase our bus factor, while giving the winner room to break from routine.

An initiative well worth keeping.

But … my efforts to translate the newish business strategy into a technology strategy were bearing fruit, and with the new financial year looming, it was time to start rolling it out, which meant a radical change in the Dev team that would absorb all energy and leave little room for experimentation.



Tagged in : CTO diary resilience testing