100% Cuptime
5 min read coffee · overengineering
I take cuptime very seriously. Everybody knows that actually achieving 100% cuptime just isn’t possible. There are too many factors that can spoil that magic 100%. Even the big tech giants like Google, Microsoft and Amazon aren’t able to hit that magic number. With this many companies working on the problem, you would think it would have been solved by now, right?
Recently I discovered a fundamental infrastructure flaw that was negatively impacting my cuptime figures. After a bit of research I soon discovered that I wasn’t the problem; this was an issue impacting everyone. Like any good engineer, I decided to spam Slack for some help.
I had done a root cause analysis of the last three incidents. It turns out that every single incident could have been avoided if the cup had been untippable. Had nobody ever noticed this issue before? Was I missing an obvious requirement somewhere? Is there maybe a proper use case for having a tippable cup, making this a conscious design decision? Maybe training users to use untippable cups would take too long and cost more money? Surely this would greatly improve our cuptime figures without any downsides, right?
Luckily I am surrounded by brilliant engineers who understand how important cuptime is and have experience with this kind of stuff. It turns out there were a few simple backwards compatible hardware upgrade options already on the market.
There were a lot of great suggestions but there was one clear winner. So we organised a group buy to make sure we would have enough testers.
So my Mighty Mug arrived and I decided to do an acceptance test with some water. I first tried to knock it over… it did not fall. All of this trying to knock things over had made me thirsty, so I decided to give drinking a go. Due to my inexperience I accidentally missed my mouth and spilled water on my (hot swappable) pants. This was mostly a user error and something that can be easily fixed with proper user training. And to be fair, this was completely unrelated to the original issue that was causing most of the outages.
A few days later someone handed me a package containing these straw glasses. It was a nice gesture, but they were low quality and didn’t work very well. They were supposed to free up my hands, but they actually required two hands to operate properly without the risk of spillage. An unexpected feature was that cold drinks cool your face and ears and hot drinks warm them up.
The main disappointment, however, was that nobody noticed that I had the output of cat -h in the background of the photo with the cat wearing the glasses.
I really didn’t expect anyone to buy any of this. As a thank you I promised to use them at lunch for five days.
A few more days later someone handed me another package, and inside was a Hangover Cup. This one works great but didn’t pass the “shake it really hard upside down after drinking” test. Luckily that is easy to fix with good user training.
What problem am I trying to solve?
- Lost time cleaning up the spillage and re-provisioning coffee filled laptops
- Extra stress on your relationship explaining to your girlfriend that you ruined that pretty new white sweater
- Productivity loss for all engineers in a radius of approximately 15 meters while they are laughing at you cleaning up your spilled coffee
- Cost of the coffee, milk and sugar
- Coffee is a circular dependency. Coffee (engineer fuel) is needed to help clean up the coffee
Cuptime improvements
- Coffee clusters in multiple availability zones (already implemented)
- Single point of failure discovered in coffee cluster (shared milk source)
- Implemented rolling upgrade cleaning procedure using a workaround for the milk SPOF. Previously both cluster members were taken down at the same time for cleaning
- Untippable cups
- Handsfree straws
Rolling upgrade of coffee cluster with milk workaround. You will notice that although it has a single milk source it does still have separate milk pipelines allowing for one to be removed during cleaning. Right now this is still a manual procedure but could be automated by modifying the milk container to have two separate sections.
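If the manual procedure ever does get automated, the control loop might look roughly like the sketch below. It only illustrates the rolling idea described above: members are cleaned one at a time so the cluster never goes fully down. The function name `rolling_clean`, the member names and the `clean` callback are all hypothetical; nothing here talks to a real coffee machine.

```python
def rolling_clean(members, clean):
    """Clean cluster members one at a time.

    Returns the lowest number of members that were still serving
    coffee at any point during the procedure.
    """
    if len(members) < 2:
        raise ValueError("a rolling clean needs at least two cluster members")
    serving = set(members)
    min_serving = len(serving)
    for member in members:
        serving.discard(member)  # remove this member's milk pipeline
        min_serving = min(min_serving, len(serving))
        clean(member)            # hypothetical cleaning step
        serving.add(member)      # reattach the pipeline
    return min_serving


# Hypothetical two-member cluster; "cleaning" just records the member name.
cleaned = []
lowest = rolling_clean(["machine-a", "machine-b"], cleaned.append)
print(cleaned, lowest)
```

With two members, one machine is always left serving while the other is being cleaned, which is the whole point of the workaround.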
Known issues
- Users not using handsfree straws (not enforced) can still spill coffee
- Untippable cups do not work on angles greater than 90 degrees
- The amount of time lost explaining how the untippable cup works has exceeded the ACCT (Average Coffee Cleanup Time)
- Untippable cups are not yet compatible with $CLEANINGSERVICE and need to be cleaned manually by unqualified engineers
- Unexpected downtime caused by untippable cup being tippable on wooden surfaces
Cuptime incident log
2017-02-01T10:56
user | mrussell |
cause | User error. Michael missed his mouth while trying to drink from the cup |
impact | Slightly wet pants in embarrassing area |
time | 2 minutes |
2017-03-06T10:32
user | mrussell |
cause | User error. Michael walked into a meeting, placed his cup on a wooden table and tried to push it over to show off his cup |
impact | No one took him seriously for the rest of the meeting, slight spillage on wooden table |
time | 60 minutes |
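For anyone who wants an actual cuptime figure out of this log, here is a minimal sketch. The two incidents are copied from the log above. The measurement window (all of February and March 2017, around the clock) is purely an assumption for illustration, as is the function name `cuptime_percent`. Counting nights and weekends as cup-serving time flatters the number considerably.

```python
from datetime import datetime, timedelta

# Incidents copied from the log above: (start time, minutes of downtime).
INCIDENTS = [
    (datetime(2017, 2, 1, 10, 56), 2),
    (datetime(2017, 3, 6, 10, 32), 60),
]


def cuptime_percent(incidents, window_start, window_end):
    """Return the percentage of the window during which the cup was up."""
    window = window_end - window_start
    if window <= timedelta(0):
        raise ValueError("window_end must be after window_start")
    # Only count incidents that started inside the window.
    downtime = sum(
        (timedelta(minutes=minutes)
         for start, minutes in incidents
         if window_start <= start < window_end),
        timedelta(0),
    )
    return 100.0 * (1 - downtime / window)


# Assumed measurement window: 2017-02-01 up to (not including) 2017-04-01.
percent = cuptime_percent(INCIDENTS, datetime(2017, 2, 1), datetime(2017, 4, 1))
print(f"cuptime: {percent:.3f}%")
```

Under those assumptions the log adds up to 62 minutes of downtime, so the result lands a little short of the magic 100%.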