Engineering as Sketch Comedy
I do not watch a lot of comedy, but there have been a few moments that have struck my fancy and remained with me. I want to recall three sketches and relate them to engineering. Anyone who knows me will have heard me relate one or more of these sketches to something we have worked on, to provide a laugh. Beyond the humour, comedy is observational: it highlights the truth, whether that truth is seen or unseen, spoken or unspoken, funny or painful. Often, the truth hurts, and comedy becomes a way of sharing and building connection, helping us be resilient. While these sketches are useful for lightening the mood and having a laugh, I think they also provide relevant commentary for understanding engineers and engineering work.
Sales Aren't Up, They're Down
The company CareerBuilder featured this commercial during Super Bowl XL. The setting is the typical open-plan office. Sitting at his desk trying to concentrate, a perplexed employee can see and hear a group of chimpanzees—yes, literally chimpanzees—partying to the song Cum On Feel The Noize by Quiet Riot in the nearby, glassed-in conference room. Champagne is flowing, money is being burned to light cigars, and there are the obligatory banana peels. They are literally screaming like chimpanzees over the loud music, delighted by a large chart that shows that sales are way up!
Exasperated, the employee gets up from his desk, enters the conference room, and stops the music. He turns the upside-down chart the right-way up and explains, "Just thought you might want to know, sales aren't up, they're down". The chimpanzees stare at him in quiet disbelief. Then one of the chimpanzees shakes his head, no, and proceeds to correct the situation by turning the chart upside down again. Cue the music and screaming chimpanzees, fill your glass with champagne; the party rages on. A chimpanzee motions to the man to start dancing.
This commercial certainly highlights some of the joys of the open-plan office, but it also speaks to an experience all of us have had: misinterpreting the data. We might be looking at performance benchmarks, scalability improvements, or narrowing in on the cause of a production issue after some difficult and intense troubleshooting. Compelled by the promising results, it is difficult to face the reality that our interpretation of the data is wrong.
I was involved in a project where the goal was to increase the performance of a system for time-series-data processing by an order of magnitude.[1] One day, the team was blown away by the results one of our colleagues was able to achieve after some refactoring. Benchmark tests showed that he had achieved a performance breakthrough. A critical subsystem went from processing about 100 thousand messages per second to processing over 5 million messages per second! It wasn't until a while later, when we incorporated this change with the rest of the system, rather than benchmarking it in isolation, that disappointment slowly set in.
The reason the performance had been improved so much was that messages were now accumulating in memory. The bottleneck remained downstream. This breakthrough wasn't a breakthrough after all. After showing so much promise, it was hard to let go of it.
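A toy model makes the effect easy to see. The sketch below is not the project's code: the `simulate` function, the "tick" time unit, and the downstream rate of 100 thousand messages per tick are all assumptions chosen for illustration. It models a fast stage feeding a slower one through an unbounded in-memory buffer, and compares what a benchmark of the fast stage alone would report with what actually leaves the system.

```python
def simulate(upstream_rate, downstream_rate, ticks):
    """Discrete-time model of two stages joined by an unbounded buffer.

    Rates are messages per tick. All numbers are hypothetical and only
    illustrate the shape of the problem, not the real system.
    """
    backlog = 0    # messages waiting in memory between the two stages
    produced = 0   # what a benchmark of the upstream stage alone counts
    delivered = 0  # what actually makes it out of the whole system
    for _ in range(ticks):
        produced += upstream_rate
        backlog += upstream_rate
        drained = min(backlog, downstream_rate)  # downstream is the limit
        backlog -= drained
        delivered += drained
    return {
        "isolated_throughput": produced / ticks,
        "end_to_end_throughput": delivered / ticks,
        "backlog": backlog,
    }


# Assumed rates: a refactored stage at 5 million per tick feeding a
# downstream stage that still handles only 100 thousand per tick.
result = simulate(upstream_rate=5_000_000, downstream_rate=100_000, ticks=10)
print(result)
```

In this model the isolated number looks like a fifty-fold improvement, while the end-to-end number does not move and the backlog grows on every tick. Measuring the backlog, or benchmarking against a bounded buffer, would likely have exposed the problem sooner.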
There can be many contributing factors to misinterpreting the data: we may not have all of the data, only a subset; we can be in too deep and fail to see the bigger picture; we can be emotionally attached or blinded by our own biases; we can be rushed to provide results and come to a conclusion before we are ready; or we may not have the knowledge or experience to draw the correct conclusions. Being wrong can be an emotional and shameful experience. As engineers, we are paid to solve problems and think logically; we are valued for our insight, reasoning, and ingenuity. It can be difficult to let go of our ideas when someone else points out an error in our judgement, or has a better solution to the problem, until we can see the error in our reasoning, or understand the superior solution.
The reaction of leadership to errors in their judgement, or the judgement of others (as in this commercial with the chimpanzees), is particularly telling with regard to the culture of the organization. Creating new products, designing new systems, evolving existing systems, integrating with legacy systems, and troubleshooting production issues are all creative processes. We need to be open, accepting, and patient with people experimenting and failing; this is called learning. We also need to remain flexible to changing our view of the world when presented with new data that does not fit our mental model. I have a colleague who purposely tries to assume a point of view inconsistent with his own as a way to check his work and expand his understanding.
With that said, knowing that reality must set in eventually, sometimes it can be more fun to ignore the data for a while, and keep the party going—so let's get back to the comedy...
Try It Now!
The Kids in the Hall are a Canadian sketch comedy troupe. They had a popular and somewhat outrageous television series in the early 1990s, when I was in high school. In a skit called Try It Now, Danny and his wife leave the house for a night on the town. Their car won't start. Danny's wife suggests to him, "Maybe the spark plugs aren't firing?". Danny brusquely thanks her for the tip, but encourages her to leave the problem to him.
He gets out of the car and opens the hood. Paralysed by the complexity he sees before him, he just looks around for a bit, before telling his wife to, "Try it now". The car still won't start. He looks around a bit more, and without ever touching a thing, he confidently closes the hood of the car saying, "That should do it." He tells his wife to, "Try it now". Of course, the car still won't start.
Looking more concerned about his ability to take charge and remedy the situation, he resorts to increasingly creative and ridiculous measures (kicking the tires, washing the windshield, and painting the car a different colour), after each attempt telling his wife to, "Try it now".
In an attempt to "surprise" the car into starting, he pretends to give up, walking back towards the house saying maybe they should just stay home and watch a movie. Suddenly, he whips around yelling, "Try it now!", to no avail.
Finally, he opens the hood of the car one last time and finds a small child inside—one of the neighbour's kids. He tells his wife to, "Try it now", one last time. Indeed, the car finally starts. He returns the kid to the neighbour's house, stuffing him through an oversized pet door. The child is pulled through the door accompanied by the sounds of ravenous dogs.
Thinking of this skit while troubleshooting a difficult problem has given me a good laugh, many times. Beyond the obvious and ubiquitous, "Have you tried rebooting it?", I love this sketch because it touches on more subtle aspects.
As silly as it is to wash the windshield or paint the car a different colour in the hopes that the car will start, we have all experienced someone troubleshooting a problem in a similarly illogical manner. A temporary workaround to a problem can quickly become ritualistic thinking, adopted for years, well after the original problem is fixed. When the end customer, or an installation or field-service team, is exposed to such issues—people who do not necessarily have a fundamental understanding of the system internals, with more of a focus on its operation—it can take years to overcome these troubleshooting rituals. Let this caution anyone wanting to prematurely expose someone to a new feature before it is ready.
As an engineer, it can be very uncomfortable when people are looking to you as the expert, but you do not have a full understanding of the system—like Danny looking under the hood of the car. Often engineers are maintaining and evolving systems that they did not design or build. We need to be patient and understanding of their frustrations with these systems.
For many years, I had to spend a week every few months working front-line technical support at the software company I worked at. Our software was installed in demanding industrial environments. When I picked up the phone, I never knew what I was going to get. It could be a call about a product that I had no experience with—often the customer was much more familiar with the product than the person answering the phone—or a problem outside of the software itself, like networking, storage, or authentication. This experience always made me anxious, but it taught me to remain calm, ask questions, think logically, and think in systems. In the end, I would solve the customer's problem and learn a lot in the process—some of these problems were pretty complex and could go on for months. Customers, who were also usually engineers, were almost always patient through this process, since the focus was on solving their problem, in a collaborative manner, rather than jumping through hoops or running through a script. This experience gave me much more empathy for people helping me with a technical problem that may not be in their immediate area of expertise.
I still get this same anxious feeling when I am on-call for the services that I am responsible for at work. My team runs a large number of critical services and no one person can be intimately familiar with all of them. Many of these services also interact with services supported by other teams. When an alarm goes off and I am supposed to be the go-to person, it is initially a bit paralyzing when it is a service that I am not familiar with. When it is a service that I am familiar with, I have agency. When I have the ability to act and I am confident in my abilities, the situation is much less stressful.[2] However, every time there has been a production issue while I was on-call (thankfully they are quite rare), I have made it through. Usually it involves revisiting the basics: recent changes, system dynamics, understanding the layers of the system, and examining log messages. I always learn new things and other people always jump in and help. I am reminded that I am not alone and that I don't need to have all the answers (it is not even possible to have all the answers), even if I do need reminding of this every time I go on-call.
Finally, how many times have you suspected errors with the hardware or the infrastructure, only to realize the problem was more obvious? It must be a networking issue, or DNS, or a compiler bug, right? How did Danny not see the child the first time he opened the hood of the car? Rather than admitting that we were wrong and taking responsibility, it is easier to save face by diverting attention—in this skit, literally feeding the innocent child to the dogs. We should remember this in the moment—pause, breathe, think logically, invite the opinions of others, and realize that we are not alone. Above all, never feed anyone to the dogs, especially in search of the "root cause".
Duct Tape Spare
The Red Green Show was a Canadian television series that also aired on PBS in the United States. The title character, Red Green, was a handyman who liked to take shortcuts. His preferred way to fix nearly anything was using duct tape, or as he calls it, "The Handyman's Secret Weapon".
In the Handyman Corner segment of one episode, Red sets out to show the viewers how to change a flat tire. He begins by removing the hubcap, before using a whole can of WD-40 penetrating oil[3] to "loosen-up" the lug nuts—all while throwing in a few jokes about the fumes starting to loosen him up; being a skeptic about advice to rotate the tires, since they are rotating all the time; and commenting that the tire iron is a simple tool, but a useful tool, unlike his brother-in-law.
After failing to budge the lug nuts with the tire iron,[4] Red introduces the Law of the Lever: "If it's not workin', leave 'er". After attempts with power tools also fail, Red resorts to "an innovative alternative to the normal tire-changing technology": he uses duct tape to attach the spare tire to the flat tire. Then comes the punch line that makes the whole skit. He confidently asserts:
Of course now, this is only temporary … unless it works.
—Red Green
The scene ends with Red getting into the van and driving away—you can guess what happens to the spare tire.
This skit always reminds me of expedient engineering. In the heat of the moment—when there is a production issue, or it is crunch time on a project—an engineer may accept a short-term hack that they would otherwise view as completely unacceptable. However, this acceptance comes from believing that it is only a temporary hack that will soon be revisited. But just like with Red's tire, if the hack happens to work, it becomes out-of-sight and out-of-mind. Attention turns to more pressing issues—the temporary hack becomes semi-permanent and people, especially people who are not engineers, move on.
This kind of approach to engineering or operations is never sustainable. You will eventually end up with a Big Ball of Mud. Two problems result. The first is that continuous, expedient engineering results in a poorly structured system. This makes everything harder: it becomes hard to build your mental model of the system and hard to evolve the system, and progress slowly grinds to a halt.
If engineers are always working around the periphery of a system, too afraid to make necessary changes to the core of that system, you are certainly on an unsustainable path. As Fred Brooks highlights in his software classic The Mythical Man-Month, the conceptual integrity of the system will erode:
I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.
The second problem is that most engineers will not accept low-quality work. They will certainly make some concessions—as long as they are safe—to resolve a production issue or meet a project deadline, but once an engineer feels that the quality of their work is regularly below a certain standard, they will be looking for a new job. Bad engineering always crowds out the good.[5]
One of the reasons I developed my quality views technique was to encourage the entire organization to evaluate systems holistically and remind people of these temporary hacks and systems of poor overall quality—systems that carry latent risks, technically, operationally, and in terms of retaining engineers. As I wrote about in my article The Iceberg Secret Is Just the Tip of the Iceberg, for a component that appears to be working, a non-programmer will assume the technical debt is minimal, or nonexistent.
The Closing Sketch
There is a lot of truth in comedy. I am reminded of these three sketches again and again. They always give me a good laugh. The reason I find them funny, however, is the undercurrent of reality that runs through each one. Engineering is not always data-driven, efficient, or high-quality. There are human elements and organizational elements that lead to misinterpretations, meandering problem-solving, or temporary solutions that become permanent, because they happened to work. I hope that sharing these skits will give you a few good laughs and some valuable insights into engineers.
[1] I wrote more about this project in my article Test Driven Career Development.
[2] I wrote about agency in my article The Importance of Agency.
[3] I think WD-40 is probably the handyman's secret weapon number two.
[4] It will not be lost on the astute viewer that he turns the lug nuts the wrong way the whole time, making them even tighter.
[5] See Joe Armstrong's talk Computer Science - A Guide for the Perplexed; Erik Dietrich's articles How To Keep Your Best Programmers and How Developers Stop Learning: Rise of the Expert Beginner; and Tim Lister's talk We're on a Mission From God: The Return of Peopleware.