(Author’s Note – This is part of a series focused on job loss and is the first of three posts focused on identity. I wrote this post in December 2024.)
This past summer I suddenly and unexpectedly lost my job. I had been contemplating leaving this job for a while, but being asked to leave still came as a horrible shock. It was scary. Losing my salary and health benefits was scary. Losing my work social network was scary. But far and away the scariest part was the threat to how I saw myself — my identity.
Since that day, I have focused on how I define myself. I am trying to define myself in a way that is not dependent on my job and that is robust to changes in my job. While I have just started on this journey, I wanted to share some early learnings and experiences.
My plaque for reaching 10 years of employment at my last job. I lost that job a couple of months after receiving the plaque.
Backstory
For over 10 years I had worked at the same company. I had recently received a plaque from the company celebrating those 10 years, along with a nice gift, and the promise of an eight-week paid sabbatical. Over those 10 years, I had built myself into the performance person there. I knew more about testing the performance of our software than anyone else in the company. I was involved in designing and building all of our performance-testing infrastructure. Additionally, I was known in the performance engineering community as the performance guy at that company. I had published blog posts and papers about how we tested performance, and given many talks on the same subject.
Suddenly I wasn’t that person. I didn’t work there, so clearly I wasn’t the performance guy at that company. If I wasn’t that person, who was I? I was still the same husband, parent, friend, who loved to hike, cook, and read, but my professional identity was a large part of my total identity. It was gone and I felt incomplete and adrift without it.
I vowed to never let this happen again. I will no longer define myself (even just my professional self) in terms of my job. That starts by talking about my work, rather than about my job, as my work may be a job, but it may also be volunteering or a hobby. My work will be an expression of my identity, but will not be my identity itself. To meet that vow, I needed to develop a new self-definition. That is easier said than done. I’ve spent the last several months working on this challenge, and I expect to continue working on it for the rest of my life.
Learnings So Far
While I don’t have complete answers yet, I have learned a few things. I started this effort by seeing what I could learn from others: by talking to people and by reading. There’s a lot of literature around identity formation. While most of that literature focuses on adolescents, some, such as Designing Your Life, focuses on adults. The literature has recurring themes around identifying both your values and what you find rewarding. Experimentation can help confirm those learnings and test out potential identities.
I am doing that work. Thankfully, I have been doing some of that work for years, leading a very reflective life. I’m continuing to do that while keeping these new ideas in my head.
Pillars of My New Identity
I’ve started by focusing on the parts of my identity disrupted by losing my job. I intend to extend this work to have a complete sense of my identity. So far, I know that at my best:
I make things that other people use.
I make the people around me better.
I’m a learner who pulls together disparate ideas to create cool things.
I write in service of the first three goals.
Through it all, I want the world to be a better place for me having been here. The specifics of how I make the world better will change over time, but I always want to be a positive force.
At my previous job, I built tools that others used to make our product faster, and our many customers used that product to do incredible things. At my next job I will help improve software that helps small and midsized businesses focus on their key business, rather than human resources processes. I have mentored people in the past and will continue to do so, both formally and informally. I listen and give them advice to help them live their best lives. Writing lets me help more people than I could by just talking to individuals. And my learning mindset helps me do a better job at all of this.
Into The Future
I’m starting a new job in a few days. I will dive in and make the best contribution that I can. I will make sure that I express those four pillars of my identity, ultimately making the world a better place. I will also continue to explore who I want to be and work to become that person. At some point, my new job may stop being the best place for me to express and grow that identity. When that happens, I will celebrate my personal progress and shared work experiences, I will find my next, next thing, and jump forward with both feet.
Along the way, I will write about my learnings and experiences so that we can continue to grow together. I encourage you to reflect on how you define your identity and to update it, especially if you define yourself in terms of your job. Additionally, if you have experiences or perspectives that would help me on my journey, I would love to hear from you. I always love learning, and learning from others is usually the fastest path forward for me.
Thank you to Heather Beasley Doyle for her feedback on this post and her support through this entire period of my life. Heather is a gifted writer. You should check out her homepage and her writing.
It was a beautiful Monday morning last summer as I sat down on a lounge chair to relax. I had lost my job five days earlier, but was in no rush to find the next job – or even to look. So I was taking some time off: My plan was to do whatever I wanted to do. I had spent the previous four days backpacking with friends on an already-planned (and wonderful) trip. This was the first day I would have been going to work if I still had my job. I did my morning meditation, solved the Wordle, and started flipping through the newspaper. Soon, I noticed a sinking feeling spreading through the pit of my stomach and a growing sense of dread.
This dread wasn’t about money or health care — I had planned for those things. It wasn’t about any specific need. Instead, it was about my complete freedom, and the lack of structure suddenly staring me in the face. I am a person of habit and routine whose routines were gone, and whose habits no longer felt relevant.
I waved goodbye to the smiling faces arrayed before me on my monitor and prepared to join my next call. It was late Wednesday morning, and these were my last two calls before starting a four-day weekend. I looked forward to finishing up some tasks, cleaning up loose ends, and heading out on my annual backpacking trip. The call had been a going-away party for my friend Amy. Her last day would be Friday, but this call had been scheduled for Wednesday so I could attend. My next call was my regular one-on-one with my manager, also moved up because of my time off. Things had been strained between us, so I wasn't exactly looking forward to it, but I was looking forward to getting it over with and heading out on my vacation.
I clicked the link and the video call opened. Instead of one, there were two people on this call: my manager and someone I did not know. That could not be good. Was I being put on a performance improvement plan? My boss quickly introduced the second person — she was from HR — and then continued. “David, you are not meeting my performance expectations for a staff engineer. Further, I do not see a path for you to meet those expectations. Therefore we are letting you go, effective today.”
He told me our colleague from HR would walk me through the process.
“Do you have any questions for me?” he asked.
I did have questions. I had questions for him, for her, and later, I had many questions for myself.
That Wednesday was over a year ago. Many things changed for me in that moment, and I have spent a lot of time thinking about it since then. This post introduces a series of posts about my thinking, learning, and life over the past year. The first post is out now, with the remainder to follow.
Before continuing, I would like to briefly address “You are not meeting my performance expectations.” It’s true, I wasn’t meeting his expectations. It’s also true that I was good at my job. In my 10 years in that role, my formal reviews ranged from good to glowing, and I was promoted twice. After being let go, I received many touching notes from colleagues who were shocked and upset by the news. Many of those notes directly contradicted my manager’s story.
While it is scary to say publicly that I was fired, I think it is important to the story I want to tell. I also think sharing it may help others in similar tough times.
That said, this series is not about my dismissal per se, but rather my reaction to it and my continual attempts to make sense of it so I could move on and make the most of my life, experiencing and creating as much goodness (joy, wonder, love, satisfaction, …) as possible. This story starts well before my firing, and continues well afterwards. I have tried to intentionally live my life, regularly reflecting on what makes me happy and fills me with energy, and what does the reverse. Those reflections led me to become a manager (lead engineer), and then eventually to step away from that role. After transitioning from management to a staff engineer role, I continually tried to define my own job to be the best it could be for myself and my company. Those efforts set the scene for this series.
This series starts with me getting the role I had long worked to develop and sell. Starting it was scary: I now had to deliver in a high-stakes position. In this first post, I talk about framing that pressure and the nervousness of having to deliver in that new role.
The series continues after my dismissal as I tried to give myself the space I needed to adjust. It turns out that too much free space can be terrifying, at least for me.
Losing my job was a large shock to my sense of identity, and I have been actively reshaping my sense of self. Several posts in the series cover my thinking and shaping of my identity:
Early in my time off, I started to grapple with the question of “Who am I?” I had let too much of my identity be tied up in my job. Losing my job disrupted how I thought of myself.
After spending a lot of time grappling with who I am, I realized I didn’t know who I was. At least I didn’t know how to tell the story of my personal journey over the past 10 years, including losing my job.
After continued reflection, I returned to who I wanted to be, as a unified human being. I didn’t want a professional identity and a personal identity. I wanted one unified identity, with my professional life being one expression of that identity. I expect this will be a continuing process for the rest of my life. I feel really good about my current framing of my identity.
Finally, the period of reflection covered by this series ends with me starting my next job and moving on to the next chapter of my life.
These posts, including the reflection, drafts, editing, and discussion that went into them, are part of the larger process I’ve been going through as I’ve made sense of this part of my life — and my life going forward. Writing them has been a healing, growing, and positive part of my past year. I share it all with you in the hope that it might help you or otherwise resonate with you. If any of it does help or resonate, I would love to hear from you about it.
Thank you to Heather Beasley Doyle for her feedback on this post and her support through this entire period of my life. Heather is a gifted writer. You should check out her homepage and her writing.
I was sitting on my mother's couch after Christmas, crossword puzzle in hand, when my stomach suddenly dropped. The puzzle hadn’t suddenly frightened me, nor had anything happened around me. Rather, my mind had jumped forward to the end of my vacation and returning to work. I liked my job. I had successfully changed it to better align with my interests and abilities over the previous year. What scared me was thinking about the stakes of those changes: I had spent a lot of political capital crafting that role for myself and I needed to deliver the expected results.
This post explores the expectations and anxiety of getting an opportunity you want and then trying to deliver on its promise. We never know how things will turn out or what challenges will arise. The most we can do is put ourselves in good situations with good chances for success, do our best, and then accept whatever comes — good or bad.
Desire for Change
In the summer of 2020, I left my role as a manager and shifted to a staff engineering role. I spent the next two years working with a new manager to rebuild the team, with him as its lead. We grew the team from two of us to 10, and all 10 of us continuously supported and learned from each other. I was proud of what we had built together, but with that done, I felt unsatisfied. I was not achieving the goals I had laid out for myself when I stepped away from being a manager:
Now I’m working to keep the parts of my role that did bring me energy, and remove the parts that did not. I absolutely love the impact I've had on how we test performance. I love all the things that we built, including our structure and processes. I love having insight into so many parts of the engineering organization and helping drive the big picture on performance testing. I love helping junior colleagues learn and grow, sharing learnings with them (so long as I don't also have to evaluate them).
Specifically, I wasn’t happy with my progress on having insight into so many parts of the engineering organization and helping drive the big picture on performance testing. So in late 2022, I set out to change my role once again.
Advocating for Change
I worked to define a new role that would make me happier. The role should have the insight and big picture impact I was missing. It should include reporting to the right level of management so I could have the reach, influence, and support to do the things I wanted. And it should include many of the things I knew filled me with energy, such as writing and sharing my learnings (blogs, papers, talks), reading and learning (research papers, blogs), and interacting with passionate people. Around this time I wrote a blog post on defining your own job. At the end of it I said:
I know what I want to do: I want to advance the state of the art of performance testing and software engineering at MongoDB, ideally through collecting, curating, and demonstrating the best ideas from the performance community and academia.
I started to pitch a role based on those things. A role in which I could interact with multiple teams, leverage the best the research community could provide us, and turn that into something real with big impact. I wrote a proposal capturing the role and talked to people about it – a lot of people.
There wasn't much appetite in my organization for a research-focused role. Everyone loved the results I had delivered using academic research, but we suffered from smaller, pressing problems that we needed to solve at that moment. Before we could invest in the larger and more interesting things, we needed to do the simpler things to solve problems today.
An Updated Proposal
I took the feedback, learned from it, and reflected. The perfect job should fit my needs and the company's needs. These conversations made the company’s needs clearer to me. I updated my role proposal to better align with both sets of needs: making our performance testing infrastructure stronger in the near term, while enabling greater things in the future. The proposal now focused on straightforward engineering work instead of research, and leaned more heavily into planning and coordination. It met my need for broad impact and the company's need for immediate results. The role would eventually enable the more advanced work I wanted to do.
I didn’t share the proposal at this point; instead, I started on the work it described. I teamed up with my product manager to write and submit a formal project proposal based on my role proposal. The project proposal laid out work for several teams for the next two years.
Living the Change
The project was approved, with me as its technical lead. With that, I shared my updated role proposal with key people. It now described a role I was already doing.
I found a sponsor. With their help I changed teams and started reporting higher in the org chart. I was in a better place to achieve my goals for myself and for my company! Success!
Scariness
This is when things got scary. I had invested a lot of time, effort, and political capital to build this role for myself. If I didn’t succeed in the role, it would all be for naught.
I could see what I needed to do to succeed. However, there were many things I could not control. If this new role didn’t work out, I couldn’t go back to my old role – I didn’t want to, and most likely it wouldn’t be available to me if I did. No other role would be a better fit for me within the company. And, having spent most of my political capital, I couldn’t expect much help changing roles again. Essentially, my only options were to succeed or to leave. I was operating without a safety net.
Lowering the Stakes
While the differences between success and failure were stark, I didn’t want to live in fear. I worked to reframe how I looked at the situation. We never know the future with certainty, so every choice we make is a gamble. I asked myself questions about this bet: Is this a good bet? Would I make this bet again knowing what I know now? Can I live with the consequences if I lose this bet?
As I sat on my mother's couch with my crossword puzzle, I answered these questions: Yes it is a good bet. Yes, I would make this bet again. Yes, I can live with the consequences of losing this bet.
I began to relax. The fear did not completely go away, but I could put it in perspective. Since then, I have worked hard to remember this perspective whenever that kind of fear comes back.
Similarly, I encourage you to choose your best opportunities. Whenever the future seems scary, reflect on whether you have given yourself your best chance for success and whether you can accept the consequences if those chances do not pan out. If the answers are no, go make changes. If the answers are yes (I hope they are), try to keep that perspective and let go of your fear.
Thank you to Heather Beasley Doyle for her feedback on this post. I am a better writer and this is a better post due to her efforts. Heather is a gifted writer. You should check out her homepage and her writing.
Tomorrow (January 2nd, 2025) I start a new job. Five months ago I left my previous job with no idea of what would come next, beyond taking a long break to relax and recharge.
Me, at an early viewpoint on Wildcat Mountain towards the start of a backpacking trip and the beginning of my 5 months off. Photo courtesy of Tom Lehmann.
There are individuals who seem to work slowly, but actually get things done very quickly. I call them Slow Thinkers. They are easily overlooked, but they do amazing things. We all lose out when we ignore them.
I’d like to introduce you to them and to their more visible cousins, whom I call Fast Thinkers. I think both groups are rare – few people can compare to them. I have great respect for both and have been blessed to work with multiple Fast and Slow Thinkers; two such individuals particularly inspire the composite characters Claire and Ed described below. While I may picture those two, Claire and Ed could be anyone possessing their skills.
I do not put myself in either of their leagues, but my habits are more aligned with the Slow Thinker’s than the Fast Thinker’s. I’ve worked with enough Fast Thinkers to know deep in my bones that I cannot and will never be able to compete directly with them. If I have to work to the standard of the Fast Thinker, I cannot give my best work. I will make mistakes. I won’t be happy, my colleagues won’t be happy, and my boss won’t be happy. I suspect that is true for many of you as well.
Image created using OpenAI's DALL·E tool.
A Fast Thinker
Paul looks around the conference table at his assembled team. He calls the group to order, asks them, “How do we solve hard problem?” and provides some background on hard problem. Sally offers her opinion on what they should do, followed by Sam, both to mixed reactions. Then Claire says, “We should do good thing,” and explains what good thing is. She continues, “We should do it because of A, B, and C. It will address the worst of hard problem quickly, and then solve it completely in a couple of months.” Good thing includes some aspects of Sally’s and Sam’s suggestions, but it is all-around better. Claire is right – they should do good thing now. Everyone agrees to her plan, they do it, and they fix the problem.
Claire is a Fast Thinker. She is very smart, comes up with correct solutions frighteningly fast, can clearly explain those solutions, and knows she is right. It is both a pleasure and intimidating to work with a Fast Thinker. Everyone listens to Fast Thinkers and they can make short work of big problems. While they are always valuable, they are extra valuable during a crisis when every minute counts. Everyone can take comfort from their confident leadership in such situations.
A Slow Thinker
A few weeks later Paul calls the same group to order again. Paul presents new hard problem. The members of the team offer their thoughts once again. There are a lot of thoughts, including some from Claire. However, this time Claire isn’t confident of her proposal, recognizing limitations to her solution. There is a lot of discussion, but there are no strong conclusions. There are a couple of promising options, but each has significant trade-offs. Towards the end of the meeting, Paul turns to Ed, “Ed, what do you think?” Many people in the room are surprised because they had forgotten that Ed was in the room. He hasn’t said anything the entire meeting. Instead, he quietly listened as each person spoke.
He starts to speak. He speaks slower than everyone else has, with a contemplative tone. Ed restates new hard problem but in a slightly different and more interesting way. He’s not sure what should be done. He calls out some of the good points made by the others but then ties the problem to a completely new idea. Ed says he would like to take a couple of days to look into this new idea and see if there’s something there. He hedges that there might not be. Paul agrees to give Ed a few days and everyone leaves. When they return a few days later, Ed shares an incredible new idea. This new idea has grown from the seed of his original thoughts. Ed has grown this seed into something beautiful. It is the correct solution and it is amazing.
Ed is a Slow Thinker. Unlike Claire, Ed does not know the correct answer right away. He has thoughts and suspicions, but he needs to think through all those thoughts and suspicions before coming to a conclusion. Those who know Ed well know he is brilliant – just as smart as Claire, but smart in a different way. Those who don’t know him see a quiet, polite man. They overlook him. Thankfully, Paul knows Ed well and makes sure to regularly ask Ed what he is thinking. Importantly, Paul listens closely when Ed answers.
The Value of Speed
Speed is treasured in the tech world. The faster the company, the team, or the individual, the more chances they have to try things, to learn, and to win big. Fortunes are made by having a good idea and being faster than the competition. This is why so many books and phrases exalt speed.
However, we need to be careful how we measure speed. The time we take to do any given task is not important. What is important is the time to get something new (a product, a feature, or a fix) to market. The Fast Thinker appears to be much faster than the Slow Thinker because they do tasks faster. They have answers immediately, not in a few days.
When measured in time to delivered value (e.g., a new product …) and delivered value per time, I think the Slow Thinker is as fast as the Fast Thinker. Yes, they may take a little longer at the start of the process, but larger problems and innovations take weeks or months. A few days of contemplation at the start can make the remaining time much more productive and shorter.
Responsibility and Rewards
Everyone recognizes the brilliance of Fast Thinkers. People listen to Fast Thinkers and do what they say (and they should). The Fast Thinkers get promoted.
It’s easy to miss the brilliance and ultimate speed of Slow Thinkers. If Paul doesn’t specifically ask Ed what he thinks in that meeting, the world could easily miss out on a great idea. That would be a shame.
The success of Fast Thinkers can lead to the marginalization of Slow Thinkers. As Fast Thinkers are promoted, they take on more responsibility, such as deciding who they will promote and whose ideas they will listen to and implement. They know the value of their quick decisions, making it easy for them to value others’ quick decisions. It’s harder for them to see the Slow Thinkers’ value, because it is different from the value they provide as Fast Thinkers.
Don’t Ignore The Slow Thinkers
It’s easy for Fast Thinkers to ignore Slow Thinkers. It’s easy for us mere mortals to do the same. Please don’t ignore the Slow Thinkers. They are a force waiting to be unleashed. If we ignore the Eds of the world, we lose out on their ideas and all that we could learn from them. So don’t just listen to the people who speak up forcefully. Listen also for the quieter voices. See which of those quieter voices provide wonderful and impactful ideas when given the space. Then bring them your hardest problems, give them some space, and listen to what they say. Then we can really go fast.
And if you identify with the Slow Thinkers, don’t sell yourself short. Don’t get competitive and try to be faster than the Fast Thinkers around you. Do give yourself the space and time to do your best work, make sure to work on important problems, and make sure others know about your great work.
Note: This was originally published on the MongoDB Engineering Blog on April 30, 2019 here by Henrik Ingo and myself. Please read it there assuming the link works. I have copied it here to ensure the content does not disappear. The links in the article are the original links.
On the MongoDB Performance team, we use EC2 to run daily system performance tests. After building a continuous integration system for performance testing, we realized that there were sources of random variation in our platform and system configuration which made a lot of our results non-reproducible. The run to run variation from the platform was bigger than the changes in MongoDB performance that we wanted to capture. To reduce such variation - environmental noise - from our test configuration, we set out on a project to measure and control for the EC2 environments on which we run our tests.
At the outset of the project there was a lot of doubt and uncertainty. Maybe using a public cloud for performance tests is a bad idea and we should just give up and buy more hardware to run them ourselves? We were open to that possibility; however, we wanted to do our due diligence before taking on the cost and complexity of owning and managing our own test cluster.
Performance benchmarks in continuous integration
MongoDB uses a CI platform called Evergreen to run tests on incoming commits. We also use Evergreen for running multiple classes of daily performance tests. In this project we are focused on our highest level tests, meant to represent actual end-user performance. We call these tests System Performance tests.
For System Performance tests, we use EC2 to deploy real and relatively beefy clusters of c3.8xlarge nodes for various MongoDB clusters: standalone servers, 3 Node Replica Sets, and Sharded Clusters. These are intended to be representative of how customers run MongoDB. Using EC2 allows us to flexibly and efficiently deploy such large clusters as needed. Each MongoDB node in the cluster is run on its own EC2 node, and the workload is driven from another EC2 node.
Repeatability
There's an aspect of performance testing that is not obvious and is often ignored. Most benchmarking blogs and reports focus on the maximum performance of a system, or on whether it is faster than some competitor. For our CI testing purposes, we primarily care about the repeatability of the benchmarks: the same set of tests for the same version of MongoDB on the same hardware should produce the same results whether run today or in a few months. We want to be able to detect small changes in performance due to our ongoing development of MongoDB. A customer might not get very upset about a 5% change in performance, but they will get upset about multiple 5% regressions adding up to a 20% regression.
The easiest way to avoid large regressions is to identify and address the small regressions promptly as they happen, and stop them from reaching releases or release candidates. We do want to stress MongoDB with a heavy load, but achieving some kind of maximum performance is completely secondary to this test suite’s goal of detecting changes.
For some of our tests, repeatability wasn't looking so good. In the below graph, each dot represents a daily build (spoiler -- you’ll see this graph again):
Variability in daily performance tests
Eyeballing the range from highest to lowest result, the difference is over 100,000 documents / second from day to day. Or, as a percentage, a 20-30% range.
Investigation
To reduce such variation from our test configuration, we set out on a project to reduce any environmental noise. Instead of focusing on the difference between daily MongoDB builds, we ran tests to focus on EC2 itself.
Process: Test and Analyze
Benchmarking is really an exercise of the basic scientific process:
Try to understand a real world phenomenon, such as an application that uses MongoDB
Create a model (aka benchmark) of that phenomenon (this may include setting a goal, like "more updates/sec")
Measure
Analyze and learn from the results
Repeat: do you get the same result when running the benchmark / measuring again?
Change one variable (based on analysis) and repeat from above
We applied this benchmarking process to evaluate the noise in our system. Our tests produce metrics measuring the average operations per second (ops/sec). Occasionally, we also record other values but generally we use ops/sec as our result.
To limit other variables, we locked the mongod binary to a stable release (3.4.1) and repeated each test 5 times on 5 different EC2 clusters, thus producing 25 data points.
We used this system to run repeated experiments. We started with the existing system and examined our assumptions to create a list of potential tests that could help us determine how to decrease the variability in the system. As long as we weren’t happy with the results, we returned to this list and picked the most promising feature to test. We created focused tests to isolate that specific feature, ran the tests, and analyzed our findings. Any workable solutions we found were then put into production.
For each test, we analyzed the 25 data points, with a goal of finding a configuration that minimizes this single metric:
range = (max - min) / median
Being able to state your goal as a single variable such as above is very powerful. Our project now becomes a straightforward optimization process of trying different configurations, in order to arrive at the minimum value for this variable. It's also useful that the metric is a percentage, rather than an absolute value. In practice, we wanted to be able to run all our tests so that the range would always stay below 10%.
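As a concrete illustration, here is a minimal Python sketch of that computation (the data values below are made up, not actual results); it computes the range metric for one test's 25 data points and checks it against the 10% target.

```python
from statistics import median

def noise_range(ops_per_sec):
    """Noise metric: (max - min) / median, expressed as a fraction.

    `ops_per_sec` is the list of results for one test, e.g. the
    25 data points from 5 trials on each of 5 EC2 clusters.
    """
    return (max(ops_per_sec) - min(ops_per_sec)) / median(ops_per_sec)

# Hypothetical results for one test: 5 clusters x 5 trials.
results = [
    101200, 99800, 100500, 98900, 100100,   # cluster 1
    102300, 101700, 100900, 99600, 100400,  # cluster 2
    98700, 99900, 101100, 100800, 99500,    # cluster 3
    100200, 101500, 99100, 100700, 100000,  # cluster 4
    99300, 100600, 101900, 98800, 100300,   # cluster 5
]

r = noise_range(results)
print(f"range = {r:.1%}")                      # prints "range = 3.6%" for this data
print("within target" if r < 0.10 else "too noisy")
```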
Note that the metric we chose to focus on is more ambitious than, for example, focusing on reducing variance. Variance would help minimize the spread of most test results, while being fairly forgiving about one or two outliers. For our use case, an outlier represents a false regression alert, so we wanted to find a solution without any outliers at all, if possible.
Any experiment of this form has a tension between the accuracy of the statistics, and the expense (time and money) of running the trials. We would have loved to collect many more trials per cluster, and more distinct clusters per experiment giving us higher confidence in our results and enabling more advanced statistics. However, we also work for a company that needed the business impact of this project (lower noise) as soon as possible. We felt that the 5 trials per cluster times 5 clusters per experiment gave us sufficient data fidelity with a reasonable cost.
Assume nothing. Measure everything.
The experimental framework described above can be summarized in the credo of: Assume nothing. Measure everything.
In the spirit of intellectual honesty, we admit that we have not always followed the credo of Assume nothing. Measure everything, usually to our detriment. We definitely did not follow it when we initially built the System Performance test suite. We needed the test suite up as soon as possible (preferably yesterday). Instead of testing everything, we made a best effort to stitch together a useful system based on intuition and previous experience, and put it into production. It’s not unreasonable to throw things together quickly in time of need (or as a prototype). However, when you (or we) do so, you should check if the end results are meeting your needs, and take the results with a large grain of salt until thoroughly verified. Our system gave us results. Sometimes those results pointed us at useful things, and other times they sent us off on wild goose chases.
Existing Assumptions
We made a lot of assumptions when getting the first version of the System Performance test suite up and running. We will look into each of these in more detail later, but here is the list of assumptions that were built into the first version of our System Performance environment:
Assumptions:
A dedicated instance means more stable performance
Placement groups minimize network latency & variance
Different availability zones have different hardware
For write heavy tests, noise predominantly comes from disk
Ephemeral (SSD) disks have least variance
Remote EBS disks have unreliable performance
There are good and bad EC2 instances
In addition, the following suggestions were proposed as solutions to reducing noise in the system:
Just use i2 instances (better SSD) and be done with it
Migrate everything to Google Cloud
Run on prem -- you’ll never get acceptable results in the cloud
Results
After weeks of diligently executing the scientific process of hypothesize - measure - analyze - repeat, we found a configuration where the range of variation when repeating the same test was less than 5%. Most of the configuration changes were normal Linux and hardware configurations that would be needed on on-premise hardware just the same as on EC2. We thus proved one of the biggest hypotheses wrong:
You can't use cloud for performance testing
With our first experiment, we found that there was no correlation between test runs and the EC2 instances they were run on. Please note that these results could be based on our usage of the instance type; you should measure your own systems to figure out the best configuration for your own workload. You can read more about the specific experiment and its analysis in our blog post EC2 instances are neither good nor bad.
Assumption disproven: There are good and bad EC2 instances.
After running the first baseline tests, we decided to investigate IO performance. We found that using Provisioned IOPS on EC2 gives us a very stable rate of disk I/O per second. To us, it was surprising that ephemeral (SSD) disks were essentially the worst choice. After switching our production configuration from ephemeral SSD to EBS disks, the variation of our test results decreased dramatically. You can read more about our specific findings and how different instance types performed in our dedicated blog post EBS instances are the stable option.
Assumption disproven: Ephemeral (SSD) disks have least variance.
Assumption revised: Remote EBS disks have unreliable performance -> not with Provisioned IOPS (PIOPS).
Suggestion revisited: Just use i2 instances (better SSD) and be done with it. (True in theory.)
Next, we turned our attention to CPU tuning. We learned that disabling CPU options doesn’t just stabilize CPU-bound performance results. In fact, noise in IO-heavy tests also seems to go down significantly with CPU tuning.
Assumption disproven: For write heavy tests, noise predominantly comes from disk.
After we disabled CPU options, the variance in performance decreased again. In the below graph you can see how changing from SSD to EBS and disabling CPU options reduced the performance variability of our test suite. You can read more about the CPU options we tuned in our blog post Disable CPU options.
Improvements in daily performance measurements through changing to EBS and disabling CPU options
At the end of the project we hadn’t tested all of our original assumptions, but we had tested many of them. We still plan to test the remaining ones when time and priority allow:
A dedicated instance means more stable performance
Placement groups minimize network latency & variance
Different availability zones have different hardware
Through this process we also found that previously suggested solutions would not have solved our pains either:
Just use i2 instances (better SSD) and be done with it (True in theory)
Migrate everything to Google Cloud: Not tested!
Conclusion of the tests
In the end, there was still noise in the system, but we had reduced it sufficiently that our System Performance tests were now delivering real business value to the company. Every bit of noise bothers us, but at the end of the day we got to a level of repeatability at which test noise was no longer our most important performance-related problem. As such, we stopped the all-out effort to reduce system noise at this point.
Adding in safeguards
Before we fully moved on to other projects, we wanted to make sure to put up some safeguards for the future. We invested a lot of effort into reducing the noise, and we didn’t want to discover some day in the future that things had changed and our system was noisy again. Just like we want to detect changes in the performance of MongoDB software, we also want to detect changes in the reliability of our test platform.
As part of our experiments, we built several canary benchmarks which give us insights into EC2 performance itself based on non-MongoDB performance tests. We decided to keep these tests and run them as part of every Evergreen task, together with the actual MongoDB benchmark that the task is running. If a MongoDB benchmark shows a regression, we can check whether a similar regression can be seen in any of the canary benchmarks. If yes, then we can just rerun the task and check again. If not, it's probably an actual MongoDB regression.
If the canary benchmarks do show a performance drop, it is possible that the vendor may have deployed upgrades or configuration changes. Of course in the public cloud this can happen at arbitrary times, and possibly without the customers ever knowing. In our experience such changes are infrequently the cause for performance changes, but running a suite of "canary tests" gives us visibility into the day to day performance of the EC2 system components themselves, and thus increases confidence in our benchmark results.
The canary tests give us an indication of whether we can trust a given set of test results, and they enable us to clean up our data. Most importantly, we no longer need to debate whether it is possible to run performance benchmarks in a public cloud because we measure EC2 itself!
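To make that triage logic concrete, here is a small, hypothetical Python sketch of the decision described above (the function name and the 5% threshold are illustrative, not our actual Evergreen tooling):

```python
def triage_regression(mongodb_change_pct, canary_changes_pct, threshold_pct=5.0):
    """Classify a detected drop in a MongoDB benchmark.

    mongodb_change_pct:  percent change of the MongoDB benchmark vs. baseline
                         (negative means slower).
    canary_changes_pct:  percent changes of the canary benchmarks run in the
                         same task.
    Returns a suggested action.
    """
    if mongodb_change_pct > -threshold_pct:
        return "no significant regression"

    # A similar drop in any canary points at the platform (EC2), not MongoDB.
    if any(change <= -threshold_pct for change in canary_changes_pct):
        return "platform noise suspected: rerun the task and check again"

    return "likely a real MongoDB regression: investigate the commit"

# Example: the MongoDB benchmark dropped 12% while the canaries stayed flat.
print(triage_regression(-12.0, [-0.8, 1.2, -2.1]))
```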
Looking forward
This work was completed over 1.5 years ago. Since that time it has provided the foundation that all our subsequent and future work has been built upon. It has led to 3 major trends:
We use the results. Because we lowered the noise enough, we are able to regularly detect performance changes, diagnose them, and address them promptly. Additionally, developers are now "patch testing" their changes against System Performance: they use it to check the performance of their changes before committing them, and they address any performance changes before their code lands. Not only have we avoided regressions entering our stable releases; in these cases we’ve kept performance regressions from ever making it into the code base (master branch).
We’ve added more tests. Since we find our performance tests more useful, we naturally want more such tests and we have been adding more to our system. In addition to our core performance team, the core database developers also have been steadily adding more tests. As our system became more reliable and therefore more useful, the motivation to create tests across the entire organization has increased. We now have the entire organization contributing to the performance coverage.
We’ve been able to extend the system. Given the value the company gets from the system, we’ve invested in extending the system. This includes adding more automation, new workload tools, and more logic for detecting when performance changes. None of that would have been feasible or worthwhile without lowering the noise of the System Performance tests to a reasonable level. We look forward to sharing more about these extensions in the future.
Coda: Spectre/Meltdown
As we came back from the 2018 New Year’s holidays, just like everyone else we got to read the news about the Meltdown and Spectre security vulnerabilities. Then, on January 4, all of our tests went red! Did someone make a bad commit into MongoDB, or is it possible that Amazon had deployed a security update with a performance impact? It turned out that one of our canary tests - the one sensitive to CPU and networking overhead - had caught the 30% drop too! Later, on Jan 13, performance recovered. Did Amazon undo the fixes? We believe so, but have not heard it confirmed.
Performance drops on January 4th and bounces back on January 13th
The single spike just before Jan 13 is a rerun of an old commit. This confirms the conclusion that the change in performance came from the system: running a Jan 11 build of MongoDB after Jan 13 produced the higher performance. The results therefore depend on the date the test was run, rather than on which commit was tested.
As the world was scrambling to assess the performance implications of the necessary fixes, we could just sit back and watch them in our graphs. Getting on top of EC2 performance variations has truly paid off.
Update: @msw pointed us to this security bulletin, confirming that indeed one of the Intel microcode updates was reverted on January 13.
Note: This was originally published on the MongoDB Engineering Blog on April 30, 2019 here by Henrik Ingo and myself. Please read it there assuming the link works. I have copied it here to ensure the content does not disappear. The links in the article are the original links.
In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. At the beginning of the project, it was unclear whether our goal of running repeatable performance tests in a public cloud was achievable. Instead of debating the issue based on assumptions and beliefs, we decided to measure noise itself and see if we could make configuration changes to minimize it.
We had already built up knowledge around fine-tuning CPU options when setting up another class of performance benchmarks (single-node benchmarks). That work had shown us that CPU options could also have a large impact on performance. Additionally, it left us familiar with a number of knobs and options we could adjust.
| Knob | Where to set | Setting | What it does |
| --- | --- | --- | --- |
| Idle Strategy | Kernel Boot | idle=poll | Puts Linux into a loop when idle, checking for work. |
| Max sleep state (c4 only) | Kernel Boot | intel_idle.max_cstate=1 intel_pstate=disable | Disables the use of advanced processor sleep states. |
| CPU Frequency | Command Line | sudo cpupower frequency-set -d 2.901GHz | Sets a fixed frequency. Doesn't allow the CPU to vary the frequency for power saving. |
| Hyperthreading | Command Line | echo 0 > /sys/devices/system/cpu/cpu$i/online | Disables hyperthreading. Hyperthreading allows two software threads of execution to share one physical CPU. They compete against each other for resources. |
We added some CPU-specific tests to measure CPU variability. These tests allow us to see if the CPU performance is noisy, independently of whether that noise makes MongoDB performance noisy. For our previous work on CPU options, we wrote some simple tests in our C++ harness (a rough analogue is sketched after the list below) that would, for example:
multiply numbers in a loop (cpu bound)
sleep 1 or 10 ms in a loop
Do nothing (no-op) in the basic test loop
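As an illustration only, here is a rough Python analogue of those loops; the real tests live in our C++ harness, and the iteration counts and workloads here are arbitrary:

```python
import time

def ops_per_sec(task, iterations):
    """Run `task` in a tight loop and report iterations per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        task()
    elapsed = time.perf_counter() - start
    return iterations / elapsed

def cpu_bound():      # multiply numbers in a loop (CPU bound)
    x = 1.0000001
    for _ in range(1000):
        x *= 1.0000001
    return x

def sleep_1ms():      # sleep in a loop
    time.sleep(0.001)

def noop():           # do nothing (no-op) in the basic test loop
    pass

for name, task, n in [("cpuloop", cpu_bound, 10_000),
                      ("sleep_1ms", sleep_1ms, 1_000),
                      ("nop", noop, 1_000_000)]:
    print(f"{name}: {ops_per_sec(task, n):,.0f} ops/sec")
```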
We added these tests to our System Performance project. We were able to run the tests on the client only, as well as across the network to the server.
We ran our tests 5x5 times, changing one configuration at a time, and compared the results. The first two graphs below contain results for the CPU-focused benchmarks, and the third contains the MongoDB-focused benchmarks. In all the graphs below, we plot the "noise" metric as a percentage computed as (max - min) / median; lower is better.
We start with our focused CPU tests, first on the client only, and then connecting to the server. We’ve omitted the sleep tests from the client graphs for readability, as they were essentially 0.
Results for CPU-focused benchmarks with different CPU options enabled
The nop test is the noisiest test all around, which is reasonable because it’s doing nothing in the inner loop. The cpu-bound loop is more interesting. It is low on noise for many cases, but has occasional spikes for each case, except for the case of the c3.8xlarge with all the controls on (pinned to one socket, hyperthreading off, no frequency scaling, idle=poll).
Results for tests run on server with different CPU options enabled
When we connect to an actual server, the tests become more realistic, but also introduce the network as a possible source of noise. In the cases in which we multiply numbers in a loop (cpuloop) or sleep in a loop (sleep), the final c3.8xlarge with all controls enabled is consistently among the lowest noise and doesn’t do badly on the ping case (no-op on the server). Do those results hold when we run our actual tests?
Results for tests run on server with different CPU options enabled
Yes, they do. The right-most blue bar is consistently around 5%, which is a great result! Perhaps unsurprisingly, this is the configuration where we used all of the tuning options: idle=poll, disabled hyperthreading and using only a single socket.
We continued to compare c4 and c3 instances against each other for these tests. We expected that the c4, being a newer architecture with more tuning options, would achieve better results. But this was not the case; rather, the c3.8xlarge continued to have the smallest range of noise. Another assumption proven wrong!
We expected that write-heavy tests, such as batched inserts, would mostly benefit from the more stable IOPS on our new EBS disks, and that the CPU tuning would mostly affect CPU-bound benchmarks such as map-reduce or index build. It turns out this was wrong too - for our write-heavy tests, noise did not in fact predominantly come from disk.
The tuning available for CPUs has a huge effect on threads that are waiting or sleeping. The performance of threads that are actually running full speed is less affected - in those cases the CPU runs at full speed as well. Therefore, IO-heavy tests are affected a lot by CPU-tuning!
Disabling CPU options in production
Deploying these configurations into production made insert tests even more stable from day to day:
Improvements in daily performance measurements through changing to EBS and disabling CPU options
Note that the absolute performance of some tests actually dropped, because the number of available physical CPUs was halved by using only a single socket, and disabling hyperthreading caused a further drop, though not quite a full half, of course.
Conclusion
Drawing upon prior knowledge, we decided to fine-tune CPU options. We had previously assumed that IO-heavy tests would have a lot of noise coming from disk and that CPU tuning would mostly affect CPU-bound tests. As it turns out, the tuning available for CPUs actually has a huge effect on threads that are waiting or sleeping and therefore has a huge effect on IO-heavy tests. Through CPU tuning, we achieved very repeatable results. The overall measured performance of the tests decreased, but this is less important to us. We care about stable, repeatable results more than maximum performance.