Friday, June 14, 2024

Repeatable Performance Tests: EBS Instances are the Stable Option

Note: This was originally published on the MongoDB Engineering Blog on April 30, 2019 here by Henrik Ingo and myself. Please read it there, assuming the link still works. I have copied it here to ensure the content does not disappear. The links in the article are the original links.

In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. At the beginning of the project, it was unclear whether our goal of running repeatable performance tests in a public cloud was achievable. Instead of debating the issue based on assumptions and beliefs, we decided to measure noise itself and see if we could make configuration changes to minimize it.

After thinking about our assumptions and the experiment setup, we began by recording data about our current setup and found no evidence of particularly good or bad EC2 instances. However, we found that the results of repeated tests had a high variance. From our test data and our knowledge of the production system, we observed that many of the noisiest tests did the most IO (being sensitive to either IO latency or bandwidth). After performing the first baseline tests, we therefore decided to focus on IO performance by testing both AWS instance types and IO configurations on those instances.

Investigate IO

As we are explicitly focusing on IO in this step, we added an IO-specific test (fio) to our system. This allows us to isolate the impact of IO noise on our existing benchmarks. The IO-specific tests focus on:

  • Operation latency

  • Streaming bandwidth

  • Random IOPS (IO operations per second)
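The post does not include the actual fio job definitions. As a rough sketch, assuming a Linux host with the libaio engine and a hypothetical test file at /data/fio.test, the three measurements could be driven by fio invocations like the ones below. The block sizes, queue depths, file size, and 60-second runtime are illustrative guesses, not the settings the team used.

```python
import shlex

# Options shared by every job. All values are assumptions for illustration.
COMMON = [
    "--direct=1",          # bypass the page cache so the device itself is measured
    "--ioengine=libaio",   # Linux native asynchronous IO
    "--runtime=60",        # run each job for 60 seconds...
    "--time_based",        # ...even if the file has been fully covered
    "--output-format=json",
]


def fio_command(name, rw, bs, iodepth, filename="/data/fio.test", size="10G"):
    """Build one fio command line; filename and size are hypothetical defaults."""
    return ["fio", f"--name={name}", f"--filename={filename}", f"--size={size}",
            f"--rw={rw}", f"--bs={bs}", f"--iodepth={iodepth}", *COMMON]


JOBS = {
    # Operation latency: one small random read in flight at a time.
    "latency": fio_command("latency", "randread", "4k", 1),
    # Streaming bandwidth: large sequential reads with several requests queued.
    "bandwidth": fio_command("bandwidth", "read", "1m", 16),
    # Random IOPS: small random reads with a deep queue.
    "iops": fio_command("iops", "randread", "4k", 32),
}

if __name__ == "__main__":
    for name, cmd in JOBS.items():
        print(name, shlex.join(cmd))
```

Keeping the latency job at a queue depth of 1 is meant to separate per-request latency from throughput; the deeper queues in the other two jobs push the device toward its bandwidth or IOPS limit.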

We look first at the IO-specific results, and then at our general MongoDB benchmarks. In the graph below, we plot the "noise" metric as a percentage computed from (max-min)/median; lower is better. c3.8xlarge with ephemeral storage is our baseline configuration, which we were using in our production environment.

i2.8xlarge shows best results with low noise on throughput and latency
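The noise metric behind these comparisons, (max-min)/median expressed as a percentage, is simple to compute from the repeated runs of a single test. A minimal sketch in Python, assuming each run produces one number (for example throughput or latency):

```python
from statistics import median


def noise_percent(results):
    """Noise of repeated runs of one test: (max - min) / median, as a percentage.

    Lower is better. `results` holds one measurement per run of the same test.
    """
    if not results:
        raise ValueError("need at least one result")
    mid = median(results)
    if mid == 0:
        raise ValueError("median is zero; noise is undefined")
    return 100.0 * (max(results) - min(results)) / mid
```

For example, three hypothetical runs of 90, 100, and 110 operations per second give a noise of 20%. Dividing by the median rather than the mean presumably keeps one outlier run from shifting the scale much, while the max-min numerator deliberately captures the worst-case spread.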

The IO tests show some very interesting results.

  1. The c3.8xlarge with EBS PIOPS shows less noise than the c3.8xlarge with its ephemeral disks. This was quite unexpected. In fact the c3.8xlarge with ephemeral storage (our existing configuration) is just about the worst choice.

  2. The i2.8xlarge looks best all around with low noise on throughput and latency.

  3. The c4.8xlarge shows higher latency noise than the c3.8xlarge. We would have expected any difference to favor the c4.8xlarge instances, as they are EBS-optimized.

After these promising results, we next examined the results of our MongoDB benchmarks. At the time we did this work, MongoDB had two storage engines (WiredTiger and MMAPv1), with MMAPv1 being the default, but now deprecated, option. There were differences in the results between the two storage engines, but they shared a common trend.

c3.8xlarge with PIOPS performs best with all results below 10% noise for the WiredTiger storage engine

c3.8xlarge with PIOPS performs best with most results below 10% noise for the MMAPv1 storage engine

No configuration was best across the board. That said, one configuration had below 10% noise for all but one test: c3.8xlarge with EBS PIOPS. Interestingly, while i2 was the best for our focused IO tests, it was not the best for our MongoDB benchmarks.

Valuable lessons learned:

  • As far as repeatable results are concerned, the "local" SSDs we had been using performed worse than any other alternative we could have chosen!

  • Contrary to popular belief, when using Provisioned IOPS with EBS, performance is both good in absolute terms and very, very stable! This is true for both our IO tests and our general tests. The latency of disk requests did vary more than with the SSD alternatives, but the IOPS performance was super stable. For most of our tests, the latter is the important characteristic.

  • The i2 instance family has a much higher-performance SSD, and in fio tests it showed almost zero variability. It also happens to be a very expensive instance type. However, while this instance type was indeed a great choice in theory, our MongoDB test results on it turned out to be quite noisy. Upon further investigation, we learned that the noise was due to unstable performance of MongoDB itself. Because i2.8xlarge has more RAM than c3.8xlarge, MongoDB on i2.8xlarge can hold much more dirty data in RAM, and flushing that much dirty data to disk made its performance unstable.

Switching from ephemeral to EBS disks in production

Based on the above results, we changed our production configuration to run on EBS disks instead of ephemeral SSD. (We were already running on c3.8xlarge instance types, which turned out to have the lowest noise in the above comparison, so we decided to keep using those.)

Performance becomes more stable when using EBS

After running with the changes for a couple of weeks, you could clearly see how the day-to-day variation of test results decreased dramatically. This instantly made the entire System Performance project more useful to the development team and MongoDB as a whole.


Focusing on IO performance proved useful. As it turns out, using ephemeral (SSD) disks was just about the worst choice for our performance tests. Instead, using EBS with Provisioned IOPS gave the most stable throughput results. While i2 instances were the best in our IO-focused (fio) tests, they proved less than ideal in practice. This highlights quite clearly that you need to measure your actual system and assume nothing to get the best results.

This is the second of three larger experiments we performed in our quest to reduce variability in performance tests on EC2 instances. You can read more about the top-level setup and results, as well as how we found out that EC2 instances are neither good nor bad and that CPU options are best disabled.

Monday, June 3, 2024

Reading to Answer Questions

Putting down my book, I stood up from the bench and walked over to my 10-year-old son by the water. His fishing line was tangled. Again. This was the third time he had needed me to fix something. He hadn't caught a single fish and was getting very upset. He loved the idea of fishing and had visions of catching impressive fish like those caught in the videos he watched. Sadly, he was not good at fishing and neither was I.

I recently wrote about how to read academic papers in volume. When I researched how to read academic papers, I also learned how to read more effectively in general. Those skills have helped me better achieve my goals, such as helping my son catch fish.

This post covers how I use books to answer my bigger questions, and is largely based on ideas from the book How to Read a Book by Mortimer Adler.
My son holding a small fish he caught.

Learning Through Books

To help my son catch fish, I had to learn both facts and skills. I needed to learn about a subject (fishing) and a skill (how to catch fish). It was one of my larger questions, requiring time and effort. I’ve investigated other larger questions focused on work problems such as working with a challenging colleague, personal goals such as remembering things better, and personal challenges such as supporting my family with a medical issue. The best way to learn facts and skills quickly is to learn from those who have already learned the topic. Books are a treasure trove of other people’s learnings.

I follow three main steps on a learning project:

  1. Find many potential resources.

  2. Filter those results down to a manageable number of the best resources.

  3. Extract what I need from those resources.

In my life I am constantly asking questions and searching for answers. Only a few of those questions merit the time and effort required for this process. For those that do, I adjust my effort to my question and stop when I have what I need.

Find Many Potential Resources

I start with a wide search using the internet and my local library. I also ask friends, colleagues, and social media for suggestions. When I do this right, I turn up a lot of resources. For each resource, I’ll quickly check what I can about it, such as summaries and reviews, to see whether I think this book might be on topic. Key word searches often turn up things that are clearly unrelated to my question. For example, Incredible--and True!--Fishing Stories may or may not be entertaining, but it’s not going to help my son catch fish. I can immediately reject that book.

My search extends beyond books to include articles, blogs, podcasts, and videos. Books tend to be higher quality and I have a personal bias toward the written word, but I want the best available resources regardless of medium.

Then I do a second search based on the results from the first search. Did I find other search terms? What other resources are related to these? Do any of these books reference other resources?

Filter For The Best Resources

Now I have a large list of books and other resources. I request as many of the books as I can from my local library. I download or bookmark the online resources. Then I scan all of these together quickly so that I can think about the ideas from multiple books at the same time. Speed is essential for that mixing of ideas. For each book I want to know:
  1. What is this book about?
  2. How does it address my question? What techniques does it propose? 
  3. What words does it use for my question? What do they mean? 
  4. Do I believe the answers this book proposes? Alternatively, is my bullshit detector going off? 

I quickly inspect each book, scanning the table of contents, reading the publisher's blurb, skimming the end of the last chapter, and checking for summaries at the end of key chapters. I take notes on anything that addresses my prompts, anything that catches my attention, and any new questions that arise as I read. By the time I’m done, the books are usually covered in sticky notes.

I look for common words and ideas, and I usually see patterns across the books. Sometimes several of the books have the same "revolutionary" or "game-changing" ideas. This used to annoy me, but I’ve since learned this is a great result. This common “revolutionary” idea is likely what I'm looking for. Other times I learn from overlapping words and ideas in the books, even if the books don't agree on everything. When they disagree I can see which ideas stand up to criticism and which fall over.

At this point, I have two follow-up questions:

  1. Does one book capture a common idea better than the others? 
  2. Are there interesting differences between the books? 

If one book does capture the key idea better than the others, I focus on that one. When there are interesting differences, I learn more from reading the books together than by reading them separately, just as I learned more from scanning them together.

Extract Answers 

I now have at least one book to read deeply. First, I revisit my existing notes to answer some questions: What do I expect to get out of each book? What do they have in common? What words do they use for important items? Where do they differ?

From these answers I form focused questions for each book: Book A says X; what does Book B say about that? Books C and D seem to differ on point Y. Is one of them right, or is something more complicated going on? Book E claims Z. What’s the evidence for it?

Next, I read each book, trying to answer my questions and still taking copious notes. I listen for the conversations among the books. The differences among books may be true disagreements requiring real thought and evaluation on my part. Or they may merely be different framings of the same idea. The common themes become clearer, and the differences enlighten me. In both cases I end up with a deeper understanding of the underlying issue.

When I’ve finished reading everything, I review and organize my notes. I often write a summary for myself. Hopefully, I now have answers to my questions. If I don’t, I restart the process by looking for more sources focused on whatever is missing. If I’m really excited about the result, I may write a blog post or share my learnings with others.

Catching Fish

When we were back home after that day of fishing, I asked my son two questions. First: Would you like my help figuring out how to catch fish? Yes, he did. I told him that if he wanted my help, our goal had to be "catch fish, any kind of fish." We couldn't aim only for the big fish he'd seen in videos. Second: Would that still be worthwhile? He thought briefly: Yes, it would be.

I then borrowed every book about fishing I could from my local library. I scanned each of them and noticed trends. It was clear that we should start by targeting small fish such as sunfish, with small hooks and bobbers, and that we should fish in smaller bodies of water. I bought a couple of the more promising books, as well as the appropriate fishing gear, including the small hooks and bobbers. We then went to a local pond and he caught fish! That summer he caught a lot of fish. It was wonderful for him, and wonderful for me to help my son succeed at something that he loved so much.

I hope this post helps you achieve similar success with your large questions. I would love to hear about it when it does.