Friday, October 29, 2021

Questions on ICPE 2022 Data Challenge

I am co-organizing the Data Challenge Track at ICPE 2022 with Cor-Paul Bezemer and Weiyi Shang. The data challenge dataset is a snapshot of MongoDB's performance testing results and analysis (see the CFP for more detail). Recently, a group of researchers reached out to me asking for more detail about the data. I wanted to share the answers to those questions more widely, in case they are helpful to others working on the data challenge.


Q1: What configuration options are used? That is, how large is the configuration space for MongoDB? Specifically, are all of these options considered in the measurements? https://docs.mongodb.com/manual/reference/configuration-options/

A1: We test a subset of the total configuration space. We have talked about some of the configuration in our dbtest.io paper (https://dl.acm.org/doi/abs/10.1145/3395032.3395323) and in the open source version of our testing infrastructure. Unfortunately, the open source version is a point-in-time snapshot; we don't run our production system from the public repo.

At a high level there are a few dimensions:
  • MongoDB configuration
    • We test representative versions of the major configurations (standalone, replica set, sharded cluster). We also have some smaller versions of those (1-node replica set, and shard-lite). These have essentially the same configuration, but fewer nodes in the system.
    • We have some configurations with key features turned on and off (e.g., flow control, different security features).
    • We have some configurations specifically designed for testing initial sync. They add a node to the cluster during the test.
    • We also have tests that show up in the "performance" project, as opposed to "sys-perf". These run on a single dedicated host, so they are either standalone or a single-node replica set.
  • Hardware configuration
    • Sys-perf runs predominantly on AWS c3.8xlarge instances with hyperthreading disabled, and processes pinned to a single socket.
    • We have another configuration using different instance types, and a matching MongoDB configuration. Those were tuned to be closer to our Atlas configuration, and are labelled "M60-like".
    • The "performance" project uses dedicated hardware.
All told, we currently run 28 distinct "variants" in our system. We would like to run a much larger slice of the configuration space. We continue to look into expanding the configuration space without blowing out the budget or overwhelming our ability to handle results.
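
When working with the dataset, it can help to treat each variant as a point along the dimensions listed above. The following is a minimal sketch of that idea. The `Variant` class, its field names, and the example entries are hypothetical and only illustrate the dimensions described here; they are not the actual variant names or the schema used in the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One tested configuration (hypothetical structure, for illustration only)."""
    name: str             # made-up label, not a real variant name
    project: str          # "sys-perf" or "performance"
    topology: str         # e.g. "standalone", "replica set", "sharded cluster"
    hardware: str         # e.g. "c3.8xlarge", "M60-like", "dedicated"
    features: tuple = ()  # notable features toggled on or off


# Illustrative entries only; the real system runs 28 variants in sys-perf.
VARIANTS = [
    Variant("example-standalone", "sys-perf", "standalone", "c3.8xlarge"),
    Variant("example-replica", "sys-perf", "replica set", "c3.8xlarge"),
    Variant("example-replica-fc-off", "sys-perf", "replica set", "c3.8xlarge",
            ("flow control off",)),
    Variant("example-shard-m60", "sys-perf", "sharded cluster", "M60-like"),
    Variant("example-perf-standalone", "performance", "standalone", "dedicated"),
]


def count_by(variants, field):
    """Count variants per value of one dimension (e.g. 'project' or 'topology')."""
    return Counter(getattr(v, field) for v in variants)


if __name__ == "__main__":
    print(count_by(VARIANTS, "project"))
    print(count_by(VARIANTS, "topology"))
```

Grouping results this way makes it easier to compare like with like, for example only replica set results on c3.8xlarge hardware.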


Q2: How many configurations are measured?
A2: 28 in sys-perf, 4 in performance on our main branch. That may increase slightly when reviewing stable release branches (e.g., sys-perf-5.0).
 

Q3: Besides end-to-end metrics like throughput, what other performance events (performance traces, such as those we can measure using the perf utility) are measured?
A3: As discussed in https://dl.acm.org/doi/10.1145/3427921.3450234 we have added client-side latency measurements (median, 90th percentile, 95th percentile). We also collect some system metrics across the runs (CPU usage, IO usage, ...). We haven't talked much about those, but they are hopefully fairly self-explanatory. If you find something completely cryptic, reach out and I'll see if I can explain it. Not all of the metrics are always interesting (e.g., Worker's Max is a client-side metric that I think should be constant across each timeseries).
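
For readers less familiar with percentile latency metrics, here is a minimal sketch of how a median, 90th percentile, and 95th percentile could be computed from raw client-side latency samples. It uses the nearest-rank definition of a percentile, which is an assumption on my part for illustration; the function names are hypothetical and this is not necessarily how our load generators compute the values in the dataset.

```python
import math
from statistics import median


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_latencies(samples_ms):
    """Return the three latency summaries mentioned above."""
    return {
        "median": median(samples_ms),
        "p90": percentile(samples_ms, 90),
        "p95": percentile(samples_ms, 95),
    }


if __name__ == "__main__":
    # Synthetic latencies of 1..100 ms, purely for demonstration.
    print(summarize_latencies(list(range(1, 101))))
```

Tail percentiles are usually noisier than the median, which is worth keeping in mind when comparing them across runs.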


Q4: What was the DBMS deployment topology for running the benchmarks (number of nodes, replication, sharding, client consistency settings)? Was the same topology always applied, or different ones?
A4: Everything was deployed and run using DSI, as described in https://dl.acm.org/doi/abs/10.1145/3395032.3395323. The configurations do change over time (slowly and infrequently), but they are version controlled, so we can isolate exactly when they change. There are change points in the system associated with configuration changes and with workload changes.
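
To make the idea of a change point concrete, here is a minimal sketch that finds the single most likely shift in the mean of a timeseries by picking the split that most reduces the sum of squared errors. This is a deliberately simple heuristic chosen for illustration and is not the algorithm our system uses; the function name and the `min_segment` parameter are hypothetical.

```python
from statistics import fmean


def best_mean_shift(series, min_segment=3):
    """Return the index where splitting the series into two segments most
    reduces the total squared error around each segment's mean, or None if
    the series is too short or no split improves on a single mean."""
    n = len(series)
    if n < 2 * min_segment:
        return None

    def sse(segment):
        mean = fmean(segment)
        return sum((x - mean) ** 2 for x in segment)

    total = sse(series)
    best_index, best_gain = None, 0.0
    for i in range(min_segment, n - min_segment + 1):
        gain = total - (sse(series[:i]) + sse(series[i:]))
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index


if __name__ == "__main__":
    # Synthetic throughput that drops after the fifth run.
    throughput = [100, 100, 100, 100, 100, 80, 80, 80, 80, 80]
    print(best_mean_shift(throughput))
```

A shift found this way can then be compared against the dates of configuration or workload changes to decide whether it reflects a change in the server or a change in the test setup.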


Q5: What were the applied resource dimensions (VM, container, cores, memory, storage, OS,...)?
A5: The largest set is c3.8xlarge AWS instances. Those were picked a while ago because they were the lowest-noise option. The c3.8xlarge is the largest of the c3 family. It's not bare metal, but it appears to be unshared and provides access to low-level configuration.


Q6: What kind of workload has been applied (read/write ratio, complexity of queries, intensity, data set size, runtime)?
A6: A lot of workloads have been run, from very focused to much more general. Large classes include:
  • YCSB (load, 100% read, 50% read / 50% update, ...) and a YCSB configuration with a much larger data set (YCSB_60GB)
  • Linkbench (we are running from a private fork, but it's pretty close)
  • py-tpcc
  • Genny workloads (this is our internal load generator. We run from this public repo).
  • Private JavaScript/mongo shell based workloads
  • Mongo-perf

Q7: Are the DBMS traces of the applied workload available, i.e. the data of the MongoDB profiler?
A7: No. We run all the above workloads directly, not by replaying traces.