I was not surprised to read this article on Byte and Switch today. I have long thought that performance and benchmark numbers were suspect in the storage and networking world.
Highlights of the article:
* Driven by a general sense that benchmarking practices in the areas of file and storage systems are lacking, we conducted an extensive survey of the benchmarks that were published in relevant conference papers in recent years. We decided to evaluate the evaluators, if you will. Our May 2008 ACM Transactions on Storage article, entitled “A Nine Year Study of File System and Storage Benchmarking”, surveyed 415 file system and storage benchmarks from 106 papers that were published in four highly regarded conferences (SOSP, OSDI, USENIX, and FAST) between 1999 and 2007.
* Our suspicions were confirmed. We found that most popular benchmarks are flawed, and many research papers used poor benchmarking practices and did not provide a clear indication of the system’s true performance. We evaluated benchmarks qualitatively as well as quantitatively: we conducted a set of experiments to show how some widely used benchmarks can conceal or overemphasize overheads. Finally, we provided a set of guidelines that we hope will improve future performance evaluations. An updated version of the guidelines is available.
* We believe that the current state of performance evaluations has much room for improvement, and the evidence presented in our survey supports this belief. Computer Science is still a relatively young field, and its experimental evaluations need to move further in the direction of precise science. One part of the solution is that standards clearly need to be raised and defined. This will have to be done both by reviewers putting more emphasis on a system’s evaluation and by researchers conducting more rigorous experiments. Another part of the solution is that this information needs to be better disseminated to all. We hope that this article, as well as our continuing work, will help researchers and others understand the problems that exist with file and storage system benchmarking. The final aspect of the solution is creating standardized benchmarks, or benchmarking suites, based on open discussion among file system and storage researchers.
* Our article focused on benchmark results published in venues such as conferences and journals. Another aspect is standardized industrial benchmarks. There, how the benchmark is chosen or run, and how the results are presented, are of less concern, since these are all standardized. An interesting question, though, is how effective these benchmarks are, and how the standards shape the products being sold today (for better or worse).
* The goal of this project is to raise awareness of issues relating to proper benchmarking practices for file and storage systems. We hope that with greater awareness, standards will be raised, and more rigorous and scientific evaluations will be performed and published. Since the article was published in May 2008, we have held a workshop on storage benchmarking at UCSC and presented a BoF session at the 7th USENIX Conference on File and Storage Technologies (FAST '09). We have also set up a mailing list for announcements of future events as well as for discussion. More information can be found on our website, http://fsbench.filesystems.org/.
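The point about benchmarks concealing overheads is easy to demonstrate for yourself. The sketch below is my own illustration, not code from the article or the fsbench project: it times a naive sequential write with and without fsync. Without the fsync, the OS page cache absorbs the writes, so the reported "throughput" says little about the storage system underneath. The file path, data sizes, and run count are arbitrary choices for the demo.

```python
import os
import statistics
import time

def timed_write(path, size_mib=64, sync=True, runs=5):
    """Time a sequential write of size_mib MiB, repeated over several runs.

    With sync=False the data may never leave the page cache inside the timed
    region, so the result can wildly overstate what the device actually delivers.
    """
    chunk = b"x" * (1 << 20)  # 1 MiB buffer
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(size_mib):
                f.write(chunk)
            if sync:
                f.flush()
                os.fsync(f.fileno())  # force the data to stable storage
        samples.append(size_mib / (time.perf_counter() - start))  # MiB/s
        os.remove(path)
    # Report mean and standard deviation rather than a single "best" number.
    return statistics.mean(samples), statistics.stdev(samples)

if __name__ == "__main__":
    # Hypothetical target path; point it at the file system under test.
    for sync in (True, False):
        mean, dev = timed_write("/tmp/fsbench_demo.dat", sync=sync)
        print(f"sync={sync}: {mean:.1f} +/- {dev:.1f} MiB/s")
```

On most machines the gap between the two numbers is dramatic, which is exactly how a benchmark can hide the cost it claims to measure. Reporting the variance across multiple runs, rather than one flattering figure, is the kind of basic rigor the guidelines above are asking for.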