Welcome to another interview blog for the rapidly approaching Percona Live 2018. Each post in this series highlights a featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.
This blog post highlights Mat Arye, Core Database Engineer at Timescale. His talk is titled Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off. Distributed systems were built to scale out for ballooning user bases and operations. As more and more companies vied to be the next Google, Amazon or Facebook, they too “required” horizontal scalability. Along the way, NoSQL and even NewSQL systems have neglected single-node performance, including in settings where scaling out isn’t an option. And single-node performance is important because it allows you to do more with much less. In our conversation, we discussed why you shouldn’t forget to focus on single-node performance:
Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?
Mat: My name is Mat Arye. I started working on database infrastructure as part of my graduate studies in distributed systems at Princeton University, with Timescale’s CTO Mike Freedman. My first project was developing the data streaming infrastructure for a cross-continental data analysis system called Jetstream. I was first introduced to working with PostgreSQL as an intern at CloudFlare, where I worked on their request-analysis system. I started working on the precursor to what would become TimescaleDB while working on a data analysis system for an IoT device cloud platform.
Note: Mike Freedman will also be speaking on Wednesday at 12:50 pm in Room M2, giving the talk TimescaleDB: Re-engineering PostgreSQL as a Time-Series Database.
Percona: Your talk is titled Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off. How have people gotten away from single-node performance?
Mat: Well, when the Internet became a thing, people saw the deluge of data that was coming. They realized that a single-node data system would no longer suffice for many data applications. Thus, the focus of a lot of data infrastructure work shifted to creating scale-out systems. For multi-node systems, performance often comes from making the system “scale linearly” (i.e., increase performance by adding nodes). As a result, a “scalable” system came to mean one that could scale out across multiple servers. The performance of any single node became less important and less optimized. I do think that, as a community, we have now learned a lot about building scale-out systems and that we need to switch back to concentrating on single-node performance for reasons having to do with cost and operational efficiency.
Percona: How does single-node performance fit in with time-series data?
Mat: You can think of time-series data as “live” data. This data is often analyzed on dashboards and in near-real-time analysis systems, which have very different latency requirements from the BI analytical use cases that data lakes were designed for. Single-node efficiency is important for creating systems that can provide the low-latency results necessary for these live applications. Also, many time-series data settings, especially for IoT-related use cases, are remote or at the “edge” (e.g., mining sites, factory floors, satellites, gateways). Single-node performance is important for getting the most out of these smaller-footprint or resource-constrained environments.
Percona: Why should people worry about single-node architecture in cloud deployments?
Mat: There are many applications in cloud deployments where the single-node data architecture that systems like TimescaleDB provide is sufficient for their data needs. In such applications, using a single-node cloud deployment can save costs (it is easier to use and easier to maintain, especially compared to a multi-node deployment of smaller instances). It can also decrease the latency for getting query results compared to alternative multi-node systems.
Percona: Why should people attend your talk? What do you hope people will take away from it?
Mat: I hope that people learn two things: (1) that it is often possible (and desirable) to use efficient single-node data analysis systems for many important real-life applications, and (2) that, as a community, we should start concentrating on single-node efficiency even in multi-node systems. It sort of goes along with the whole “use the right tool for the job” approach that most people tend to aspire to.
Percona: What are you looking forward to at Percona Live (besides your talk)?
Mat: I always like learning about data analysis systems that take new approaches. The diversity of talks and topics at Percona Live always gives me the opportunity to learn something new. And of course, meeting new people is fun and educational, and Percona Live gives you a great opportunity for that!
Want to find out more about this Percona Live 2018 featured talk and single-node database performance? Register for Percona Live 2018, and see Mat’s talk Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.
Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.
The Percona Live Open Source Database Conference will be held April 23-25, 2018, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.