Author: Planet MySQL

Netflix Data Benchmark: Benchmarking Cloud Data Stores

The Netflix member experience is offered to 83+ million global members, and delivered using thousands of microservices. These services are owned by multiple teams, each having their own build and release lifecycles, generating a variety of data that is stored in different types of data store systems. The Cloud Database Engineering (CDE) team manages those data store systems, so we run benchmarks to validate updates to these systems, perform capacity planning, and test our cloud instances with multiple workloads and under different failure scenarios. We were also interested in a tool that could evaluate and compare new data store systems as they appear in the market or in the open source domain, determine their performance characteristics and limitations, and gauge whether they could be used in production for relevant use cases. For these purposes, we wrote Netflix Data Benchmark (NDBench), a pluggable cloud-enabled benchmarking tool that can be used across any data store system. NDBench provides plugin support for the major data store systems that we use — Cassandra (Thrift and CQL), Dynomite (Redis), and Elasticsearch. It can also be extended to other client APIs.

Introduction

As Netflix runs thousands of microservices, we are not always aware of the traffic that bundled microservices may generate on our backend systems. Understanding the performance implications of new microservices on our backend systems was also a difficult task. We needed a framework that could assist us in determining the behavior of our data store systems under various workloads, maintenance operations and instance types. We wanted to be mindful of provisioning our clusters, scaling them either horizontally (by adding nodes) or vertically (by upgrading the instance types), and operating under different workloads and conditions, such as node failures, network partitions, etc.

As new data store systems appear in the market, they tend to report performance numbers for the “sweet spot”, and are usually based on optimized hardware and benchmark configurations. Being a cloud-native database team, we want to make sure that our systems can provide high availability under multiple failure scenarios, and that we are utilizing our instance resources optimally. There are many other factors that affect the performance of a database deployed in the cloud, such as instance types, workload patterns, and types of deployments (island vs global). NDBench aids in simulating the performance benchmark by mimicking several production use cases.

There were also some additional requirements; for example, as we upgrade our data store systems (such as Cassandra upgrades) we wanted to test the systems prior to deploying them in production. For systems that we develop in-house, such as Dynomite, we wanted to automate the functional test pipelines, understand the performance of Dynomite under various conditions, and under different storage engines. Hence, we wanted a workload generator that could be integrated into our pipelines prior to promoting an AWS AMI to a production-ready AMI.

We looked into various benchmark tools as well as REST-based performance tools.
While some tools covered a subset of our requirements, we were interested in a tool that could achieve the following:

- Dynamically change the benchmark configurations while the test is running, hence perform tests along with our production microservices.
- Be able to integrate with platform cloud services such as dynamic configurations, discovery, metrics, etc.
- Run for an infinite duration in order to introduce failure scenarios and test long running maintenances such as database repairs.
- Provide pluggable patterns and loads.
- Support different client APIs.
- Deploy, manage and monitor multiple instances from a single entry point.

For these reasons, we created Netflix Data Benchmark (NDBench). We incorporated NDBench into the Netflix Open Source ecosystem by integrating it with components such as Archaius for configuration, Spectator for metrics, and Eureka for discovery service. However, we designed NDBench so that these libraries are injected, allowing the tool to be ported to other cloud environments, run locally, and at the same time satisfy our Netflix OSS ecosystem users.

NDBench Architecture

The following diagram shows the architecture of NDBench. The framework consists of three components:

- Core: the workload generator
- API: allowing multiple plugins to be developed against NDBench
- Web: the UI and the servlet context listener

We currently provide the following client plugins — Datastax Java Driver (CQL), C* Astyanax (Thrift), Elasticsearch API, and Dyno (Jedis support). Additional plugins can be added, or a user can use dynamic scripts in Groovy to add new workloads. Each driver is just an implementation of the Driver plugin interface. NDBench-core is the core component of NDBench, where one can further tune workload settings.

Fig. 1: NDBench Architecture

NDBench can be used from either the command line (using REST calls), or from a web-based user interface (UI).

NDBench Runner UI

Fig. 2: NDBench Runner UI

A screenshot of the NDBench Runner (Web UI) is shown in Figure 2. Through this UI, a user can select a cluster, connect a driver, modify settings, set a load testing pattern (random or sliding window), and finally run the load tests. Selecting an instance while a load test is running also enables the user to view live-updating statistics, such as read/write latencies, requests per second, cache hits vs. misses, and more.

Load Properties

NDBench provides a variety of input parameters that are loaded dynamically and can dynamically change during the workload test. The following parameters can be configured on a per-node basis:

- numKeys: the sample space for the randomly generated keys
- numValues: the sample space for the generated values
- dataSize: the size of each value
- numWriters/numReaders: the number of threads per NDBench node for writes/reads
- writeEnabled/readEnabled: boolean to enable or disable writes or reads
- writeRateLimit/readRateLimit: the number of writes per second and reads per second
- userVariableDataSize: boolean to enable or disable the ability of the payload to be randomly generated

Types of Workload

NDBench offers pluggable load tests. Currently it offers two modes — random traffic and sliding window traffic. The sliding window test is a more sophisticated test that can concurrently exercise data that is repetitive inside the window, thereby providing a combination of temporally local data and spatially local data.
This test is important as we want to exercise both the caching layer provided by the data store system, as well as the disk’s IOPS (Input/Output Operations Per Second).

Load Generation

Load can be generated individually for each node on the application side, or all nodes can generate reads and writes simultaneously. Moreover, NDBench provides the ability to use the “backfill” feature in order to start the workload with hot data. This helps in reducing the ramp-up time of the benchmark.

NDBench at Netflix

NDBench has been widely used inside Netflix. In the following sections, we talk about some use cases in which NDBench has proven to be a useful tool.

Benchmarking Tool

A couple of months ago, we finished the Cassandra migration from version 2.0 to 2.1. Prior to starting the process, it was imperative for us to understand the performance gains that we would achieve, as well as the performance hit we would incur during the rolling upgrade of our Cassandra instances. Figures 3 and 4 below illustrate the p99 and p95 read latency differences using NDBench. In Fig. 3, we highlight the differences between Cassandra 2.0 (blue line) vs 2.1 (brown line).

Fig. 3: Capturing OPS and latency percentiles of Cassandra

Last year, we also migrated all our Cassandra instances from the older Red Hat 5.10 OS to Ubuntu 14.04 (Trusty Tahr). We used NDBench to measure performance under the newer operating system. In Figure 4, we showcase the three phases of the migration process by using NDBench’s long-running benchmark capability. We used rolling terminations of the Cassandra instances to update the AMIs with the new OS, and NDBench to verify that there would be no client-side impact during the migration. NDBench also allowed us to validate that the performance of the new OS was better after the migration.

Fig. 4: Performance improvement from our upgrade from Red Hat 5.10 to Ubuntu 14.04

AMI Certification Process

NDBench is also part of our AMI certification process, which consists of integration tests and deployment validation. We designed pipelines in Spinnaker and integrated NDBench into them. The following figure shows the bakery-to-release lifecycle. We initially bake an AMI with Cassandra, create a Cassandra cluster, create an NDBench cluster, configure it, and run a performance test. We finally review the results, and make the decision on whether to promote an “Experimental” AMI to a “Candidate”. We use similar pipelines for Dynomite, testing out the replication functionalities with different client-side APIs. Passing the NDBench performance tests means that the AMI is ready to be used in the production environment. Similar pipelines are used across the board for other data store systems at Netflix.

Fig. 5: NDBench integrated with Spinnaker pipelines

In the past, we’ve published benchmarks of Dynomite with Redis as a storage engine leveraging NDBench. In Fig. 6 we show some of the higher percentile latencies we derived from Dynomite leveraging NDBench.

Fig. 6: P99 latencies for Dynomite with consistency set to DC_QUORUM with NDBench

NDBench allows us to run infinite horizon tests to identify potential memory leaks from long running processes that we develop or use in-house. At the same time, in our integration tests we introduce failure conditions, change the underlying variables of our systems, introduce CPU intensive operations (like repair/reconciliation), and determine the optimal performance based on the application requirements.
Finally, our sidecars such as Priam, Dynomite-manager and Raigad perform various activities, such as multi-threaded backups to object storage systems. We want to make sure, through integration tests, that the performance of our data store systems is not affected.

Conclusion

For the last few years, NDBench has been a widely-used tool for functional, integration, and performance testing, as well as AMI validation. The ability to change the workload patterns during a test, support for different client APIs, and integration with our cloud deployments has greatly helped us in validating our data store systems. There are a number of improvements we would like to make to NDBench, both for increased usability and supporting additional features. Some of the features that we would like to work on include:

- Performance profile management
- Automated canary analysis
- Dynamic load generation based on destination schemas

NDBench has proven to be extremely useful for us on the Cloud Database Engineering team at Netflix, and we are happy to have the opportunity to share that value. Therefore, we are releasing NDBench as an open source project, and are looking forward to receiving feedback, ideas, and contributions from the open source community. You can find NDBench on Github at: https://github.com/Netflix/ndbench

If you enjoy the challenges of building distributed systems and are interested in working with the Cloud Database Engineering team in solving next-generation data store problems, check out our job openings.

Authors: Vinay Chella, Ioannis Papapanagiotou, and Kunal Kundaje

Automated testing on devices

As part of the Netflix SDK team, our responsibility is to ensure the new release version of the Netflix application is thoroughly tested to its highest operational quality before deploying onto gaming consoles and distributing as an SDK (along with a reference application) to Netflix device partners; eventually making its way to millions of smart TVs and set top boxes (STBs). Overall, our testing is responsible for the quality of Netflix running on millions of gaming consoles and internet connected TVs/STBs.

Unlike software releases on the server side, the unique challenge with releases on devices is that there can be no red/black pushes or immediate rollbacks in case of failure. If there is a bug in the client, the cost of fixing the issue after the code has been shipped on the client device is quite high. Netflix has to re-engage with various partners whose devices might already have been certified for Netflix, kicking off the cycle again to re-certify the devices once the fix has been applied, costing engineering time both externally and internally. All the while, customers might not have a workaround to the problem, exposing them to a suboptimal Netflix experience. The most obvious way to avoid this problem is to ensure tests are conducted on devices in order to detect application regressions well before the release is shipped.

This is the first part of a series of posts describing key concepts and infrastructure we use to automate functional, performance, and stress testing of the Netflix SDK on a number of devices.

Aspirational Goals

Over the years, our experience with testing the Netflix application using both manual and automated means taught us several lessons. So when the time came to redesign our automation system to go to the next level and scale up, we made sure to set them as core goals.

Low setup cost / High test “agility”

Tests should not be harder to create and/or use when automation is used. In particular, tests that are simple to run manually should stay simple to run in the automation. This means that using automation should have close to zero setup cost (if not none). This is important to make sure that creating new tests and debugging existing ones is both fast and painless. This also ensures the focus stays on the test and the features in test as long as possible.

No test structure constraint

Using an automation system should not constrain tests to be written in a particular format. This is important in order to allow future innovation in how tests are written. Furthermore, different teams (we interact with teams responsible for platform, security, playback/media/UI, etc.) might come up with different ways to structure their tests in order to better suit their needs. Making sure the automation system is decoupled from the test structure increases its reusability.

Few layers at the test level

When building a large scale system, it is easy to end up with too many layers of abstraction. While this isn’t inherently bad in many cases, it becomes an issue when those layers are also added in the tests themselves in order to allow them to integrate with automation.
Indeed, the further away you are from the feature you actually test, the harder it is to debug when issues arise: so many more things outside of the application under test could have gone wrong. In our case we test Netflix on devices, so we want to make sure that the tests run on the device itself, calling functions as close as possible to the SDK features being tested.

Support important device features

Device management consumes a lot of time when done manually and therefore is a big part of a good automation system. Since we test a product that is being developed, we need the ability to change builds on the fly and deploy them to devices. Extracting log files and crash dumps is also very important to automate in order to streamline the process of debugging test failures.

Designing automation

With these goals in place, it was clear that our team needed a system providing the necessary automation and device services while at the same time staying out of the way of testing as much as possible. This required rethinking existing frameworks and creating a new kind of automation ecosystem. In order for automation to provide that flexibility, we needed the automation system to be lean, modular and require external services only when absolutely needed for testing a feature, that is to say only if the functionality cannot be done directly from the application on the device (for example, suspending the application or manipulating the network).

Reducing the use of external services to the strict minimum has a few benefits:

– It ensures that the logic about the test resides within the test itself as much as possible. This improves readability, maintenance and debuggability of the test.
– Most tests end up having no external dependencies, allowing developers trying to reproduce a bug to run the test with absolutely no setup using the tools they are used to.
– The test case author can focus on testing the functionality of the device without worrying about external constraints.

At the simplest level, we needed to have two separate entities:

Test Framework

A software abstraction helping the writing of test cases by exposing functions taking care of the test flow of control. A test framework is about helping writing tests and should be as close as possible to the device/application being tested in order to reduce the moving parts needed to be checked when debugging a test failure. There could be many of them, so that different teams can structure their tests in a way that matches their needs.

Automation Services

A set of external backend services helping with the management of devices, automating execution of tests and, when absolutely required, providing external features for testing. Automation services should be built in as standalone a manner as possible. Reducing ties between services allows for better reusability, maintenance, debugging and evolution. For example, services which aid in starting the test, collecting information about the test run, and validating test results can be delegated to individual microservices. These microservices aid in running the test independently and are not required to run a test. An automation service should only provide a service and should not control the test flow. For instance, the test can ask an external service to restart the device as part of the test flow.
But the service should not dictate to the test that it restart the device, and should not control the test flow.

Building a Plug and Play Ecosystem

When it came to designing automation services, we looked at what was needed from each of these services.

Device Management

While the tests themselves are automated, conducting tests on a wide range of devices requires a number of custom steps, such as flashing, upgrading, and launching the application before the test starts, as well as collecting logs and crash dumps after the test ends. Each of these operations can be completely different on each device. We needed a service abstracting the device specific information and providing a common interface for different devices.

Test Management

Writing tests is only a small part of the story; the following must also be taken care of:

– Organizing them in groups (test suites)
– Choosing when to run them
– Choosing what configuration to run them with
– Storing their results
– Visualizing their results

Network Manipulation

Testing the Netflix application experience on a device with fluctuating bandwidth is a core requirement for ensuring a high quality, uninterrupted playback experience. We needed a service which could change network conditions, including traffic shaping and DNS manipulation.

File Service

As we start collecting builds for archival purposes or for storing huge log files, we needed a way to store and retrieve these files, and the file service was implemented to assist with this.

Test Runner

Each service being fully independent, we needed an orchestrator that would talk to the separate services in order to get and prepare devices before tests are run and collect results after the tests end.

With the above mentioned design choices in mind, we built the following automation system. The services described below evolved to meet the above specified needs with the principles of being as standalone as possible and not tied into the testing framework. These concepts were put in practice as described below.

Device service

The device service abstracts the technical details required to manage a device from start to end. By exposing a simple unified RESTful interface for all types of devices, consumers of this service no longer need to have any device specific knowledge: they can use all and any devices as if they were the same. The logic of managing each type of device is not directly implemented in the device service itself but instead delegated to other independent micro-services called device handlers. This brings flexibility in adding support for new types of devices, since device handlers can be written in any programming language using their own choice of REST APIs, and existing handlers can easily be integrated with the device service. Some handlers can also sometimes require a physical connection to the device, therefore decoupling the device service from the device handlers gives flexibility in where to locate them.

For each request received, the role of the device service is to figure out which device handler to contact and proxy the request to it after having adapted it to the set of REST APIs the device handler interfaces with.

Let us look at a more concrete example of this. The action for installing a build on PS4, for example, is very different than installing a build on Roku. One relies on code written in C# interfacing with ProDG Target Manager running on Windows (for PlayStation), and the other is written in Node.js running on Linux. The PS4 and Roku device handlers both implement their own device specific installation procedure.
If the device service needs to talk to a device, it needs to know the device specific information. Each device, with its own unique identifier, is stored and accessible by the device service as a device map object containing the information regarding the device needed by the handler. For example:

– Device IP or hostname
– Device MAC address (optional)
– Handler IP or hostname
– Handler port
– Bifrost IP or hostname (network service)
– Powercycle IP or hostname (remote power management service)

The device map information is populated when adding a device into our automation for the first time. When a new device type is introduced for testing, a specific handler for that device is implemented and exposed by the device service. The device service supports the following common set of device methods:

POST /device/install: Installs the Netflix application
POST /device/start: Launches the Netflix application with a given set of launch parameters
POST /device/stop: Stops the Netflix application
POST /device/restart: Restarts the Netflix application (stop + start essentially)
POST /device/powercycle: Power-cycles the device, either via direct or remote power boot
GET /device/status: Retrieves information about the device (ex: running, stopped, etc.)
GET /device/crash: Collects the Netflix application crash report
GET /device/screenshot: Grabs a full screen render of the active screen
GET /device/debug: Collects debug files produced by the device

Note that each of these endpoints requires a unique device identifier to be posted with the request. This identifier (similar to a serial number) is tied to the device being operated. Keeping the service simple allows it to be quite extensible. Introducing additional capability for devices can be easily done, and if a device does not support the capability, it simply NOOPs it.

The device service also acts as a device pooler:

POST /device/reserve: Reserves a device and gets a lease for a period of time
PUT /device/reserve: Renews the lease of a previously reserved device
GET /device/reserve: Lists the devices currently reserved
POST /device/release: Releases a device that was previously reserved
POST /device/disable: Temporarily blacklists the device from being used (in the event of a non-operational device situation or flaky health)
GET /device/disable: Lists the devices currently disabled

Here are some pictures of some of the devices that we are running in the lab for automation. Notice the little mechanical hand near the power button for Xbox 360. This is a custom solution that we put together just for Xbox 360, as this device requires a manual button press to reboot it. We decided to automate this manual process by designing a mechanical arm connected to a Raspberry Pi, which sends control over to the hand for moving and pressing the power button. This action was added to the Xbox 360 device handler. The powercycle endpoint of the device service calls the power cycle handler of Xbox 360. This action is not necessary for PS3 or PS4 and is not implemented in those handlers.

Test service

The Test Service is the bookkeeper of a running test case session. Its purpose is to mark the start of a test case and record status changes, log messages, metadata, links to files (logs/crash minidumps collected throughout the test), and data series emitted by the test case until test completion.
The service exposes simple endpoints invoked by the test framework running the test case:

POST /tests/start: Marks the test as started
POST /tests/end: Marks the test as ended
POST /tests/configuration: Posts device configuration such as version, device model, etc.
POST /tests/keepalive: A TTL health-check in the event the device goes unresponsive
POST /tests/details: Posts some test data/results

A test framework will typically internally call those endpoints as follows:

– Once the test has started, a call to POST /tests/start is made.
– A periodic keepalive is sent to POST /tests/keepalive to let the Test Service know that the test is in progress.
– Test information and results are sent using POST /tests/configuration and POST /tests/details while the test is running.
– When the test ends, a call to POST /tests/end is made.

Network Service — Bifröst Bridge

The network system that we have built to communicate with the device and do traffic shaping or DNS manipulation is called the Bifröst Bridge. We are not altering the network topology and we are connecting the devices directly to the main network. The Bifröst Bridge is not required to run the tests and is only optionally required when the tests require network manipulation such as overriding DNS records.

File Service

As we are running tests, we can opt to collect files produced by the tests and upload them to a storage depot via the file service. These include device log files, crash reports, screen captures, etc. The service is very straightforward from a consumer client perspective:

POST /file: Uploads a file without specifying a name, resulting in a unique identifier in the response that can later be used for download
GET /file/:id: Downloads a file with a given identifier

The file service is backed by cloud storage, and resources are cached for fast retrieval using Varnish Cache.

Database

We have chosen to use MongoDB as the database for the Test Service because of its JSON format and its schema-less nature. The flexibility of having an open JSON document storage solution is key for our needs because test results and metadata storage are constantly evolving and are never finite in their structure. While a relational database sounds quite appealing from a DB management standpoint, it obstructs the principle of Plug-and-Play, as the DB schema needs to be manually kept up to date with whatever tests might want.

When running in CI mode, we record a unique run id for each test and collect information about the build configuration, device configuration, test details, etc.
Downloadable links to the log files in the file service are also stored in the database test entry.

Test Runner — Maze Runner

In order to reduce the burden on each test case owner of calling into different services and running the tests individually, we built a controller called Maze Runner which orchestrates running the tests and calls the different services as needed. The owner of the test suite creates a script in which he/she specifies the devices (or device types) on which the tests need to be run, the test suite name and the test cases that form a test suite, and asks Maze Runner to execute the tests (in parallel).

Here is the list of steps that Maze Runner performs:

– Finds a device/devices to run on based on what was requested
– Calls into the Device Service to install a build
– Calls into the Device Service to start the test
– Waits until the test is marked as “ended” in the Test Service
– Displays the result of the test retrieved using the Test Service
– Collects log files using the Device Service
– If the test did not start or did not end (timeout), checks whether the application has crashed using the Device Service; if a crash is detected, it collects the coredump, generates the call stack and runs it through a proprietary call stack classifier to detect a crash signature
– Notifies the Test Service if a crash or timeout occurred

At any point during the sequence, if Maze Runner detects a device has an issue (the build won’t install or the device won’t start because it lost its network connectivity, for example), it will release the device, asking the device service to disable it for some period of time, and will finally get a whole new device to run the test on. The idea is that pure device failure should not impact tests.

Test frameworks

Test frameworks are well separated from automation services, as they run along with the tests on the devices themselves. Most tests can be run manually with no need for automation services. This was one of the core principles in the design of the system. In this case tests are manually started and the results manually retrieved and inspected when the test is done. However, test frameworks can be made to operate with automation services (the test service, for example, to store the test’s progress and results). We need this integration with automation services when tests are run in CI by our runner.

In order to achieve this in a flexible way, we created a single abstraction layer internally known as TPL (Test Portability Layer). Tests and test frameworks call into this layer, which defines simple interfaces for each automation service. Each automation service can provide an implementation for those interfaces. This layer allows tests meant to be run by our automation to be executed on a completely different automation system, provided that TPL interfaces for that system’s services are implemented. This enabled using test cases written by other teams (using different automation systems) and running them unchanged. When a test is unchanged, the barrier to troubleshooting a test failure on the device by the test owner is completely eliminated; and we always want to keep it that way.

Progress

By keeping the test framework independent of automation services, using automation services on an as-required basis and adding the missing device features, we managed to:

– Augment our test automation coverage on gaming consoles and reference applications.
– Extend the infrastructure to mobile devices (Android, iOS, and Windows Mobile).
– Enable other QA departments to conduct their tests and run their automation frameworks against our device infrastructure.

Our most recent test execution coverage figures show that we execute roughly 1500 tests per build on reference applications alone. To put things in perspective, the dev team produces around 10-15 builds on a single branch per day, each generating 5 different build flavors (such as Debug, Release, AddressSanitizer, etc.) for the reference application. For gaming consoles, there are about 3-4 builds produced per day with a single artifact flavor. Conservatively speaking, using a single build artifact flavor, our ecosystem is responsible for running close to 1500*10 + 1500*3 =~ 20K test cases on a given day.

New Challenges

Given the sheer number of tests executed per day, two prominent sets of challenges emerge:

– Device and ecosystem scalability and resiliency
– Telemetry analysis overload generated by test results

In future blog posts, we will delve deeper and talk about the wide ranging set of initiatives we are currently undertaking to address these great new challenges.

Benoit Fontaine, Janaki Ramachandran, Tim Kaddoura, Gustavo Branco

Shinguz: Beware of large MySQL max_sort_length parameter

Today we had a very interesting phenomenon at a customer. He complained that MySQL was always reporting errors of the following type:

[ERROR] mysqld: Sort aborted: Error writing file ‘/tmp/MYGbBrpA’ (Errcode: 28 – No space left on device)

After a first investigation we found that df -h /tmp showed a full disk from time to time, but we could not see any file with ls -la /tmp/MY*.

After some more investigation we even found, in the Slow Query Log, the query that was producing the problem. It looked similar to this:

SELECT * FROM test ORDER BY field5, field4, field3, field2, field1;

Now we were able to simulate the problem at will with the following table:

CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`data` varchar(64) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`field1` varchar(16) DEFAULT NULL,
`field2` varchar(16) DEFAULT NULL,
`field3` varchar(255) DEFAULT NULL,
`field4` varchar(255) DEFAULT NULL,
`field5` varchar(32) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8912746 DEFAULT CHARSET=utf8
;

And we saw the query in SHOW PROCESSLIST:

| Query | 26 | Creating sort index | select * from test order by field5, field4, field3, field2, field1 |

But we were still not able to see who, or rather how, mysqld was filling our disk!

I further remembered that I had seen some strange settings in the my.cnf earlier, when we did the review of the database configuration. But I had ignored them somehow.

[mysqld]
max_sort_length = 8M
sort_buffer_size = 20M

Now I remembered these settings again. We changed max_sort_length back to its default of 1k and suddenly our space problems disappeared!
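Such a change can also be applied at runtime without restarting mysqld. A minimal sketch of how to check and adjust the variable (the values shown are the compiled-in defaults, not necessarily what was finally used at this customer):

mysql> -- check the current value (the default is 1024 bytes)
mysql> SHOW GLOBAL VARIABLES LIKE 'max_sort_length';
mysql> -- change it for new connections and for the current session
mysql> SET GLOBAL max_sort_length = 1024;
mysql> SET SESSION max_sort_length = 1024;

Keep in mind that a dynamic change like this is lost on restart; to make it permanent the value also has to be fixed (or removed) in my.cnf.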

We played around a bit with different values of max_sort_length and got the following execution times for our query:

max_sort_length    execution time [s]    comment
64                 8.8 s
128                8.2 s
256                9.3 s
512                11.8 s
1k                 14.9 s
2k                 20.0 s
8k                 129.0 s
8M                 75.0 s                disk full (50 G)

Conclusion

We set the value of max_sort_length back to the default. Our problems disappeared and we got working, much faster SELECT queries.

Do not needlessly change MySQL default values without verifying the impact. It can become worse than before!!!

The default value of max_sort_length is a good compromise between performance and an appropriate sort length.

Addendum

What I really did not like about this solution was that I did not understand how the problem occurred. So I did some more investigation into this. We were discussing back and forth whether this could be because of XFS, because of sparse files, or because of some kind of memory mapped files (see also man mmap).

In the end I had the idea to look at the lsof output while my query was running:

mysql> SELECT * FROM test ORDER BY field5, field4, field3, field2, field1;
ERROR 3 (HY000): Error writing file ‘/tmp/MYBuWcXP’ (Errcode: 28 – No space left on device)

shell> lsof -p 14733

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld 14733 mysql 32u REG 8,18 9705619456 30147474 /tmp/MYck8vf4 (deleted)
mysqld 14733 mysql 49u REG 8,18 749797376 30147596 /tmp/MYBuWcXP (deleted)
So it looks like there were some deleted files which were still growing!

Further information from the IRC channel led me to the libc temporary files (see also man 3 tmpfile).

And some hints from MadMerlin|work pointed me to:

shell> ls /proc/<pid>/fd

where you can also see those temporary files.
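As a side note, if you are ever unsure which directory mysqld uses for these sort files, the tmpdir variable tells you (not part of the original investigation, just a quick check):

mysql> -- directory where mysqld writes its sort and temporary files
mysql> SHOW GLOBAL VARIABLES LIKE 'tmpdir';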

Thanks to MadMerlin|work for the hints!

Shinguz: Temporary tables and MySQL STATUS information

When analysing MySQL configuration and status information at customer sites, it is always interesting to see how the applications behave. This can partially be seen in the output of the SHOW GLOBAL STATUS command. See also Reading MySQL fingerprints.

Today we wanted to know where the high Com_create_table and the twice-as-high Com_drop_table values were coming from. One suspect was TEMPORARY TABLES. But are real temporary tables counted in Com_create_table and Com_drop_table at all? This is what we wanted to find out today. The tested MySQL version is 5.7.11.
Caution: Different MySQL or MariaDB versions might behave differently!
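The deltas below were collected by comparing counter values before and after each statement. A minimal sketch of that workflow (assumed; the exact commands are not shown in this post):

mysql> -- snapshot the counters in the session under test and globally
mysql> SHOW SESSION STATUS LIKE 'Com_create_table';
mysql> SHOW GLOBAL STATUS LIKE 'Com_create_table';
mysql> CREATE TEMPORARY TABLE ttemp (id INT);
mysql> -- repeat the SHOW STATUS commands and take the difference
mysql> SHOW SESSION STATUS LIKE 'Com_create_table';
mysql> SHOW GLOBAL STATUS LIKE 'Com_create_table';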

Session 1: CREATE TABLE t1 (id INT);
Query OK, 0 rows affected
    Session 1: Com_create_table +1, Opened_table_definitions +1
    Global:    Com_create_table +1, Opened_table_definitions +1

Session 1: CREATE TABLE t1 (id INT);
ERROR 1050 (42S01): Table ‘t1’ already exists
    Session 1: Com_create_table +1, Open_table_definitions +1, Open_tables +1, Opened_table_definitions +1, Opened_tables +1
    Global:    Com_create_table +1, Open_table_definitions +1, Open_tables +1, Opened_table_definitions +1, Opened_tables +1

Session 1: CREATE TABLE t1 (id INT);
ERROR 1050 (42S01): Table ‘t1’ already exists
    Session 1: Com_create_table +1
    Global:    Com_create_table +1

Session 1: DROP TABLE t1;
Query OK, 0 rows affected
    Session 1: Com_drop_table +1, Open_table_definitions -1, Open_tables -1
    Global:    Com_drop_table +1, Open_table_definitions -1, Open_tables -1

Session 1: DROP TABLE t1;
ERROR 1051 (42S02): Unknown table ‘test.t1’
    Session 1: Com_drop_table -1
    Global:    Com_drop_table -1

Session 1: CREATE TEMPORARY TABLE ttemp (id INT);
Query OK, 0 rows affected
    Session 1: Com_create_table +1, Opened_table_definitions +2, Opened_tables +1
    Global:    Com_create_table +1, Opened_table_definitions +2, Opened_tables +1

Session 1: CREATE TEMPORARY TABLE ttemp (id INT);
ERROR 1050 (42S01): Table ‘ttemp’ already exists
    Session 1: Com_create_table +1
    Global:    Com_create_table +1

Session 1: DROP TABLE ttemp;
Query OK, 0 rows affected
    Session 1: Com_drop_table +1
    Global:    Com_drop_table +1

Session 1 and Session 2, each: CREATE TEMPORARY TABLE ttemp (id int);
Query OK, 0 rows affected
    Session 1: Com_create_table +1, Opened_table_definitions +2, Opened_tables +1
    Global:    Com_create_table +2, Opened_table_definitions +4, Opened_tables +2
    Session 2: Com_create_table +1, Opened_table_definitions +2, Opened_tables +1

Session 1 and Session 2, each: DROP TABLE ttemp;
Query OK, 0 rows affected
    Session 1: Com_drop_table +1
    Global:    Com_drop_table +2
    Session 2: Com_drop_table +1
Conclusion

A successful CREATE TABLE command opens and closes a table definition.
An unsuccessful CREATE TABLE command opens the table definition and the file handle of the previous table. So a faulty application can be quite expensive.
A further unsuccessful CREATE TABLE command has no additional impact.
A DROP TABLE command closes a table definition and the file handle.
A CREATE TEMPORARY TABLE opens 2 table definitions and the file handle. Thus it behaves differently than CREATE TABLE.
But a faulty CREATE TEMPORARY TABLE seems to be much less intrusive.
Open_table_definitions and Open_tables are always global, also in session context (see the quick check below).
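A quick way to verify that last point yourself (a sketch, not from the original post): the session-scoped statement reports the same figure as the global one, because these counters only exist globally:

mysql> -- both statements report the same (global) value
mysql> SHOW SESSION STATUS LIKE 'Open_tables';
mysql> SHOW GLOBAL STATUS LIKE 'Open_tables';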

FTP server with PureFTPd, MariaDB and Virtual Users (incl. Quota and Bandwidth Management) on CentOS 7.2

This document describes how to install a PureFTPd server that uses virtual users from a MariaDB (MySQL compatible) database instead of real system users. This is much more performant and allows you to have thousands of FTP users on a single machine. In addition to that, I will show the use of quota and upload/download bandwidth limits with this setup. Passwords will be stored in the database as MD5 hashes.
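To give a flavour of what the virtual-user backend looks like, here is a rough sketch of a typical table layout used for PureFTPd's MySQL/MariaDB authentication. The column names below are illustrative only; the guide itself defines the exact schema and the matching queries in pureftpd-mysql.conf:

CREATE TABLE ftpd (
  User varchar(32) NOT NULL DEFAULT '',        -- virtual FTP login name
  Password varchar(64) NOT NULL DEFAULT '',    -- MD5 hash of the password
  Uid int(11) NOT NULL DEFAULT -1,             -- system uid the FTP session runs as
  Gid int(11) NOT NULL DEFAULT -1,             -- system gid
  Dir varchar(128) NOT NULL DEFAULT '',        -- home directory of the virtual user
  QuotaSize smallint(5) NOT NULL DEFAULT 0,    -- quota in MB (0 = unlimited)
  QuotaFiles int(11) NOT NULL DEFAULT 0,       -- quota in number of files (0 = unlimited)
  ULBandwidth smallint(5) NOT NULL DEFAULT 0,  -- upload bandwidth limit in KB/s (0 = unlimited)
  DLBandwidth smallint(5) NOT NULL DEFAULT 0,  -- download bandwidth limit in KB/s (0 = unlimited)
  PRIMARY KEY (User)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;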

5 Database Insights Easy to See with VividCortex SaaS Monitoring

There are manifold ways to collect, visualize, and analyze data… but not all methods are equally useful. VividCortex, however, is singular as a database-centric SaaS monitoring platform, and it’s designed to provide you with powerful insights into your system that are both inherently actionable and unique. Within minutes of first booting up VividCortex, users frequently discover new aspects of their system. They understand it in brand new ways, just by viewing our app’s basic dashboards and metrics.
But that’s just the start. Even beyond those initial revelations, there are many more powerful insights that VividCortex can provide, if you know how and where to look. These views aren’t entirely automatic, but they’re simple to discover with a few tips. Here are 5 insights easy to see with VividCortex.
Find which queries affect the most rows
Understanding which queries are affecting the highest number of rows in your system is a useful way to understand the amount of change occurring in your dataset. By organizing this change as “affected rows,” you’re seeing these developments in terms of a powerful, raw metric. “Affected rows” refers to any row that was changed by an UPDATE, INSERT, or DELETE, based on the OK packet or performance_schema data.
To view queries organized in terms of affected rows, head to the Profiler and then rank “Queries” by “Affected Rows.”

The Profiler will generate a view like this one, giving you immediate, legible insight into which queries are causing the widest range of change.
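If you want a rough server-side approximation of this ranking outside of VividCortex, the statement digest table in performance_schema (MySQL 5.6 and later) keeps per-query-pattern totals of affected rows; a sketch:

mysql> SELECT DIGEST_TEXT, COUNT_STAR, SUM_ROWS_AFFECTED
    ->   FROM performance_schema.events_statements_summary_by_digest
    ->  ORDER BY SUM_ROWS_AFFECTED DESC
    ->  LIMIT 10;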

Find the largest group of similarly grouped queries
If you’re able to see the largest group of similar queries, it gives you a window into application behavior, which, in turn, can be used for sharding decisions and other growth strategies. No small thing. Alternatively, examining query verbs can very quickly show you the read to write ratio of a workload, which can be leveraged at further decision points.
To view queries this way, head back to the Profiler and rank them according to “Count.” You’ll then see the total number of queries, grouped by similarity and organized by quantity. Alternatively, you can rank “Query Verbs” in the same way and retrieve the numbers according to command type. In both cases, you see which queries are executing the most frequently in your system.
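For a similarly rough, server-side view of the read-to-write ratio mentioned above (again outside of VividCortex), the command counters can be compared directly; a sketch:

mysql> SHOW GLOBAL STATUS
    ->  WHERE Variable_name IN ('Com_select', 'Com_insert', 'Com_update', 'Com_delete');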

Find memory allocation stalls
As explained by the kernel documentation, memory allocation stalls refer to times when a process stalls to run memory compaction so that a sizable page is free for use. With VividCortex, you’re able to see the number of times this happens in a given timeframe, allowing for further investigation. To do so, head to the Metrics dashboard and enter the metric text as “os.mem.compact_stalls”.

 
Find IO Wait
IO Wait — the time the CPU waits for IO to complete — can cause stalls for page requests that memory buffers are unable to fulfill and during background page flushing. All of this can have widespread impacts on database performance and stability. Using the Metrics dashboard in VividCortex, you’re able to see these stalls by duration over time, broken down by host.

In the Metrics dashboard, use “os.cpu.io_wait_us” as the metric text.

Find long running transactions
Also in the Metrics dashboard, you can see long running transactions by viewing the transaction redo segment history length. This is essentially the overhead of yet-to-be-purged MVCC. (Interested in more about this? Here’s a blog post from VividCortex’s founder Baron Schwartz on “the rabbit hole that is MVCC.” Or how about some explanation of InnoDB History List Length?)
Naturally, seeing spikes of long running transactions and providing explanation for this overhead is a valuable ability — and easily accomplished with VividCortex. Just use the metric text “mysql.status.i_s_innodb_metrics.trx_rseg_history_len”.
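Under the hood this maps to an InnoDB counter that you can also query directly on the server; a sketch (MySQL 5.6 and later, assuming the counter is enabled on your version):

mysql> SELECT NAME, COUNT
    ->   FROM information_schema.INNODB_METRICS
    ->  WHERE NAME = 'trx_rseg_history_len';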

Want to see more?
These tips and insights just scratch the surface. VividCortex has much more visibility available to anybody interested in seeing SaaS, database-centric monitoring in action. If you’d like to find further tips on how to get the most out of VividCortex — or would like to see how much value it can give you and your systems — don’t hesitate to get in touch.  

Cédric Bruderer: Why is varchar(255) not varchar(255)?

Recently I was working on a client’s question and stumbled over an issue with replication and mixed character sets. The client asked whether it is possible to replicate data to a table on a MySQL slave where one column has a different character set than the corresponding column in the same table on the master.

 
I set up two servers with identical table definitions and changed the character set on one column on the slave from latin1 to utf8.
 
Master:

CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`data` varchar(255) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 
Slave:

CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`data` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 
So far no problem: I was able to start the replication and set off some INSERT statements with special characters (like ä, ö, ü, …). But when I went to look for them in the slave’s table, I could not find them.
 
“SHOW SLAVE STATUS” showed me this error:
Column 1 of table ‘test.test’ cannot be converted from type ‘varchar(255)’ to type ‘varchar(255)’
 
You might ask yourself now: but the columns have the same type, so what is the problem? What is not shown in the error is the fact that there are two different character sets.
 
The log file is of no help either. It only shows the same error and tells you to fix it.

2016-05-26 15:51:06 9269 [ERROR] Slave SQL: Column 1 of table ‘test.test’ cannot be converted from type ‘varchar(255)’ to type ‘varchar(255)’, Error_code: 1677
2016-05-26 15:51:06 9269 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with “SLAVE START”. We stopped at log ‘valkyrie_mysqld35701_binlog.000050’ position 120
2016-05-26 15:53:39 9269 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)

 
Skipping the statement will not work, as the server will just fail again when the next statement shows up.
 
For all those who are now running to change the character set: STOP!
Changing the character set of columns or tables containing data can be fatal when done incorrectly. MySQL offers a statement to convert tables and columns to the character set you wish to have.
 
To convert the entire table, you can write:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
 
To convert a single column, you can write:
ALTER TABLE tbl_name MODIFY latin1_column TEXT CHARACTER SET utf8;
 
More details can be found in the ALTER TABLE documentation of MySQL. (Converting character sets is at the end of the article.)
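After converting (or before deciding whether you need to), it is worth double-checking that master and slave really agree on the column character sets. One way to do that, using the schema and table names from this example, is to run the following on both servers and compare the output:

mysql> SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLUMN_TYPE
    ->   FROM information_schema.COLUMNS
    ->  WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'test';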
 
Just to be clear, this is no bug! MySQL replication was never intended to work with mixed character sets, and it makes a lot of sense that the replication is halted when differences are discovered. This test was only an experiment.
