Announcing TokuDB v7.5: Read Free Replication
Today we released TokuDB® v7.5, the latest version of Tokutek’s storage engine for MySQL and MariaDB.
I’ll be publishing two blogs next week going into more detail about our new “Read Free Replication”, but here are high-level descriptions of the most important new features.
Read Free Replication
TokuDB replication slaves can now be configured to process the binary logs with virtually no read IO. This is accomplished via two new server parameters: one allows skipping uniqueness checks (for inserts and updates), the other eliminates read-before-write behavior (for updates and deletes). Two additional conditions apply: the slave must be in read-only mode, and replication must be row-based.
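A sketch of what such a slave configuration might look like in my.cnf. The TokuDB parameter names below follow the v7.5 release notes; verify them against the documentation for your build:

```ini
# my.cnf on the replication slave (sketch; parameter names per the
# TokuDB v7.5 release notes, so double-check for your build)
[mysqld]
read_only                = 1     # required: slave must be read-only
tokudb_rpl_unique_checks = OFF   # skip uniqueness checks (inserts/updates)
tokudb_rpl_lookup_rows   = OFF   # skip read-before-write (updates/deletes)
# the master must use binlog_format = ROW for this to take effect
```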
Hot Backup Now Supports Multiple Directories (Enterprise Edition)
The original implementation of our Hot Backup functionality could only back up servers with a single data directory for all files (datadir). This enhancement adds the capability to back up distinct tokudb_data_dir, tokudb_log_dir, and binary log directories.
Increased Usage of Bulk Fetch
Long ago, we added the capability in TokuDB to use a bulk fetch algorithm when selecting ranges of rows, which significantly boosts the performance of certain SELECT statements. This release brings the same performance enhancement to more complex SELECT statements as well as to DELETE statements. The newly covered statements are “INSERT [IGNORE] INTO … SELECT”, “INSERT INTO … ON DUPLICATE KEY UPDATE”, “REPLACE INTO … SELECT”, and “CREATE TABLE … SELECT”.
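For example, statements of the following shapes can now use bulk fetch for their range scans (all table and column names here are purely illustrative):

```sql
-- Statement forms newly covered by bulk fetch on the SELECT side
-- (schema names are illustrative, not from the release)
INSERT IGNORE INTO daily_summary
  SELECT user_id, SUM(clicks) FROM events
  WHERE dt = '2014-09-01' GROUP BY user_id;

REPLACE INTO hot_rows
  SELECT * FROM events WHERE dt >= '2014-09-01';

CREATE TABLE events_archive
  SELECT * FROM events WHERE dt < '2014-01-01';

-- DELETE statements that scan a range benefit as well
DELETE FROM events WHERE dt < '2013-01-01';
```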
Updated MySQL and MariaDB
We are committed to updating our MySQL and MariaDB versions with each release, so we are shipping TokuDB v7.5 on MySQL 5.5.39 and MariaDB 5.5.39.
To learn more about TokuDB:
Download it, check out the documentation, or see the full release notes.
Lessons from Deploying MySQL GTID at Scale
by Evan Elias and Santosh Praneeth Banda

Global Transaction ID (GTID) is one of the most compelling new features of MySQL 5.6. It provides major benefits in failover, point-in-time backup recovery, and hierarchical replication, and it's a prerequisite for crash-safe multi-threaded replication. Over the course of the last few months, we enabled GTID on every production MySQL instance at Facebook. In the process, we learned a great deal about deployment and operational use of the feature. We plan to open source many of our server-side fixes via WebScaleSQL, as we believe others in the scale community can learn from this and benefit from the work we've done.

Background

Traditional MySQL replication is based on relative coordinates: each replica keeps track of its position with respect to its current master's binary log files. GTID enhances this setup by assigning a unique identifier to every transaction, and each MySQL server keeps track of which transactions it has already executed. This permits "auto-positioning," the ability for a replica to be pointed at a master instance without needing to specify a binlog filename or position in the CHANGE MASTER statement.

Auto-positioning makes failover simpler, faster, and less error-prone. It becomes trivial to get replicas in sync after a master failure, without requiring an external tool such as Master High Availability (MHA). Planned master promotions also become easier, as it is no longer necessary to stop all replicas at the same position first. Database administrators need not worry about manually specifying incorrect positions; even in the case of human error, the server is now smart enough to ignore transactions it has already executed.

By permitting replicas to be repointed to masters at different levels of the hierarchy, GTID greatly simplifies complex replication topologies, including hierarchical replication (slaves of slaves).
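Concretely, with auto-positioning a replica can be repointed without any binlog coordinates; the host and credentials below are placeholders:

```sql
-- Repoint a replica using GTID auto-positioning: no binlog file
-- or position needed (host, port, and credentials are placeholders)
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_AUTO_POSITION = 1;
START SLAVE;
```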
Since a GTID-enabled binlog stream can safely be taken from any member of a replica set, and replayed without requiring relative positions, the feature also eases binlog backup and recovery.

Additionally, by combining GTID with semi-synchronous replication, we have implemented automation to safely recover crashed masters as replicas. When a master crashes, we can detect this and promote a replica within 30 seconds without losing data. Later, if the original master was able to be recovered and our automation detects its data is consistent, GTID allows us to repoint it to the new master instead of having to kick off a copy operation to replace it.

Deployment

GTID in MySQL 5.6 is extremely difficult to deploy to an existing large-scale environment. The gtid_mode variable is not dynamic, and it must also match between master and replicas. The official deployment plan in the MySQL documentation involves making a master read_only, shutting down all MySQL instances in the replica set simultaneously at the same position, enabling gtid_mode in my.cnf, and then starting all of the instances. This process is fundamentally incompatible with the notion of high availability, making it unviable for production use at scale.

We worked around this problem in fb-mysql by relaxing the constraint that gtid_mode must match between master and replicas. We permit a gtid_mode=ON replica to have a gtid_mode=OFF master, and we suppress assignment of GTIDs to transactions on servers with read_only=ON (i.e., replicas). This permits a high-availability deployment strategy as follows, for each replica set:

1. On each replica, one at a time, restart MySQL to enable gtid_mode. Afterward, we are in a state where every replica has gtid_mode=ON, but the master still has gtid_mode=OFF.
2. Perform a master promotion as normal, repointing the replicas and original master to a new master. The original master's replication will intentionally break when started, since it still has gtid_mode=OFF.
3. Restart the original master to enable gtid_mode. It will now be able to replicate from the new master, and the entire replica set now has gtid_mode=ON.

With sufficient safeguards and validation logic, it is safe to execute this rollout process on a large number of replica sets at a time. During the peak of the deployment process, we were running our rollout script on up to hundreds of replica sets simultaneously.

fb-mysql changes

To deploy GTID and make it work in our production environment, we had to make substantial changes both in the MySQL server and in the automation around it. Apart from the deployment changes, during initial testing we encountered a number of serious bugs and performance regressions with GTID.

In some cases, the MySQL server ended up scanning all binary logs whenever a replica connected with the new auto-position protocol, or during crash recovery, in order to initialize the global variables GTID_PURGED and GTID_EXECUTED. Opening the binary logs is required to read the previous_gtid_events present at the beginning of each binary log. In fb-mysql, we fixed these issues by changing the format of the binary log index file to include the previous_gtid_event corresponding to each binary log file name. fb-mysql uses the previous_gtid_events present in the binary log index file instead of opening the binary logs directly, thus improving performance.

We also found that slaves with GTID are not crash-safe with less durable settings (sync_binlog != 1 and innodb_flush_log_at_trx_commit != 1). Using fully durable settings requires syncing both the binary log and the InnoDB transaction log to disk after each transaction in single-threaded replication mode, which negatively affects slave apply performance. At Facebook scale, it is important for any feature to be crash-safe to avoid operational overhead.
So in fb-mysql, we decided to fix this issue by adding a new transaction table (mysql.slave_gtid_info) to consistently store the GTID information.

GTID is a powerful feature that simplifies many replication complexities. For example, we identified that the multi-threaded slave doesn't work with relay_log_recovery=1, yet relay_log_recovery=1 is required for crash safety even with less durable replication settings (sync_relay_log != 1). In fb-mysql we allowed the use of relay_log_recovery=1 with the multi-threaded slave when GTID is enabled, since gaps in execution created after a multi-threaded slave crash are automatically filled by GTID auto-positioning.

Preparation

There were several steps we had to take prior to beginning our GTID deployment. One major step involved updating all of our automation to use GTID and auto-positioning. Enabling gtid_mode without auto-positioning is detrimental to replica crash-safety, so it is important to roll out both at once. The most substantial change was to our promotion logic, which now had to cover additional permutations for whether GTID was already enabled, or being enabled for the first time.

Another important prerequisite involves prevention of GTID-incompatible statements. MySQL has an option, enforce_gtid_consistency, that causes these statements to generate an error. For safety's sake, this should always be enabled whenever gtid_mode is also enabled. However, before beginning the rollout, it is necessary to audit applications and preemptively fix any uses of these query patterns. To make this possible at our scale, we augmented MySQL with user stat counters for these statements, as well as an option to write full information on them to the MySQL error log. This allowed us to easily identify around 20 cases of these query patterns being used, among our thousands of special-case workloads.

Finally, we wrote a script to aid in skipping statements, in the rare cases where that is necessary.
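The manual procedure such a script wraps looks roughly like this; the uuid:N value is a placeholder for the offending transaction's GTID:

```sql
-- Skip one transaction under GTID by committing an empty transaction
-- tagged with its GTID (the uuid:N value below is a placeholder)
STOP SLAVE;
SET gtid_next = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:12345';
BEGIN; COMMIT;                 -- empty transaction "executes" that GTID
SET gtid_next = 'AUTOMATIC';
START SLAVE;
```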
The age-old variable sql_slave_skip_counter does not work with gtid_mode enabled; instead, a DBA must fiddle with the gtid_next variable and insert an empty transaction. This is painful in an emergency, especially while a large DBA team is still ramping up on GTID knowledge, so having a helper script is prudent.

For a more in-depth technical overview of our GTID-related MySQL changes and automation efforts, please see our slides from the Percona Live MySQL Conference 2014.

Summary

Facebook's Global Transaction ID deployment was a cross-functional collaboration between our MySQL engineering, database operations, and data performance teams. Deploying GTID to a Facebook-scale environment required substantial effort, including major improvements to the MySQL server, changes to our automation, and a custom rollout script. Early on in the rollout process, we also uncovered, reported, and fixed several bugs and performance issues in MySQL relating to GTID support. We can happily state that it is now extremely stable in our use, with no new problems encountered in recent months. We plan to make these same fixes to the WebScaleSQL branch of MySQL in the next few weeks, so that others can learn and benefit from this.

Despite the effort involved, deploying GTID has proven to be well worth the time commitment.
The feature has provided us immediate benefits, in addition to being a base for further automation improvements in the near future.

List of GTID bugs

#69059 GTID lack a reasonable deployment strategy
#69097 Mysqld scans all binary logs on crash recovery
#68386 Master scans all binlogs when slave reconnects with auto positioning
#70659 Make crash safe slave work with gtid + less durable settings
#69943 Transactions skipped on slave after "stop/start slave" using GTID replication
#71575 Master logs two consecutive GTIDs causing slaves to miss the first GTID
#72313 Stop sql_thread, start sql_thread causes a trx to log with a different GTID
#72314 Stop io_thread, start io_thread with GTID may cause data inconsistencies
#72635 Data inconsistencies when master has truncated binary log with GTID after crash
#73032 Setting gtid_purged may break auto_position and thus slaves
#70711 Mysqlbinlog prints invalid SQL from relay logs when GTID is enabled
#73397 Make MTS work with relay_log_recovery=1 when GTID is enabled
Managing big data? Say ‘hello’ to HP Vertica
Over the past few months, I’ve seen an increase in the following use case while working on performance and schema review engagements: I need to store exponentially increasing amounts of data and analyze all of it in real-time.

This is also known simply as: “We have big data.” Typically, this data is used for user interaction analysis, ad tracking, or other common clickstream applications. However, it can also be seen in threat assessment (DDoS mitigation, etc.), financial forecasting, and other applications as well. While MySQL (and other OLTP systems) can handle this to a degree, it is by no means a forte. Some of the pain points include:

- Cost of rapidly increasing, expensive disk storage (OLTP disks need to be fast == $$)
- Performance decrease as the data size increases
- Wasted hardware resources (excess I/O, etc.)
- Impact against other time-sensitive transactions (i.e. the OLTP workload)

While there are many approaches to this problem, and oftentimes the solution is actually a hybrid of many individually tailored components, a solution that I have seen more frequently in recent work is HP Vertica.

At the 30,000-foot overview, Vertica is built around the following principles:

- Columnar data store
- Highly compressed data
- Clustered solution for both availability and scalability

Over the next few weeks, I’ll discuss several aspects of Vertica, including:

- Underlying architecture and concepts
- Basic installation and use
- Different data loading techniques
- Various maintenance/operational procedures
- Comparisons vs. traditional OLTP (MySQL) performance
- Some potential use cases
- Integration with other tools (such as Hadoop)

While Vertica is by no means the silver bullet that will solve all of your needs, it may prove to be a very valuable tool in your overall approach to managing big data.

The post Managing big data? Say ‘hello’ to HP Vertica appeared first on MySQL Performance Blog.
Tweaking MySQL Galera Cluster to handle large databases - open_files_limit
September 18, 2014
Galera Cluster is a popular choice for achieving high availability using synchronous replication. However, if you are planning to run huge sites with many DB objects (tables), a few tweaks are necessary.
Yes, you might have been successful in loading your thousands of databases and thousands of tables, but what happens if you have a node failure and Galera recovery fails?
In this blog post we will show you how to identify one common error related to the open_files_limit that MySQL imposes, and how to spot another potential pitfall.
If you are using wsrep_sst_method=xtrabackup or wsrep_sst_method=xtrabackup-v2 then you will find a log file in the data directory of the donor node. This log file is called innobackup.backup.log.
140912 19:10:15 innobackupex: Done.
IMPORTANT: Please check that the backup run completes successfully.
At the end of a successful backup run innobackupex
prints "completed OK!".
innobackupex: Using mysql server version 5.6.16
innobackupex: Created backup directory /tmp
140912 19:10:16 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/etc/mysql/my.cnf" --defaults-group="mysqld" --backup --suspend-at-end --target-dir=/tmp --tmpdir=/tmp --extra-lsndir='/tmp' --stream=tar
innobackupex: Waiting for ibbackup (pid=4274) to suspend
innobackupex: Suspend file '/tmp/xtrabackup_suspended_2'
xtrabackup: open files limit requested 65535, set to 65535
>> log scanned up to (2145319596)
>> log scanned up to (2145319596)
2014-09-12 19:24:09 7f5d2080a740 InnoDB: Operating system error number 24 in a file operation.
InnoDB: Error number 24 means 'Too many open files'.
InnoDB: Some operating system error numbers are described at
InnoDB: Error: could not open single-table tablespace file ./db_2760/t_0.ibd
In this case the DONOR failed because of the error:
2014-09-12 19:24:09 7f5d2080a740 InnoDB: Operating system error number 24 in a file operation
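Since InnoDB holds a file handle on each open .ibd file (with innodb_file_per_table), counting those files gives a rough lower bound on what open_files_limit must accommodate. A sketch, using a throwaway directory in place of the real datadir (commonly /var/lib/mysql):

```shell
# Throwaway directory stands in for the MySQL datadir; in practice,
# point `datadir` at the real one (commonly /var/lib/mysql).
datadir=$(mktemp -d)
mkdir -p "$datadir/db_1" "$datadir/db_2"
touch "$datadir/db_1/t_0.ibd" "$datadir/db_1/t_1.ibd" "$datadir/db_2/t_0.ibd"

# With innodb_file_per_table, each open table holds a handle on its
# .ibd file, so this count is a lower bound for open_files_limit.
ibd_count=$(find "$datadir" -name '*.ibd' | wc -l | tr -d ' ')
echo "InnoDB tablespace files: $ibd_count"   # prints "InnoDB tablespace files: 3"

rm -rf "$datadir"
```

Remember that connections, temporary files, and binary logs consume handles on top of this count.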
At this point you have to increase the open_files_limit in your my.cnf file. Unfortunately, open_files_limit is not a dynamic variable, so you must restart the MySQL server for the change to take effect.
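The change itself is a one-liner in my.cnf, followed by a restart; the value below is only an example, so size it from your own table count:

```ini
[mysqld]
# Must comfortably exceed the number of .ibd files plus connections,
# temporary files, and binary logs; 102400 is an example value only.
open_files_limit = 102400
```

Note that the operating system's own limit (e.g. ulimit -n) must also allow this many descriptors, or mysqld may end up with a lower effective value.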
Free eBook! Configuring MySQL for Performance
Are you overwhelmed by the hundreds of MySQL settings? Lost trying to tune the query optimizer and InnoDB buffer pools? Is the configuration process daunting?
We are here to help. Download your free copy of our latest eBook, Configuring MySQL for Performance. In just minutes, you will gain insight into setting configurations for optimal MySQL performance.
It is fast, and it is free. Request your copy here.