XFS and EXT4 Testing Redux
In the post concluding my testing, I declared EXT4 the winner over XFS for my scenario. My coworker, @keyurdg, was unwilling to let XFS lose and made a few observations:
XFS wasn’t *really* being formatted optimally for the RAID stripe size
XFS wasn’t being mounted with the inode64 option which means that all of the inodes are kept in the first 2TB. (Side note: inode64 option is default in newer kernels but not on CentOS 6’s 2.6.32)
Single-threaded testing isn’t entirely accurate because although replication is single threaded, the writes are collected in InnoDB, which then flushes them to disk using multiple threads governed by innodb_write_io_threads.
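A sketch of what the first two observations translate to on the command line; the device name, mount point, and stripe geometry below are assumptions — match su/sw to your actual RAID layout:

```shell
# Hypothetical: a 64KB stripe unit over a RAID-10 of 10 disks
# (5 effective data disks). Align XFS to the RAID geometry:
mkfs.xfs -d su=64k,sw=5 /dev/sdb1

# Mount with inode64 so inodes are not confined to the first 2TB.
# This must be explicit on CentOS 6's 2.6.32 kernel; newer kernels
# default to inode64.
mount -o inode64,noatime /dev/sdb1 /var/lib/mysql
```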
Armed with new data, I have – for real – the last round of testing.
To keep things a bit simpler, I will be comparing each file system on 2TB and 27TB, with 4 threads, which matches the default value for innodb_write_io_threads in MySQL 5.5.
XFS finally wins out clearly over EXT4. XFS being dramatically slower on 27T earlier really shows how much worse inode32 performs compared to inode64, and explains why XFS was that much better on 2T. Fixing the formatting options pushed XFS over the top easily.
All that’s left to do is set up multiple instances until replication can’t keep up anymore.
MySQL: The most popular open source database for WWW
(Note: this is an article from last year, when MySQL 5.6 was released.)
While database technology is one of the oldest branches of computer science, it remains a fundamental technology that continues to attract new research. The current focus of database technology is on adapting to hot new trends like multi-core chips, solid state devices, NoSQL and the cloud. So what does a contemporary internet developer look for in a database for the internet era? And why does MySQL remain the most popular database for the web?
For a database to be useful while developing products for the Web, the most important requirements are that it should be quick and easy to download, quick to set up, powerful enough to get the job done, fast and flexible to use, and finally scalable on the newest hardware. Compatibility with the latest technologies like the cloud also remains foremost in the minds of developers, since they need their products to be future proof.
You may notice that some of these requirements are common to any successful web product.
Let us go into the details of each of the above requirements, discuss their importance and look at how MySQL addresses them.
In the instant web world of today, with so many technologies competing for attention, you should either be available instantly on the cloud or be quick and intuitive to download and set up. While 4-5 years ago someone may have given an application 10 minutes to download and set up, today users expect technologies to be ready for use within three minutes. The download process needs to be fast and the setup needs to be intuitive. A person using a technology for the first time does not want to be bothered with arcane questions during setup. The software should have reasonable defaults, which can be tweaked later once the application is found suitable and deemed worthy of some user time. It has been our endeavor to ensure that MySQL can indeed be set up, and the first query run, within three minutes.
Many users and companies are looking to use the database as a service provided by a cloud provider. MySQL is the leading database in the cloud, offered by the vast majority of cloud service providers. Almost all cloud service providers, like Amazon, Rackspace, Eucalyptus and Google, provide MySQL as a service. It is a part of many cloud stacks, including OpenStack and CloudStack, which are available as a service. In many cases MySQL is the underlying database for cloud services provisioning and implementation. Being the ubiquitous cloud database means that MySQL is the best fit for the cloud and also the nimblest database to keep up with the advances in cloud technology.
This assures a new user that MySQL is an easy database to use. It gives them the confidence that there will be a return for the time spent to adapt their software and infrastructure to the requirements for MySQL.
When we talk about software being powerful there are many aspects that need to be covered.
The first requirement to be considered powerful is that all the basics must be well implemented. This creates a strong foundation on which this power is built. In terms of databases this means that all aspects of ACID (atomicity, consistency, isolation and durability) are implemented. ACID is provided in most databases by having strong transactional capabilities. This is usually achieved by implementing the two-phase commit protocol. As an over-simplification, we can say the transaction is done in two phases: first it is written to a log file, and then the log file is flushed to disk.
This logging needs to happen at the lowest layer in a database called the Storage Layer.
Besides the logging, there are many crucial database-level algorithms that need to be implemented in a balanced way to ensure that a Web database is able to shoulder the load that large websites are expected to generate. A good transactional storage engine may need to implement its own threading, memory management and disk management.
Storage layer and engines
In MySQL the storage layer of the database is abstracted. See “Storage engines Layer” in fig. MySQL Architecture. Though the default Storage engine is InnoDB there can be any number of storage engines that can be plugged in at this layer.
InnoDB is an extremely powerful transactional storage engine that has withstood the test of time by handling databases having terabytes of data and used by millions of users.
Since MySQL has an abstract layer and then the real storage engine, the logging is done at 2 levels: first at the abstract storage engine layer and then per storage engine based on the capability of the storage engine. At the abstract storage engine layer the log is called the binary log. At the InnoDB engine level, it can be oversimplified and called the redo log.
The binary log can be used to provide some level of atomicity, consistency and durability between engines, but since engines are so varied this is a complex topic and beyond the scope of this article. Later, we will cover another very important feature that uses the binary log for its implementation. Observant readers might wonder about the missing "I": isolation. In a multi-user, multi-threaded database there are thousands, maybe millions, of transactions happening per minute. Isolation is a way to ensure that these transactions are isolated from each other: the database system produces output as if the transactions were happening serially and not in parallel. There are two high-level approaches that provide isolation, either with locks or via multi-version concurrency control (MVCC). Locking restricts or manages the modification of the same data by multiple transactions. This can lead to a delay in accessing data locked by a long-running transaction. MVCC produces a new version of the data for each new transaction that is initiated on that data. If any transaction commits, all the other transactions holding an older version of the data may need to abort or take the new data into consideration. The InnoDB engine provides isolation by implementing MVCC.
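As a small sketch of how this surfaces to the user: InnoDB's default isolation level is REPEATABLE READ, and it can be changed per session or globally. The table and column names below are illustrative:

```sql
-- Choose the isolation level for this session (InnoDB default
-- is REPEATABLE READ, implemented via MVCC snapshots):
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

START TRANSACTION;
-- Reads see a consistent snapshot; concurrent writers are not blocked.
SELECT balance FROM accounts WHERE id = 1;
COMMIT;
```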
Besides these fundamentals, the database engine should provide common database features like stored procedures and triggers and a reasonable adherence to the SQL standards.
Development and management tools
Any software is as powerful as the tools that are provided to make the software more usable.
For databases it is important that users can visually model their database using entity relationship (E-R) diagrams, manage the database objects, manage users, and have an easy visual way to modify their existing database. For large installations it is important not only to manage a single instance of the database but also to be able to manage multiple database installations. DBAs (database administrators) today need to know exactly what is happening with their database in real time. Is there an installation that is stalled? Is there a user, process or query that is hogging time? Getting an answer to such questions in minutes rather than milliseconds may translate to downtime for a database, involving loss of reputation and money. If a large website goes down, it is front-page news.
However well the data and database are managed, database users need insurance against disaster in the form of data backups.
MySQL is shipped with a modern visual tool called MySQL Workbench, which allows users to model and manage their databases as well as their users. For large installations MySQL provides a paid tool called MySQL Enterprise Monitor to manage multiple installations and look at moving graphs of ongoing database activity in the data-center. There are also other tools available from multiple vendors to manipulate and manage MySQL installations. There are multiple free and paid tools available for backing up and restoring a MySQL database.
Some of the tools mentioned above are built on a strong foundation called Performance Schema (PS), a framework that holds metadata about everything happening in the database, along with constructs that allow a user to view and filter this data. Performance Schema captures information about users and queries, down to threads, mutexes and their corresponding wait times and process times. It contains constructs that allow this fine-grained information to be extracted using events and filters. DBAs with a strong knowledge of PS can find out if anything out of the ordinary is happening with their MySQL database. Users can set up the amount of monitoring that they need, and the Performance Schema will populate only the requested data. Since PS is expected to generate a huge amount of data, users can limit the total data stored. The data is transiently stored in cyclic tables in a limited amount of memory.
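A minimal sketch of the kind of question PS answers, using the statement-digest summary table that MySQL 5.6 ships in performance_schema (the LIMIT is an arbitrary example value):

```sql
-- Which normalized statements have consumed the most total time?
-- SUM_TIMER_WAIT is reported in picoseconds; divide to get seconds.
SELECT digest_text,
       count_star,
       sum_timer_wait / 1000000000000 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 5;
```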
As computer technology progresses, some things change while others remain the same. The number of computer languages available to a programmer is among the things that have changed, while the popularity of the old C, C++ and Java remains almost the same. Programmers need to be able to access a database using the language of their choice, and the latest enhancements to the database should be made available in the language that they are using. We are fortunate that the hottest new languages want to adopt MySQL: MySQL has excellent drivers for Node.js, Ruby and GoLang. Maintaining hundreds of such language connectors would be a huge drain on any engineering organization. MySQL is made powerful by the number of computer languages that programmers can use to connect to it. The topmost layer in the MySQL Architecture shows the client layer where the connectors exist. Whichever language you use to write your program, chances are there will be a connector (or client API) that allows you to connect to MySQL and exchange data with it. The MySQL community is a great source of strength in this area: a huge number of language connectors are not only written but continue to be maintained by the community. MySQL is very lucky to have great community-developed drivers for Ruby and GoLang. MySQL continues to officially maintain the ODBC, C++, Java, .NET and (most recently) Python connectors.
Any Web product benefits from user and community attention. Attention generates and maintains the cycle of adoption, growth and stability. In the connected, socially networked world of today, users expect quick answers to their problems. A large user base also ensures that any problem a user experiences may already have surfaced and been resolved. Our user community is a huge blessing for MySQL. An open and well-informed MySQL user community is ready to answer any questions users may have about their MySQL installation. There are a multitude of forums recording myriad discussions about users' difficulties, along with the solutions to those problems. The community generates ideas and sometimes code to improve MySQL, files bugs, and most importantly is the sounding board that encourages MySQL developers to do better and compliments them on a job well done.
Redundancy, replication and sharding
As a database grows popular, larger and more highly trafficked websites rely on it. For large and distributed websites, high availability is extremely important. Redundancy needs to be built into all layers of Web architecture, including the database layer. MySQL provides this via MySQL Replication, a way to define and manage a series of master-slave setups. For some large installations where the read load is much higher than the write load, it makes sense to distribute this read load to many, sometimes hundreds if not thousands, of slave replicas of the master server. MySQL replication has many options to configure these master-slave setups to ensure that a slave automatically takes over as master if the master fails. Replication leverages the binary log to ensure that database events are transported from the master to the slave(s). MySQL provides replication that is row or statement based, synchronous or asynchronous, instant or time delayed. The capabilities are powerful, while administration is easy using tools like Workbench and MySQL Utilities. Database replication is a vast and complex topic, with a huge amount of current database research focused on it. It's no wonder then that the latest version of MySQL has a number of enhancements for MySQL Replication, like checksums, global transaction IDs, crash-safe slaves etc.
A different, sometimes complementary, approach to handling large database implementations is to partition or shard the data. A single large table could be broken up into smaller portions depending upon the commonality of the data. For example a user table could have separate partitions or shards based on the nationality of the user. This is especially useful when the most frequent queries require data from a single partition. MySQL has extensive support for table partitioning at the InnoDB storage engine level. The latest labs releases demonstrate how MySQL can be sharded across multiple machines. The entire complexity of identifying the shard on which the required data resides is the responsibility of the new sharding infrastructure called MySQL Fabric.
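The nationality example above can be sketched with MySQL's LIST COLUMNS partitioning; the table, columns and partition groupings are illustrative (note that the partitioning column must be part of every unique key, hence the composite primary key):

```sql
-- Hypothetical users table, partitioned by country code so queries
-- scoped to one nationality touch a single partition.
CREATE TABLE users (
    id      INT NOT NULL,
    name    VARCHAR(100),
    country CHAR(2) NOT NULL,
    PRIMARY KEY (id, country)
) ENGINE=InnoDB
PARTITION BY LIST COLUMNS (country) (
    PARTITION p_us VALUES IN ('US'),
    PARTITION p_in VALUES IN ('IN'),
    PARTITION p_eu VALUES IN ('DE', 'FR', 'IT')
);
```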
No real-world software is complete without adequate attention to security. A database must ensure that it is secure. In the world of multi-tenancy databases, special care needs to be taken to ensure that a user has sufficient privileges before any access is given to the data. MySQL implements user privileges at the database, table and column levels. There are also privileges for objects like views, procedures etc. Besides the normal create, modify and delete privileges, DBAs can also restrict the quantum of a privilege given to a user by, for example, limiting the number of queries the user can run per hour or the number of connections the user may open.
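Those per-hour resource limits are expressed directly in the GRANT statement; the account name, schema and limit values below are made up for illustration:

```sql
-- Grant limited privileges plus resource limits to an application user.
GRANT SELECT, INSERT ON shop.* TO 'appuser'@'localhost'
    IDENTIFIED BY 'secret'
    WITH MAX_QUERIES_PER_HOUR 10000
         MAX_CONNECTIONS_PER_HOUR 500
         MAX_USER_CONNECTIONS 20;
```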
MySQL also provides integration with third-party authentication modules like PAM. Recent releases give the MySQL DBA extensive password control to ensure adequate strength of passwords and expiration rules.
Let us now talk about the speed and efficiency required from a database system for the Web. The easiest way to increase the speed of access to data is by adding indexes. Indexes should be created not only on simple columns with integers and strings but also on complex columns with large text fields (called full text or CLOB). Indexing needs to understand the sorting rules for the different world languages and character sets. Users with large databases need the option to create sub-indexes or secondary indexes in addition to the primary index. Indexes should be quick to create and quick to update when new data is added. MySQL provides easy index creation and modification, with some special techniques for fast index creation. The latest version of MySQL provides options to create a full-text index for all language collations. On-the-fly creation and deletion of indexes is also offered by the latest version of MySQL.
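A sketch of the full-text indexing mentioned above, which MySQL 5.6 extended to InnoDB tables; the table and column names are illustrative:

```sql
-- Add a full-text index on two text columns of an existing table.
ALTER TABLE articles ADD FULLTEXT INDEX ft_title_body (title, body);

-- Query it with MATCH ... AGAINST rather than LIKE scans.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('replication' IN NATURAL LANGUAGE MODE);
```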
The speed of a query depends on the amount of data that needs to be fetched; for complex queries, the sequence in which the data is fetched may also matter. The optimizer uses statistics about the data contained in the database to determine the most efficient sequence for fetching data. Optimizers are complex and constantly changing because of more complex filtering and the increasing size and wider distribution of the data that needs to be gathered for a query. MySQL's speed is also enhanced by its versatile optimizer, on which the MySQL team works constantly and which continues to evolve. The latest version of the optimizer provides exponential speed improvements for several classes of queries.
When databases were first created, hard disks were in vogue, and many database algorithms were based on the rotation and buffering attributes of hard disks. Databases today need to adapt to the new world of solid state devices (SSDs). Though these devices are currently expensive, limited use of them can result in huge speed gains. Databases for the web need to be SSD aware. The newest MySQL is adapting to use SSDs. It is now possible to relocate more frequently accessed portions of the database to a different path. A user can now choose to place the log files on a path that points to an SSD. The same flexibility is also provided for portions (i.e., partitions) of tables, for complete tables, or for tablespaces.
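With MySQL 5.6's file-per-table tablespaces, an individual table can be placed outside the default data directory; the SSD mount path and table below are assumptions for illustration (innodb_file_per_table must be enabled):

```sql
-- Place this hot table's .ibd file on an SSD-backed path.
CREATE TABLE hot_sessions (
    id      BIGINT PRIMARY KEY,
    payload BLOB
) ENGINE=InnoDB
DATA DIRECTORY = '/mnt/ssd/mysql';
```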
On the hardware side, progress is not limited to SSDs; the nature of the microprocessor itself has changed. Microprocessor technology continues to increase the number of processors on a chip and the number of cores on each processor. This means that a server has multiple threads, and these threads can run uninterrupted on their separate cores. This has huge implications for finely tuned software like a database server. The thread bottlenecks and points of contention are now very different from those of single- or dual-core chips. The database therefore needs to be re-architected to take advantage of, say, 64 threads running at the same time. Users expect that if they spend money to upgrade from a 32-core machine to a 64-core machine, the throughput of the database should also double. The MySQL team has had to work hard on the architecture to ensure that the latest MySQL scales almost linearly up to 64 cores and half a million queries per second.
Read DimitriK's blog at http://dimitrik.free.fr/blog/index.html for MySQL benchmark numbers.
Flexibility—online schema changes
Since I already talked about scalability earlier, let me devote the final paragraphs to flexibility.
The current trend of NoSQL was initiated because database users felt constrained at having to define a fixed schema before developing their application. With today's agile models, schema changes happen more frequently and need to happen dynamically on 24×7 sites. These sites are business critical, and downtime for these changes is very expensive for an Internet business. Paradoxically, increasing competition means that new changes need to be delivered at the speed of the Internet, without disruption to the current production environment. Traditionally, changing database schemas was a huge investment of time for a database that did not allow dynamic schema changes. This business requirement has led to the new online schema change model, which the latest MySQL 5.6 also supports. You can add and modify columns in a table on the fly. New indexes can also be created and dropped online. Relationships between tables in the form of foreign keys can also be altered while the server is running.
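A minimal sketch of 5.6's online DDL; the table and column are illustrative. The optional ALGORITHM and LOCK clauses make the intent explicit and cause the statement to fail fast if an online change is not possible:

```sql
-- Add a column without blocking concurrent reads and writes.
ALTER TABLE users
    ADD COLUMN last_login DATETIME,
    ALGORITHM=INPLACE, LOCK=NONE;

-- Indexes can likewise be created and dropped online.
CREATE INDEX idx_last_login ON users (last_login);
DROP INDEX idx_last_login ON users;
```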
This is the Internet age and trends rise and get established very quickly. One of the ways for any software to provide flexibility to its users is to embrace upcoming trends. MySQL adapted to the NoSQL trend by providing their users an option to access data using either SQL or NoSQL.
The user gets the full advantage of having a strong ACID compliant underlying layer while having the flexibility of a schema-less data architecture. The protocol we have chosen to expose for NoSQL is the memcached protocol. So if you have a running application using memcached you can choose to use it with MySQL. Since the NoSQL interface connects to the lowest layer of MySQL (See MySQL Architecture), there is no SQL processing overhead from the parser/optimizer and query execution layer.
MySQL has also embraced the big data trend with the integration of Hadoop using the MySQL Applier for Hadoop. This allows real-time transfer of data from MySQL to a Hadoop database. Big data is being used to crunch the huge amounts of data being created on the web. This data is generated from multiple sources, including devices connected to the Web. And since the underlying database of the web is MySQL, it is important that Hadoop and MySQL talk to each other in real time.
There are multiple MySQL success stories which you can read on the web. I would encourage you to read the blogs from Twitter (new-tweets-per-second-record-and-how) and Facebook (Mark Callaghan).
Staying at the forefront and remaining the most popular database is a complex and interesting challenge for the MySQL development team. We thank you dear reader for your support and attention.
High Performance Drupal with MariaDB
Mon, 2014-10-20 09:31, by maria-luisaraviol

I am back from the 2014 Amsterdam DrupalCon, where MariaDB Corporation was present as a sponsor. It was my first time there, and I must say I was really impressed by the number of people attending the conference (around 2,300) and the interest people showed in MariaDB.
We had many conversations with several kinds of engineers, developers and providers, and for just a few of them MariaDB was something new to discover; the great majority either were already using it or were planning to, but had not yet managed to find some “free” time to do so.
What impressed me was that almost all of the happy MariaDB users had simply replaced their previous database server installation (MySQL or Percona) with MariaDB, with the same approach they might have taken for a standard database server upgrade: switch off, back up, install, switch on, done!
Well, that's great: we all know that MariaDB is a drop-in replacement for MySQL, and with just this simple migration, all of them experienced a faster website.
What almost none of them were aware of is that they can achieve even better performance by taking advantage of some MariaDB-specific features.
We all know that a huge amount of optimization work has been done in MariaDB since version 5.3, and the list of features is really impressive (check the Optimizer Feature Comparison Matrix). I am sure the enthusiastic comments about how much faster Drupal websites became after installing MariaDB rest in large part on a mix of JOIN, subquery and disk access optimizations.
XtraDB Storage Engine
We all know how XtraDB has improved on InnoDB and how it has been designed to work better in environments where high performance is crucial. It scales much better on multi-core architectures and uses memory more efficiently.
XtraDB is the default MariaDB storage engine so new users will take advantage of it with no specific user action needed.
Sphinx Storage Engine
A lot of Drupal users are Sphinx users as well. I can safely say that no one I talked to was aware of MariaDB's SphinxSE, which allows users to do full-text search with the Sphinx search engine while still using the regular database.
This means that Drupal users (but not only them!!) can let Sphinx do what it has been designed to do at its best (indexing, searching, filtering, sorting) and JOIN with other MariaDB tables in a transparent way.
MySQL traditionally assigned a thread to every client connection; of course, as the number of concurrent users grows, this model shows performance drops. As far as I know, there are issues using persistent connections with PDO, the interface for accessing databases from PHP. On the other hand, if you have many concurrent clients or many users accessing a website, each of them will theoretically open one thread per connection; we know that many active threads are performance killers, and it is not always easy to set the max_connections limit properly.
MariaDB allows Drupal users to create a pool of threads that can be reused. Moreover, the MariaDB pool is adaptive: it takes care of creating new threads in times of high demand and kills threads that are idle.
Why are thread pools a good option for Drupal users? Because thread pools are most efficient when queries are relatively short, which is often the case when users hit a website and the requested page is not in the cache. So if you are a Drupal user with many concurrent clients, or many users each potentially using a thread, simply turn the thread pool on in your my.cnf.
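Turning it on is a small my.cnf fragment; the pool size below is an example value (MariaDB defaults it to the number of CPUs, so it can usually be omitted):

```ini
# my.cnf fragment (MariaDB on Linux)
[mysqld]
thread_handling = pool-of-threads
thread_pool_size = 8
```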
Making UUID() and RAND() replication safe
MySQL's UUID() and RAND() functions both return (pseudo-)nondeterministic results. UUID()'s result is moreover bound to the host on which it executes. For this reason, both are unsafe to replicate with the STATEMENT binlog format. As an example, consider:
master> create table test.uuid_test (id int, u varchar(64));
master> insert into test.uuid_test values (1, UUID());
Query OK, 1 row affected, 1 warning (0.03 sec)
master> select * from test.uuid_test;
+------+--------------------------------------+
| id   | u                                    |
+------+--------------------------------------+
|    1 | 7e3596d8-56ac-11e4-b284-3c970ea31ea8 |
+------+--------------------------------------+
The warning we got on the insert directly relates to the following inconsistency on a slave:
slave1> select * from test.uuid_test;
+------+--------------------------------------+
| id   | u                                    |
+------+--------------------------------------+
|    1 | 7e379d63-56ac-11e4-8477-3c970ea31ea8 |
+------+--------------------------------------+
The data on the slave is clearly inconsistent with the master's. The slave, replicating via STATEMENT binlog format, re-executes the INSERT command and gets a different UUID value.
One solution to the above is to generate the UUID value from your application. By the time MySQL gets the INSERT statement, the UUID value is a constant string, as far as MySQL is concerned.
However there's a way to do it from within MySQL, by decoupling the UUID() function from the INSERT statement. It takes a session variable. Consider:
master> set @safe_uuid := UUID();
Query OK, 0 rows affected (0.00 sec)
master> insert into test.uuid_test values (2, @safe_uuid);
Query OK, 1 row affected (0.02 sec)
master> select * from test.uuid_test;
+------+--------------------------------------+
| id   | u                                    |
+------+--------------------------------------+
|    1 | 7e3596d8-56ac-11e4-b284-3c970ea31ea8 |
|    2 | 29c51fb9-56ad-11e4-b284-3c970ea31ea8 |
+------+--------------------------------------+
And on a slave:
slave1> select * from test.uuid_test;
+------+--------------------------------------+
| id   | u                                    |
+------+--------------------------------------+
|    1 | 7e379d63-56ac-11e4-8477-3c970ea31ea8 |
|    2 | 29c51fb9-56ad-11e4-b284-3c970ea31ea8 |
+------+--------------------------------------+
The reason this succeeds is that MySQL stores in the binary log the values of session variables used by DML queries. It just so happened that @safe_uuid was assigned the UUID() value, but it could just as well have been assigned a constant or some other computation. MySQL stored the resulting value in the binary log, where it is forced upon the slave. Check out this binary log snippet:
# at 14251
#141018 12:57:35 server id 1 end_log_pos 14319 Query thread_id=2 exec_time=0 error_code=0
# at 14319
#141018 12:57:35 server id 1 end_log_pos 14397 User_var
SET @`safe_uuid`:=_utf8 0x32396335316662392D353661642D313165342D623238342D336339373065613331656138 COLLATE `utf8_general_ci`/*!*/;
# at 14397
#141018 12:57:35 server id 1 end_log_pos 14509 Query thread_id=2 exec_time=0 error_code=0
insert into test.uuid_test values (2, @safe_uuid)
# at 14509
#141018 12:57:35 server id 1 end_log_pos 14536 Xid = 145
The same trick can be applied to RAND(). The funny thing about RAND() is that it is already taken care of by the binary log via SET @@RAND_SEED1 and SET @@RAND_SEED2 statements (i.e. it works), though the documentation clearly states it is unsafe.
With Row Based Replication (RBR) the problem never arises in the first place since the binlog contains the values of the new/updated rows.
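So the simplest global fix, where RBR is an option, is to switch the binlog format (requires the SUPER privilege; when set globally it applies to new sessions, and it can also be set as binlog-format=ROW in my.cnf):

```sql
-- Row-based events log the actual row values, so UUID() and RAND()
-- replicate safely without session-variable workarounds.
SET GLOBAL binlog_format = 'ROW';
```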
Improved Fault Diagnosis UI
In our efforts to improve MySQL monitoring, we recently enhanced our fault diagnosis UI. Adaptive Fault Detection has been an integral part of our suite, and we are excited about the UI updates that will help you better manage your databases.
The new release provides a more compact view, allowing you to quickly assess potential problems before they become bigger. Notice how a tiny, tiny server stall was caught by our algorithm. Fault detection has allowed us to get remarkable results from our weak EC2 boxes by keeping them running really cleanly.
We have also added more sections showing metrics such as top processes, network sockets, and network ports. These sections have quick-links to navigate to each of the various tools in the exact time range of the selected fault.
Further, you can click on the zoom buttons on sparklines to pop them out and make them big for closer inspection.
The new UI allows you to view metrics at a high level or drill down to the details, depending on your need. We’re always improving our app and appreciate feedback. Customer suggestions influence our product greatly.