MySQL Failover, Enhanced MySQL Utilities

August 9, 2016

This blog is the second part of a multi-part series on failover for MySQL.  The first installment looked at design considerations, giving us a "thinking" perspective on what we might want to adopt.  Later I will take a look at a more business- and operations-oriented way of thinking through these details.  In this installment, I'd like to "build" something.  Let's take that initial thinking and put it to work.

MySQL Utilities 1.6.4+ Latest GA tools

The MySQL Utilities are a collection of tools aimed at making various tasks simpler.  They were originally introduced as a component of MySQL Workbench, but are now offered separately as well.  This allows us to install the utilities on servers in proximity to the systems we need them to work on.  My intention here is to focus on the HA utilities within the collection, but I may show some of the others to demonstrate their general use.

Using Login-Paths for connecting to servers

 
A new way of working with our MySQL Server has evolved, rather quietly I might add.  There is a method of storing credentials in encrypted files so that passwords are never exposed in plain text.  These login paths can also be used with the MySQL Utilities.  The documentation on login-paths with the utilities is pretty straightforward, but I'll discuss it here too.

Let's say I have a CentOS 6 Linux server with the MySQL binaries installed.  I'll be running 3 MySQL instances on this server and will start and stop them using the mysqld_multi script.  The MySQL Utilities' standard command-line syntax for providing the details to connect to a server is:

    <username>[:<password>]@<host>[:<port>][:<socket>]

…which isn't elegant, needing all of that information on the command line…let alone a password.

In turn, the login-path approach reduces all of this to just a name that refers to those settings.  Here is a comparison of the same command.  See if you can tell which one is using the login-path reference.
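As a rough sketch (the admin user name, password, host and port below are placeholders of my own, not values from a real setup), checking a server with mysqlserverinfo could be written either of these two ways; the second one is the login-path form, referring to a path named my1:

    # credentials spelled out on the command line (password visible!)
    mysqlserverinfo --server=admin:MyS3cretPwd@my1.example-domain.net:3301

    # the same call using an encrypted login-path named "my1"
    mysqlserverinfo --server=my1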

So put your hand up if you want to put your password on the command line.  Boooo!  Ok…yes, it's not a "DRAMA", but it shouldn't be done in a production environment, that's for sure.  It can be ok (maybe) for quick and dirty testing in an isolated test environment…or when there is no other way and being operational has to happen.  Anyhow, that's my opinion.  The login-path setup gives us a good way of working with these tools, so we'll use it from here on in.  Here is how I created mine so that you can follow along with this blog.
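What follows is a minimal sketch of that setup, assuming login-path names my1, my2 and my3 for the three instances and rpluser for the replication account (the hosts, ports and user names are placeholders; adjust them to your environment).  mysql_config_editor prompts for each password and stores everything encrypted in ~/.mylogin.cnf:

    mysql_config_editor set --login-path=my1 --host=my1.example-domain.net --port=3301 --user=admin --password
    mysql_config_editor set --login-path=my2 --host=my2.example-domain.net --port=3302 --user=admin --password
    mysql_config_editor set --login-path=my3 --host=my3.example-domain.net --port=3303 --user=admin --password
    mysql_config_editor set --login-path=rpluser --host=my1.example-domain.net --port=3301 --user=repl --password

    # confirm what is stored (passwords are only ever shown as asterisks)
    mysql_config_editor print --all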

So we've got that down now…let's move on.

Requirements for using the HA Utilities

The MySQL High Availability Utilities work with certain setups and configurations in place on your server.  They are:

  • The admin account that operates on your master and slave servers needs to have enough privileges to handle all the tasks that it "may" be required to do.  For now, an admin account equivalent to a root account is needed.  As it's inadvisable to make the actual 'root' account accessible remotely, it's best to create an account like it, just for these HA management tasks, that can access your MySQL Servers from a remote location.  The grant that I am mentioning here is:
    GRANT ALL ON *.* TO 'user_name_here'@'%' WITH GRANT OPTION;

    • This is where those encrypted login-path credentials really come in handy.
  • The database account that is used for replication should also be known, and set up as a login-path account as well.  There are situations where we need to provide it to the utilities.
    • GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
    • Remember: On MySQL 5.7 you’ll need to provide a separate CREATE USER… statement prior to executing the GRANT statement.
  • MySQL Connector/Python (mysql-connector-python) needs to be installed for the utilities to work.  It should also be installed first, to ensure the utilities (which should be installed second) have the proper components in place during their own install.
  • There are a number of database configuration settings we'll need as well, noted here:
    • GTIDs are generally required for these utilities.  A GTID is a blend of a UUID hash followed by plain numbers.  Although the hash values look ridiculous, working with them is much better than the old way, where we had to identify a binary log file by name along with a position within that file and then try to tell from that how our system was doing.  Imagine trying to work out how in sync your various replication slaves are with your master, or troubleshooting an issue, with only that to go on.  GTIDs enforce a new level of strict behaviour, which at first may seem problematic, but they're just more precise with their replication handling.  It's easier to do things wrong the old way…and easier to do things right with GTIDs.  Embrace it!…'everybody is doing it' (I'm a teenager again…or just overly tired)…sigh
      • GTIDs look like this:
        Executed_Gtid_Set: 8290b421-5b85-11e6-845a-08002725ec7d:1-25
    • The configuration entries needed for this are:
      • gtid_mode=ON
      • enforce-gtid-consistency
    • The ‘report’ configurations, although generally useful anyhow, must be set to actual resolvable values as they are used by the utilities.  The master-info configuration is also critical for the utilities to work.
      • report-host=my1.example-domain.net
      • report-port=3301
      • master-info-repository=TABLE
    • Standard replication settings such as:
      • log-slave-updates
      • log-bin=some-clever-name
      • server-id=1    # should be a unique id greater than zero across your replication setup
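To pull those settings together, here is a rough sketch of what one instance's section of the configuration file might look like in a mysqld_multi style setup (the group name, paths, ports and log-bin name are placeholders of mine, not a configuration taken from a real system):

    [mysqld1]
    server-id                = 1
    port                     = 3301
    socket                   = /var/lib/mysql/my1/mysql.sock
    datadir                  = /var/lib/mysql/my1
    log-bin                  = my1-bin
    log-slave-updates
    gtid_mode                = ON
    enforce-gtid-consistency
    master-info-repository   = TABLE
    report-host              = my1.example-domain.net
    report-port              = 3301

Each of the other instances would get its own [mysqldN] group with a unique server-id, port, datadir and report settings.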

Building out your MySQL instances and preparing them for Replication

Ok…so we know some of the basics….here are some more basics.  Isn’t it great there are blogs out there to help with all these basics 🙂

You’ve installed your databases with your core configuration file in place, with items as noted above.  Maybe you’ve even reset your root password as is needed and created a few accounts.  Did you do all this on each server?

Either way, short of changing your root password on each server, you could stop there.  The point is this: until you start replication, "anything" you CREATE, INSERT, ALTER…any DML or DDL on any of your servers…needs to be done on each server individually.

Once you've stopped yourself, let's say after updating the root account on each server, then it's time to get replication going.

  1. Run this command on each server with the root account:   RESET MASTER;    …it will set our GTID values back to the beginning
  2. Then, execute the CREATE USER... and GRANT REPLICATION SLAVE... statements for your replication user account ONLY…and ONLY on the MASTER.
  3. Then you can start replication using the credentials for the replication user account that you just put in place. These statements are executed from the slaves that need to connect to the master.
    • Either: run the following statement on each slave server
      • CHANGE MASTER TO MASTER_HOST='my1.example-domain.net', MASTER_PORT=3301, MASTER_USER='repl', MASTER_PASSWORD='repl', MASTER_AUTO_POSITION=1;
      • START SLAVE;
    • Or: run the following Utilities commands from the command line:
      • mysqlreplicate --master=my1 --slave=my2 --rpl-user=rpluser
      • mysqlreplicate --master=my1 --slave=my3 --rpl-user=rpluser
    • You should now notice that the replication account you created on the master has been copied to each of the slaves.

Full output of running these commands:

Introducing the HA Utilities!

So we've seen how the mysqlreplicate command can work.  From here we begin to explore even more basics…but this time basics of the other "High Availability oriented Utility Commands", and the results of running them.  The ones that I plan to look at are listed below, with examples for some of them.  I'm saving the best for last: the mighty "mysqlfailover" command.  You'll like it…it's worth waiting for. 😛

mysqlreplicate – Connects slaves to a master, as we have seen already.  There are some other options worth looking at on the documentation page.

mysqlrplcheck – checks the replication topology and runs various tests.  Below is an execution of it.
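An invocation against the topology used in this post might look something like the following, using the login paths set up earlier (my1 as the master, my2 as a slave); the utility runs through a series of pass/fail checks on the pair:

    mysqlrplcheck --master=my1 --slave=my2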

mysqlrplshow – shows the slaves attached to the master, with an optional recursive check to see if those slaves have further slaves attached to them.  It provides a complete topology listing as a graph or a list.  An example follows.
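Something along these lines, again with the my1 login path for the master; --discover-slaves-login names an account (user, or user:password) that can log in to the slaves, where "admin" is the placeholder account used earlier, and --recurse walks down through any slaves of slaves:

    mysqlrplshow --master=my1 --discover-slaves-login=admin --recurse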

Replication Administration and Failover Utilities

Next, I’d like to take more time to look at the following 2 commands: mysqlrpladmin and mysqlfailover.

mysqlrpladmin

The mysqlrpladmin command is much like mysqlfailover but focuses more on managed events that are manually initiated, though not exclusively so.  It has specific handling for "healthy" switchovers…where there is no problem with the current master.  Maybe it needs to come out of service for maintenance and needs to be replaced.  Maybe it needs to be removed from the replication setup altogether.  Failover for an actual failure is also a capability, and it provides a 'health' command too.  I'd recommend a good read of its documentation page and samples.  My samples below should be helpful as well.

Here is an execution of the script that changes a slave to be a master and demotes the master to be a slave.  I'll also capture a log of the activity, as you can see in the command.
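As a sketch, the command would be along these lines, using the login paths from earlier, with my1 as the current master and my2 being promoted (the log file name is a placeholder of mine):

    mysqlrpladmin --master=my1 --new-master=my2 --demote-master \
                  --discover-slaves-login=admin --log=switchover.log switchover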

Well that's pretty slick.  As you can see, it produces the "health" command's output at the end and shows how all the slaves are now referencing the new master.  I've also displayed the contents of the log, and it provides more insight into the operation.

The logging functionality of these commands is pretty important for operational production environments.  It helps provide evidence of the progression of a utility, whether it is a database load utility, a reorg utility or, in this case, a replication management utility.  Very important!

Now let's try a switchover again, but with some pre and post scripting added in.  Going into this change, I also put the slaves into read_only=ON.  The scripting added here will set the old master to read_only=ON and the new master to read_only=OFF.  Also, the old master is expected to drop out of the replication topology, as described in the documentation, when --demote-master isn't used.
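A sketch of what that invocation might look like, with my2 as the current master (from the previous switchover), my1 being promoted, and hypothetical hook script paths of my own choosing; note there is no --demote-master this time:

    mysqlrpladmin --master=my2 --new-master=my1 \
                  --discover-slaves-login=admin \
                  --exec-before=/usr/local/bin/pre_switch.sh \
                  --exec-after=/usr/local/bin/post_switch.sh \
                  --log=switchover.log switchover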

Perfect execution.  As you can see, the log notes the extra scripts that I inserted into the flow of the switchover process, and the simplicity of setting read_only=ON and turning it off.  Actually, I added more to them than was needed by turning off binary logging for those statements, but that prepares me for later.

Normally, I would also include additional scripting that might update the my.cnf configuration file for the server so that changes would persist after a system restart.  However, not for this demonstration.
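For illustration, here is a bare-bones sketch of what such an --exec-after hook could look like.  It is not the script I actually ran; it hard-codes the new master's login path for simplicity, and the log path is a placeholder:

    #!/bin/bash
    # post_switch.sh - hypothetical --exec-after hook (illustrative sketch only)
    # Make the new master writable; sql_log_bin=0 keeps these statements out of
    # the binary log so they are not replicated to the slaves.
    mysql --login-path=my1 -e "SET sql_log_bin=0; SET GLOBAL read_only=OFF;" \
        >> /var/log/mysql_switchover.log 2>&1

A matching --exec-before script would do the reverse on the outgoing master, setting read_only=ON.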

Now I'll add database instance "my2" back into the replication setup, run a quick 'health' status, and then we'll move on to the next utility.
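As a sketch of those two steps, with my1 currently being the master and the login paths from earlier:

    mysqlreplicate --master=my1 --slave=my2 --rpl-user=rpluser
    mysqlrpladmin --master=my1 --slaves=my2,my3 health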

mysqlfailover – Automatic Replication Health Monitoring and Failover

The mysqlfailover utility is really the tool for automated handling of failover events.  As noted in the documentation, it works exclusively with GTID-based replication, and its primary job is monitoring the master and, when a failure occurs, failing over to a slave that is in a healthy state.

The health check polling done on the master defaults to a check every 15 seconds (the interval can be as low as 5 seconds).  The health check itself is done via a "ping" to the database, 3 pings per check interval, with a configurable delay defaulting to 3 seconds.  The utility also checks with the Python connector to see if the server is still reachable.  The delay provides enough time for the ping response to come back, and since 3 failed pings indicate a failing system, the time between them helps to eliminate false-positive situations.  This health check polling is just as discussed in the previous blog in this series.

There is also an option to further extend a subsequent health check in the case that the regular check reports an error.  If the system considers a failure imminent, then this extended time is added before a failover is executed.  This is additionally intended to prevent false-positive scenarios by extending the wait time used to qualify that the system is really down.

We can decide whether this built-in check is good enough for us, or we can use the --exec-fail-check option, which allows us to supply a script providing our own health check.  Our dedicated script would then execute every time on the defined interval we've configured.  If this is used, the extended health check option is not available.
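A minimal sketch of such a script, under the assumption (based on my reading of the documentation) that a zero exit status tells mysqlfailover the master is healthy and a non-zero status signals a failure; the login path and approach are mine, not a prescribed implementation:

    #!/bin/bash
    # fail_check.sh - hypothetical --exec-fail-check script (illustrative sketch)
    # Consider the master healthy only if it answers a trivial query.
    if mysql --login-path=my1 -e "SELECT 1;" >/dev/null 2>&1; then
        exit 0   # master reachable
    else
        exit 1   # treat as failed
    fi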

So, as in the last blog, a good chunk of consideration goes into the health check done on one's system.  Here our failover tool provides a health check to automate handling a MySQL failover…and an F5 load balancer may have its own health check too, for controlling the application connections to the database.  Maybe there is room for coordination here.  Maybe the coordination occurs during the --exec-before stage of the MySQL Utilities failover, or of a maintenance switchover.  The scripting within the --exec-before or --exec-after hooks could execute a change that informs the F5 of a pending failure, causing it to follow suit by failing over the application tier that it manages.  If coordination like this is sought, then the mysqlfailover utility assists with its 4th scripting hook, the --exec-post-failover option.  From there, an update done on the new database could also inform the F5 which system is the new master.

The key is to keep the design clear and simple, even if it is complex overall. Each step is simple and well defined.  Consider what the actual needs are in the big picture of the design and then let it come together naturally.  It needs to be a requirements driven design for it to solve what it is seeking to do.

Additional Failover Utility Scripting Goodies

In the mysqlfailover documentation there is a Table 5.3, "External Script Parameters".  This wonderful chart, hidden deep within the documentation of this invaluable utility, lists the parameters that are passed to your scripts if you use the scripting interfaces, or hooks.

What is even better is that mysqlrpladmin also passes these to scripts used with its --exec-before and --exec-after options.
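If you want to see exactly what your version passes, a trivial hook like the following simply appends whatever arguments it receives to a log file (the script name and log path are placeholders), which is an easy way to confirm the parameters from that table before writing anything fancier:

    #!/bin/bash
    # log_hook.sh - minimal sketch of an --exec-* hook that just records its arguments
    echo "$(date '+%F %T') $(basename "$0") args: $*" >> /var/log/mysql_hook_args.log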

So let's take a look at executing this mysqlfailover script, but not as a daemon, just running in a terminal.  Then, from another terminal, I'll stop the database instance that is the master, causing the failover.
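A sketch of such an invocation, using the my1 login path for the current master, the placeholder admin account for slave discovery, and hypothetical hook script paths; the default 15 second check interval and automatic failover mode apply here:

    mysqlfailover --master=my1 --discover-slaves-login=admin \
                  --exec-before=/usr/local/bin/pre_failover.sh \
                  --exec-after=/usr/local/bin/post_failover.sh \
                  --exec-post-failover=/usr/local/bin/post_failover_notify.sh \
                  --log=failover.log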

The two screens that I captured looked as follows….

Then a failure occurs and this is printed to the terminal

…and back to the console

And finally the logs from that failover execution. Also, since I had the --exec-* shell scripts writing to the same log file, you can see the points at which those scripts ran.

 

Final Stages

So we've got a good grasp on our failover tools, how they allow us to integrate some scripting, and the variety of ways that we can use them to manage our replication environment.  When it comes to scripting our failover…what are the big things that we are looking to safeguard now?

  • Ensuring that slaves always have read_only=ON both in the database and in their server’s my.cnf file.
  • On the master that is failing over, prevent users/applications from being able to write to it.  Ideally the scripting here would be under --exec-before.  This step is pretty important, as it will ensure no data is orphaned and left behind on the old master.
    • Inform load balancer MASTER is no longer available!
      • Is there a way to persist this state through a database restart?  --init-file= usage (see the sketch after this list).
    • Other scripting in the database:
      • SET GLOBAL offline_mode=ON;
      • A process to kill off any application or user connections that remain
      • Set the system to read_only=ON…plus change the config file
  • On the new master, some form of scripting to:
    • Set read_only=OFF and persist the change in the database config file.
    • Inform a load balancer that the instance is ready to receive read-write workload.
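On that --init-file question above, one way to sketch it: an init file is just a file of SQL statements the server executes at startup, so pointing the demoted instance at a file that re-applies the read-only/offline state keeps it from accepting writes after a restart.  The file paths and group name here are placeholders of mine:

    # /etc/mysql/my2-init.sql  (hypothetical init file, plain SQL)
    SET GLOBAL read_only = ON;
    SET GLOBAL offline_mode = ON;

    # referenced from that instance's my.cnf group
    [mysqld2]
    init-file = /etc/mysql/my2-init.sql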

There are many ways to build out these solutions, and certainly numerous ways to solve these problems so that the business can run knowing it is investing correctly in its software and hardware infrastructure.

I hope this blog serves to show some of the bare-bones ideas that we thought through in the previous blog, implemented here through scripting and utility usage examples.

Once again…it's great to get feedback!

Thanks for reading.