How to Refresh a Non-Prod MySQL Database
Hey guys, just a quick tip I wanted to toss up here about refreshing prod to non-prod databases. Assuming you've set up a system for pulling personal information out of your production database (ideally through some sort of data masking script or via an update/insert trigger), you shouldn't have too many issues refreshing a database from prod to your non-prod(s). (Diagram: how I assume your architecture looks.)
Keep in mind that this method is highly generalized and has some performance issues on large or highly trafficked databases. If you find yourself with a database like that, I'd suggest not using this method; instead, look into Percona's XtraBackup tool. It lets you take a snapshot of your database and then transfer that off of your "hot" box into your non-prod for rebuilding. I haven't had a use case where I've needed this yet, but it does come highly recommended from everyone I hear talk about it.
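If you do go the XtraBackup route, the shape of it is roughly the below. This is just a sketch, assuming Percona XtraBackup with the xbstream format, SSH access from prod to np1, and placeholder paths:

# On prod: stream a hot snapshot straight to np1 (hosts and paths are placeholders)
xtrabackup --backup --user=admin_user --password=REDACTED \
  --stream=xbstream --target-dir=/tmp \
  | ssh np1 "xbstream -x -C /var/lib/mysql-restore"

# On np1: apply the redo log so the snapshot is consistent before pointing MySQL at it
xtrabackup --prepare --target-dir=/var/lib/mysql-restore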
So, on to the actual refresh. I should take this moment to absolutely insist that you only do this with InnoDB or XtraDB; this will lock up MyISAM to the point that you're effectively taking your production environment down. In this case we're using the basic mysqldump command. Assuming you have one prod and two non-prods (np1 and np2), and your database is named mwguys_sample_db, this is what your command should look like:
# On prod
# Be sure to use a screen or a nohup for large databases
mysqldump -u admin_user --password=REDACTED --triggers --routines --events \
  --add-drop-database --single-transaction --databases mwguys_sample_db \
  | tee >(mysql -h np1 -u admin_user --password=REDACTED mwguys_sample_db) \
        >(mysql -h np2 -u admin_user --password=REDACTED mwguys_sample_db) \
  | less +F
So let's break this down. Obviously I'm using the mysqldump command to do most of the heavy lifting; I linked to it above and I'm linking to it here, it's just all-around good stuff. The tee with process substitution is what fans the dump out to both non-prods at once, and the trailing less +F lets you watch the stream scroll by as it goes. In this case I'm connecting directly from a shell on prod, my primary production database, and passing the connection information for the admin_user@localhost user directly on the command line. (For those of you worrying about snoops, mysqldump will replace the --password=REDACTED argument with something that looks like --password=x xxxxx in the process list.)
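If the process-list masking doesn't put your mind at ease, you can keep the password off the command line entirely with a standard client option file. A minimal sketch (the [client] section and the ~/.my.cnf location are the usual mysql client conventions):

# One-time setup for whoever runs the refresh
cat > ~/.my.cnf <<'EOF'
[client]
user=admin_user
password=REDACTED
EOF
chmod 600 ~/.my.cnf

# Now mysqldump and mysql pick the credentials up automatically
mysqldump --single-transaction --databases mwguys_sample_db | ...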
The --triggers, --routines and --events flags are described in the manual page, but essentially they get your stored procedures, triggers and events to copy over with your database. Keep in mind that if you use these heavily and have invested in unique definers for any of these items, you could run into problems. I suggest having a user for items like these whose name stays the same across environments (e.g. trigger_0_db1@localhost for your first trigger, with that user only having access to do what db1.trigger_0 says it should do). However, this would take time to maintain.
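For what it's worth, that setup might look something like the below. The user name, grants and trigger are all made up for illustration, and CREATE USER IF NOT EXISTS needs MySQL 5.7+ or MariaDB 10.1+:

# Run on prod and on every non-prod so the definer resolves everywhere
mysql -u admin_user --password=REDACTED <<'EOF'
CREATE USER IF NOT EXISTS 'trigger_0_db1'@'localhost' IDENTIFIED BY 'REDACTED';
-- Grant only what db1.trigger_0 actually needs to do its job
GRANT SELECT, UPDATE ON db1.* TO 'trigger_0_db1'@'localhost';
EOF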
The --add-drop-database bit will drop the database on the target before recreating it. Needless to say, this is a dangerous flag and you should definitely test this command before you use it. This flag alone is reason to always have a recent backup of your data, just in case you toss this to the wrong place.
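Cheap insurance before you pull the trigger: snapshot the target first. Something like this (the output path is a placeholder):

# Grab what's on np1 today, just in case the refresh goes to the wrong place
mysqldump -h np1 -u admin_user --password=REDACTED --triggers --routines --events \
  --single-transaction --databases mwguys_sample_db > /backups/np1_pre_refresh.sql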
The next flag, --single-transaction, is probably the best flag of the bunch. It utilizes the transactional capabilities of MariaDB 5.3 and up to give you a consistent backup without locking your tables, which is useful for a number of things. (And in theory, utilizing the --master-data flag, you could set up live replication to your non-prods.)
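I won't lay out the full replication setup here, but the first step would look something like this. --master-data=2 writes the binlog coordinates into the dump as a comment; it needs binary logging enabled on prod, and the coordinates shown are made up:

# Same dump, but record where in the binlog it was taken
mysqldump -u admin_user --password=REDACTED --triggers --routines --events \
  --add-drop-database --single-transaction --master-data=2 \
  --databases mwguys_sample_db > with_coords.sql

# The dump will start with a commented line like:
#   -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=1234;
# Load the dump on the non-prod, run that statement (plus the prod host and a
# replication user), then START SLAVE;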
Finally, the --databases flag specifies which database/schema to copy to the non-prods. In my example I only have the one database. In theory you could go down farther and replicate just a table, or utilize --ignore-table=db.tablename to skip a particular table. And of course you can use the all-powerful --all-databases to grab everything. In our example, we're just grabbing the one database.
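Both variants look about like you'd expect; the table names here are made up. Note that --ignore-table wants the db.table form, and that a single-table dump carries no USE statement, so the receiving mysql needs the database named explicitly:

# Just one table
mysqldump -u admin_user --password=REDACTED --single-transaction \
  mwguys_sample_db some_big_table \
  | mysql -h np1 -u admin_user --password=REDACTED mwguys_sample_db

# The whole database minus a noisy table
mysqldump -u admin_user --password=REDACTED --single-transaction \
  --ignore-table=mwguys_sample_db.debug_log --databases mwguys_sample_db \
  | mysql -h np1 -u admin_user --password=REDACTED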
You can keep an eye on the dump from a second terminal:
watch ps -ef \| grep -i mysqldump
A note on events: should your schema have a large number of events (for, say, masking data), you may find that the mysqldump takes a particularly long time, as it rebuilds the events before inserting the data, which means the event logic ends up firing for every update. You may want to consider dumping your schema without your events and doing just the data transfer, then adding your events back in afterwards, once you've manually done the items your events do.
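A two-pass version of that could look like this. It's a sketch, leaning on the facts that events are only dumped when you ask for them and triggers are dumped by default (hence --skip-triggers on the second pass):

# Pass 1: structure, data, triggers and routines, but no events
mysqldump -u admin_user --password=REDACTED --triggers --routines \
  --add-drop-database --single-transaction --databases mwguys_sample_db \
  | mysql -h np1 -u admin_user --password=REDACTED

# ...do your masking work on np1 by hand here...

# Pass 2: just the event definitions
mysqldump -u admin_user --password=REDACTED --events --no-data \
  --no-create-info --no-create-db --skip-triggers mwguys_sample_db \
  | mysql -h np1 -u admin_user --password=REDACTED mwguys_sample_db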
Data masking in non-prods is a tough thing to discuss. Ideally you'd develop a consistent set of test data that you could generate on the fly to test your application, which would negate the need to "refresh" your non-prod databases at all. However, if you had a consistent set of test data you wouldn't be here (or you're here because you want to replicate your full database for performance testing).