<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GFMorris.org &#187; MySQL Wrangling</title>
	<atom:link href="http://gfmorris.org/categories/mysql-wrangling/feed/" rel="self" type="application/rss+xml" />
	<link>http://gfmorris.org</link>
	<description>Smart Guy, Dumb Code</description>
	<lastBuildDate>Sat, 17 Jul 2010 15:39:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>How I Seamlessly Merged Two Tasks Pro™ Databases</title>
		<link>http://gfmorris.org/archives/2007/08/05/how-i-seamlessly-merged-two-tasks-pro%e2%84%a2-databases/</link>
		<comments>http://gfmorris.org/archives/2007/08/05/how-i-seamlessly-merged-two-tasks-pro%e2%84%a2-databases/#comments</comments>
		<pubDate>Mon, 06 Aug 2007 04:15:20 +0000</pubDate>
		<dc:creator>Geof F. Morris</dc:creator>
				<category><![CDATA[MySQL Wrangling]]></category>

		<guid isPermaLink="false">http://gfmorris.org/archives/2007/08/05/how-i-seamlessly-merged-two-tasks-pro%e2%84%a2-databases/</guid>
		<description><![CDATA[I&#8217;ve long maintained two Tasks Pro databases. This made little sense&#8212;even though one of them ostensibly involved lots of people helping me out with my hobby, I was the main one using it. I finally got to neglecting the hobby database, to lots of bad effects. BAD effects. So I decided to merge them. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve long maintained two <a href="http://taskspro.com/">Tasks Pro</a> databases.  This made little sense&#8212;even though one of them ostensibly involved lots of people helping me out with my hobby, I was the main one using it.  I finally got to neglecting the hobby database, to lots of bad effects.  BAD effects.  So I decided to merge them.  This took some planning and some execution, so I wrote it up as I did it, as a tutorial for someone else crazy enough to make the same move.  <img src='http://gfmorris.org/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<hr />
<h4>Preparation</h4>
<p>First, I backed up both databases.  Makes sense, no?  Folks forget that, though.  So, <strong>back up your stuff, man</strong>.</p>
<p>Next, I went into the secondary source database&#8212;the one I&#8217;d be merging from, not the one I&#8217;d be merging into.  Here&#8217;s what I had to fix:</p>
<ul>
<li>Overlapping users.</li>
<li>Overlapping groups.</li>
<li>Overlapping tasks.</li>
</ul>
<p>The first two are fairly simple.  If the overlap isn&#8217;t going to cause an issue&#8212;if user #1 in both databases is the same, as it is in my case&#8212;just delete that user in the target db, but <strong>don&#8217;t</strong> delete anything else to do with that user.  For any users that overlap and aren&#8217;t the same person, you do need to change that user.  Assign them a new ID but otherwise leave them the same.  In my case, my friend Bryan was user #2 in my hobby db, but my work user [tied to my work email address, because there are a lot of tasks for work that I'll only do at work, so it makes sense to have the two users] was that same #2 in the personal db.  I looked for the next hole in the tp_users table to see where I could slot Bryan in, and that was #7.</p>
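<p>Finding the next hole in an ID sequence can be sketched in a few lines of Python.  This is only an illustration: the function name and the sample IDs are hypothetical, and it assumes you pass in the IDs used in <em>both</em> databases so the new ID is free everywhere.</p>

```python
def next_free_id(used_ids, start=1):
    """Return the smallest ID at or above start that is not in used_ids."""
    taken = set(used_ids)
    candidate = start
    while candidate in taken:
        candidate += 1
    return candidate


# Hypothetical data: IDs 1-6 and 8 are taken, so the first hole is 7.
print(next_free_id([1, 2, 3, 4, 5, 6, 8]))  # prints 7
```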
<h5>Fixing Users</h5>
<p>Here&#8217;s everywhere you have data that may need updating:</p>
<ol>
<li><b>tp_favorites</b>: user_id.  This was pretty simple: <code>UPDATE tp_favorites SET user_id = 7 WHERE user_id = 2;</code>.</li>
<li><b>tp_files</b>: added_by and modified_by.  Again, the SQL is pretty simple: <code>UPDATE tp_files SET added_by = 7 WHERE added_by = 2; UPDATE tp_files SET modified_by = 7 WHERE modified_by = 2;</code>.</li>
<li><b>tp_mailboxes</b>: I have no entries here at all.  The relevant fields appear to be task_creator, task_owner, creator, and modifier, but since I have no data for the table, I can&#8217;t confirm that without digging into the PHP.  <a href="http://alexking.org/">Alex</a> can feel free to correct me.  [I actually need to think about the mailboxes functionality: I've got to think that using it for incoming bug reports might be a good idea.]</li>
<li><b>tp_tasks</b>: creator and modifier.  I think you can guess what the SQL looks like: <code>UPDATE tp_tasks SET creator = 7 WHERE creator = 2; UPDATE tp_tasks SET modifier = 7 WHERE modifier = 2;</code>.</li>
<li><b>tp_templates</b>: creator and modifier come up yet again.  <code>UPDATE tp_templates SET creator = 7 WHERE creator = 2; UPDATE tp_templates SET modifier = 7 WHERE modifier = 2;</code>.</li>
<li><b>tp_user_groups</b>: user_id.  <code>UPDATE tp_user_groups SET user_id = 7 WHERE user_id = 2;</code>.</li>
</ol>
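<p>If you have more than one user to move, typing those statements by hand gets old.  Here is a minimal Python sketch that generates them from the table and column names listed above.  The function name is made up for illustration, tp_mailboxes is left out because its columns are unconfirmed, and the user&#8217;s own row in tp_users is not covered because its column name isn&#8217;t shown here.</p>

```python
# Table name -> columns that hold a user ID, taken from the list above.
USER_COLUMNS = {
    "tp_favorites": ["user_id"],
    "tp_files": ["added_by", "modified_by"],
    "tp_tasks": ["creator", "modifier"],
    "tp_templates": ["creator", "modifier"],
    "tp_user_groups": ["user_id"],
}


def remap_statements(columns_by_table, old_id, new_id):
    """Build one UPDATE per column that moves old_id to new_id."""
    # int() rejects anything that is not a plain number.
    old_id, new_id = int(old_id), int(new_id)
    statements = []
    for table, columns in columns_by_table.items():
        for column in columns:
            statements.append(
                f"UPDATE {table} SET {column} = {new_id} WHERE {column} = {old_id};"
            )
    return statements


for statement in remap_statements(USER_COLUMNS, 2, 7):
    print(statement)
```

<p>Review the printed statements before pasting them anywhere; the script only builds text and never touches a database.</p>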
<h5>Fixing Groups</h5>
<p>As easy as fixing users was, fixing groups was even easier.  I had only two groups in the target database&#8212;one for work and one for my music stuff.  I have a lot more ideas for groups, but I&#8217;ve been holding off on those because I wanted to get the merger done first.  So that meant any group labeled #1 or #2 in the source db needed to be changed.  Predictably, I had both.  So group #1 in the source db became group #5, and group #2 became group #8.  Here&#8217;s how that got fixed with SQL:</p>
<ol>
<li><b>tp_task_groups</b>: <code>UPDATE tp_task_groups SET group_id = 5 WHERE group_id = 1; UPDATE tp_task_groups SET group_id = 8 WHERE group_id = 2;</code>.</li>
<li><b>tp_user_groups</b>: <code>UPDATE tp_user_groups SET group_id = 5 WHERE group_id = 1; UPDATE tp_user_groups SET group_id = 8 WHERE group_id = 2;</code>.</li>
</ol>
<p>See, I told you that was easy.</p>
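<p>Easy or not, it is worth sanity-checking an ID remapping before running it.  The Python sketch below is one hypothetical way to do that: it reports new IDs that are already taken in either database, and source IDs that clash with the target but were never remapped.  The group IDs in the example are invented.</p>

```python
def remap_problems(mapping, source_ids, target_ids):
    """Return a list of problems with an old-ID to new-ID mapping."""
    problems = []
    source, target = set(source_ids), set(target_ids)
    new_ids = list(mapping.values())
    if len(new_ids) != len(set(new_ids)):
        problems.append("two old IDs map to the same new ID")
    for old, new in mapping.items():
        if old not in source:
            problems.append(f"{old} is not in the source database")
        if new in target:
            problems.append(f"{new} is already used in the target database")
        if new in source:
            problems.append(f"{new} is already used in the source database")
    # Source IDs that stay put must not collide with the target either.
    for clash in sorted((source - set(mapping)).intersection(target)):
        problems.append(f"{clash} exists in both databases but is not remapped")
    return problems


# Hypothetical data: source groups 1, 2, 3, 4, 6, 7 and target groups 1, 2.
print(remap_problems({1: 5, 2: 8}, [1, 2, 3, 4, 6, 7], [1, 2]))  # prints []
```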
<h5>Fixing Tasks</h5>
<p><em>Here</em> is the hard one.  You&#8217;re merging two databases, and chances are that the main data&#8212;task IDs&#8212;are gonna be the same.  :sigh:  Fear not!  The easy solution is pretty simple.</p>
<ol>
<li>Look in the target db [where you're merging into] and find its largest task ID.  In my case, the next autoindex was 43,621.  This told me an easy, easy, easy fix: increment <em>every</em> task ID in the source database by 50,000.  This guaranteed that all tasks in the source db would be 50,000+, while all tasks in the target db would be &lt;50,000.  [This presumes that you could have a user working in the target db at the time.  If I'd been very close to 50,000, I might have gone to 55,000 or something---well, I wouldn't have had to, because in the target db, I own both users.  But this is how you might have to do it.]</li>
<li><a href="http://dev.mysql.com/doc/refman/4.1/en/update.html">The MySQL documentation for the UPDATE statement</a> gives this example: <code>UPDATE t SET id = id + 1 ORDER BY id DESC;</code>.  That&#8217;s smart code execution: it does the update to the largest ID first, which prevents situations where your offsets would overlap.  Consider a situation where you&#8217;re using an offset of 1000 but have 4,000+ tasks in the source db.  When you update task 1,732, it&#8217;ll become 2,732&#8212;and that could mean that two tasks have the same ID!  <em>Bad juju, man!</em></li>
</ol>
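<p>To see why the update order matters, here is a small Python simulation.  It is not how MySQL works internally; it just assumes a unique key is checked one row at a time and that the offset is positive.  The function name is made up.</p>

```python
def apply_offset(ids, offset, descending):
    """Shift every ID by offset, one row at a time.

    Raises ValueError as soon as a new ID is still held by another row,
    which is what a unique key checked row by row would complain about.
    """
    live = set(ids)
    for old in sorted(ids, reverse=descending):
        new = old + offset
        if new in live:
            raise ValueError(f"duplicate key: {old} -> {new}")
        live.remove(old)
        live.add(new)
    return sorted(live)


task_ids = range(1, 4001)  # 4,000 tasks, offset of 1000, as in the example above

shifted = apply_offset(task_ids, 1000, descending=True)
print(shifted[0], shifted[-1])  # prints 1001 5000

try:
    apply_offset(task_ids, 1000, descending=False)
except ValueError as error:
    print(error)  # prints duplicate key: 1 -> 1001
```

<p>Largest-first works because each row moves into a slot that has either never been used or has already been vacated.</p>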
<p>So let&#8217;s look at what has to get changed, and what SQL does it, shall we?</p>
<ol>
<li><b>tp_tasks</b>: <code>UPDATE tp_tasks SET id = id + 50000 ORDER BY id DESC; UPDATE tp_tasks SET parent = parent + 50000 WHERE parent > 0 ORDER BY parent DESC; UPDATE tp_tasks SET template = template + 50000 ORDER BY template DESC; UPDATE tp_tasks SET recur_source = recur_source + 50000 ORDER BY recur_source DESC;</code>.  [Note that WHERE has to come before ORDER BY, or MySQL will reject the statement.]  It is okay to take a deep breath before kicking this one off and exhaling when it&#8217;s done.  Take a note of how many rows there are in tp_tasks before you start; the first statement should affect that many rows and the parent statement should affect every task that has a parent, but the template and recur_source counts are going to be far smaller because those are used less often.  If you recur a lot of tasks&#8212;and I do!&#8212;you&#8217;ll see a lot of rows affected.</li>
<li><b>tp_favorites</b>: <code>UPDATE tp_favorites SET task_id = task_id + 50000 ORDER BY task_id DESC;</code>.  This is an easy one because it&#8217;s just a two-column table.  Note that, at this point, you&#8217;ve edited both columns with UPDATEs, but done so separately.  You could combine both statements into one query if you want.  I&#8217;ve shown it this way here because of how I&#8217;ve done this, and because I&#8217;ve written this tutorial as I&#8217;ve done the updates.</li>
<li><b>tp_files</b>: <code>UPDATE tp_files SET task_id = task_id + 50000 ORDER BY task_id DESC; UPDATE tp_files SET id = id + 50000 ORDER BY id DESC;</code>.  As with tp_templates below, you&#8217;ve got an id here that you also need to offset.  I&#8217;ve chosen to offset that by the 50000 number because, well, it&#8217;s easy.</li>
<li><b>tp_task_groups</b>: <code>UPDATE tp_task_groups SET task_id = task_id + 50000 ORDER BY task_id DESC;</code>.</li>
<li><b>tp_templates</b>: <code>UPDATE tp_templates SET task_id = task_id + 50000 ORDER BY task_id DESC; UPDATE tp_templates SET id = id + 50000 ORDER BY id DESC;</code>.  Now, you may have balked at my tp_tasks statement where I changed the template line, and you may be balking here that I&#8217;m changing the id of the templates table.  Does this have to happen by the $id_offset?  No, not really.  Admittedly, you could look up how many templates are in the target db and offset it by that amount.  That said, <em>it&#8217;s just a number</em>.  As long as you&#8217;re consistent, you&#8217;re okay.</li>
</ol>
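<p>The offset arithmetic and the statements above can also be generated rather than typed.  The Python sketch below is one hypothetical way to do it: the headroom and rounding step are arbitrary assumptions, not anything Tasks Pro requires, and the optional WHERE clause is placed before ORDER BY because that is the order MySQL&#8217;s UPDATE syntax expects.</p>

```python
def round_up_offset(next_target_id, headroom=5000, step=10000):
    """Smallest multiple of step that clears next_target_id plus headroom."""
    needed = next_target_id + headroom
    return -(-needed // step) * step  # ceiling division, then scale back up


def offset_statement(table, column, offset, skip_zero=False):
    """Build an UPDATE that shifts column by offset, largest values first."""
    where = f" WHERE {column} > 0" if skip_zero else ""
    return (
        f"UPDATE {table} SET {column} = {column} + {int(offset)}"
        f"{where} ORDER BY {column} DESC;"
    )


offset = round_up_offset(43621)
print(offset)  # prints 50000
print(offset_statement("tp_tasks", "id", offset))
print(offset_statement("tp_tasks", "parent", offset, skip_zero=True))
```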
<h4>Merger</h4>
<p>Now you&#8217;ve prepped the source database for merging.  If you do a SQL dump of that data at this point, everything you have is guaranteed to not overwrite anything in the target database.  <strong>Still, you better have backed up the data, dammit.</strong>  Why?  Simple: you could have made a mistake somewhere.  You could have overlooked something.  <strong>Backups mean never having to say that you&#8217;re sorry.</strong></p>
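<p>One way to double-check that guarantee before importing is to compare the IDs on each side.  The Python sketch below assumes you have already pulled the ID lists out of both databases yourself; the function name and the sample numbers are hypothetical.</p>

```python
def overlapping_ids(source_ids, target_ids):
    """Map each table name to the sorted IDs present in both databases."""
    overlaps = {}
    for table, ids in source_ids.items():
        shared = set(ids).intersection(target_ids.get(table, ()))
        if shared:
            overlaps[table] = sorted(shared)
    return overlaps


# Hypothetical data after the prep work: tasks are offset, user 2 became 7.
source = {"tp_tasks": [50001, 50002], "tp_users": [1, 7]}
target = {"tp_tasks": [1, 2], "tp_users": [1, 2]}
print(overlapping_ids(source, target))  # prints {'tp_users': [1]}
```

<p>In this made-up example the only overlap left is user #1, the shared user that still has to be dropped from one side before the import.</p>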
<p>Here&#8217;s how I did the merge:</p>
<ul>
<li><b>tp_config</b>: I ignored it.  This is a task, user, and group-independent table.  There is no sense in editing any data in this table at all.</li>
<li><b>tp_favorites</b>: I ran the MySQL dump on the table and pared the data down to only the INSERT statements.  I had only a handful of favorites, so I did this all in phpMyAdmin in the browser, never downloading an SQL file locally.</li>
<li><b>tp_files</b>: I dumped from MySQL and imported via phpMyAdmin, but with a lot of files, <em>this will be problematic</em>.  If that table is large for you, use <code>mysqldump</code> and <code>mysql -u[user] -p[password] [database] &lt; tp_files.sql</code> on the command line.  Save yourself the frustration.</li>
<li><b>tp_groups</b>: This was done as tp_favorites was.</li>
<li><b>tp_tasks</b>: This was done as tp_files was.  Again, doing the insert via the command line might be the way to go.</li>
<li><b>tp_task_groups</b>: This was done as tp_files was because of the size.</li>
<li><b>tp_users</b>: This was done as tp_favorites was.</li>
<li><b>tp_user_groups</b>: This was done as tp_favorites was.</li>
</ul>
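<p>Paring a dump down to its INSERT statements can be scripted, too.  This Python sketch assumes every INSERT statement sits on a single line of the dump, so check your own dump before trusting it.  The sample dump text is invented.</p>

```python
def insert_statements(dump_text):
    """Keep only the lines of a SQL dump that start with INSERT."""
    return [
        line
        for line in dump_text.splitlines()
        if line.lstrip().upper().startswith("INSERT ")
    ]


# Hypothetical dump excerpt for a two-column favorites table.
SAMPLE_DUMP = """-- hypothetical dump excerpt
DROP TABLE IF EXISTS tp_favorites;
CREATE TABLE tp_favorites (
  user_id int,
  task_id int
);
INSERT INTO tp_favorites VALUES (7,50012),(1,50340);
"""

for statement in insert_statements(SAMPLE_DUMP):
    print(statement)
```

<p>Dropping the DROP TABLE and CREATE TABLE lines matters here: the target tables already exist and hold data you want to keep.</p>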
<p>Hopefully that will guide you as you progress.  If you need advice or have questions, leave a comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://gfmorris.org/archives/2007/08/05/how-i-seamlessly-merged-two-tasks-pro%e2%84%a2-databases/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
