Channel: Cranky Bit » Subversion

Separating a Large Repository

A few months ago, I posted an article about combining multiple Subversion repositories into one large repository. Some folks have expressed an interest in doing the opposite: separating one large repository into multiple smaller repositories. The process is not without its quirks, but it can be done.

At first glance, you'd conclude the process would work much the same way: Loop through the individual directories in the large repository, create smaller repositories for each one, then dump and import the contents of each directory into its small repository.

The tricky part is that the Subversion dump command dumps everything in the repository, revision by revision. To pull out just a single directory, you must filter a complete dump with the "svndumpfilter" command. This blog post by AllMyBrain.com explains how to accomplish this in Linux. I usually have to work on a Windows box on the job, so I wrote a Windows batch script that does the same thing.

The strategy is the same as in the Linux script, though. We're going to use "svnadmin dump" to dump the large repository, then "svndumpfilter" to keep just the directory we want, then "svnadmin load" to load the result into the newly created repository. All of this can be combined into a single statement via piping:

DOS:
  svnadmin dump c:\my\large\repo\ | svndumpfilter include MyDirectory | svnadmin load MySmallRepo\MyDirectory

This will make a little more sense when we look at the full script. Let's just put it out there and then go through it.

DOS:
  REM Directory that will hold the new small repositories
  SET SmallRepoPath=c:\SmallRepos
  REM The large repository, as a filesystem path and as a file:// URL
  SET PathToRepo=c:\BigRepo
  SET UNCToRepo=file:///c:/BigRepo
  REM Temporary checkout, used only to list the top-level directories
  SET PathToChkout=c:\BigRepoChkout

  mkdir %PathToChkout%
  svn co %UNCToRepo% %PathToChkout% --ignore-externals
  dir /A:D /B %PathToChkout% > %PathToChkout%\dirs.tmp
  REM "delims=" keeps directory names that contain spaces in one piece
  for /F "delims=" %%i in (%PathToChkout%\dirs.tmp) do (
      if /I not "%%i"==".svn" (
          echo Processing "%%i"...
          mkdir "%SmallRepoPath%\%%i"
          svnadmin create "%SmallRepoPath%\%%i"
          svnadmin dump %PathToRepo% | svndumpfilter include "%%i" | svnadmin load "%SmallRepoPath%\%%i"
      )
  )
  del %PathToChkout%\dirs.tmp
  rmdir /S /Q %PathToChkout%

First, we're setting our paths. "SmallRepoPath" is the directory that will hold all of the small repositories we'll be creating. "PathToRepo" and "UNCToRepo" point to the big repository as a local filesystem path and a file:// URL, respectively (despite its name, "UNCToRepo" holds a URL, not a UNC path). "PathToChkout" is where the script will put a temporary Subversion checkout of the large repository.

Next, we check out the large repository with the "svn co" command. We do this only so that we can run "dir /A:D /B", which says, "List just the directories in the checkout directory." We use that output to loop through each top-level directory in the large repository.

Then, for each directory in the large repository, we create a corresponding small repository, then do our dump/filter/load combo. Again, we're dumping the contents of the large repository, using "svndumpfilter" to filter by directory, then loading that filtered dump into the new small repository.

Finally, we just do some cleanup by removing our temp files and the checkout directory.

There are a few caveats with this code.

First, it will import all of the large repository's revisions into each small repository, including revisions that never touched the directory in question; those come through as empty revisions. There are svndumpfilter arguments to prevent this, such as --drop-empty-revs and --renumber-revs, but I found the Windows Subversion binaries to be problematic with these arguments. The end result is that you have more revision numbers than needed, but only the relevant data is actually imported into the repository, and viewing logs on the imported directory still shows only the log entries related to that directory, so there's really little harm done.
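
For reference, here is a rough sketch of how those two arguments would slot into the pipeline. The repository paths and directory name are the same placeholder values used in the script above, and given the problems mentioned with the Windows binaries, it is worth trying this against a scratch repository first.

DOS:
  svnadmin dump c:\BigRepo | svndumpfilter include MyDirectory --drop-empty-revs --renumber-revs | svnadmin load c:\SmallRepos\MyDirectory

With --drop-empty-revs, revisions left empty by the filter are dropped from the dump; --renumber-revs then renumbers the revisions that remain so there are no gaps.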

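Second, the dump/filter/load action doesn't always work on a directory that was moved (copied, then deleted) from another location within the large repository. What's worse, it won't fail; it just won't load any data into the small repository. To address this, use the --revision argument on the "svnadmin dump" command to start the dump at a revision after the move took place. A dump that starts partway through the history begins with a full snapshot of the tree at that revision (unless you pass --incremental), so the directory shows up as ordinary content rather than as a copy from a path that was filtered out. That gives the "svndumpfilter" command something it can work with.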
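
As a sketch, suppose the directory was moved into place in revision 1500. That number is made up for illustration; "svn log" on the directory will show the real one. The paths and directory name are again the placeholders from the script above. A dump starting just after the move might look like this:

DOS:
  svnadmin dump c:\BigRepo --revision 1501:HEAD | svndumpfilter include MyDirectory | svnadmin load c:\SmallRepos\MyDirectory
  svn ls file:///c:/SmallRepos/MyDirectory

The second command is a quick sanity check, since a failed load doesn't announce itself: it should list "MyDirectory/". If it prints nothing, no data made it into the small repository. Note that history from before the starting revision is not carried over, and the loaded revisions are numbered from 1 in the new repository.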

This process is certainly more complicated to explain than combining repositories, but ultimately there's not that much more going on. Hopefully this explanation is helpful to you.

