====== Complete ======

The SVN migration was completed in July 2009. This document has been retained for historical purposes.

====== CVS to SVN Migration Path ======

This is a document describing in detail the steps I (Gwynne) took to convert the PHP repository from [[http://ximbiot.com/cvs/wiki/|CVS]] to [[http://subversion.tigris.org|SVN]], updated continuously as I go through the process. I am making use of the CVS2SVN command-line Python tool at [[http://cvs2svn.tigris.org/]]. The basic documentation is at [[http://cvs2svn.tigris.org/cvs2svn.html]].

===== Basic cvs2svn Use =====

I started from the very beginning with a copy of the entire [[http://cvs.php.net/|PHP CVS repository]] as tarballed by Derick. CVS2SVN runs in a large number (16) of effective "passes" over a CVS repository, enumerated here:

  $ ./cvs2svn --help-passes
  PASSES:
      1 : CollectRevsPass
      2 : CleanMetadataPass
      3 : CollateSymbolsPass
      4 : FilterSymbolsPass
      5 : SortRevisionSummaryPass
      6 : SortSymbolSummaryPass
      7 : InitializeChangesetsPass
      8 : BreakRevisionChangesetCyclesPass
      9 : RevisionTopologicalSortPass
     10 : BreakSymbolChangesetCyclesPass
     11 : BreakAllChangesetCyclesPass
     12 : TopologicalSortPass
     13 : CreateRevsPass
     14 : SortSymbolsPass
     15 : IndexSymbolsPass
     16 : OutputPass
  $

I have the CVS repository stored in a directory called "realroot" (why not?) and stored the temporary data files on a separate mount point because of drive space issues. My initial commandline, based purely on the documentation, before any kind of testing was this:

  ./cvs2svn --svnrepos=./svnroot --fs-type=fsfs \
            --dry-run --no-cross-branch-commits \
            --username=svnconvert \
            --cvs-revnums --use-cvs \
            --tempdir=/Volumes/External/private/tmp/cvs2svn-tmp ./realroot

The options are:

  * "--svnrepos=./svnroot" - the place to create a new SVN repository
  * "--fs-type=fsfs" - The type of SVN repository to create. FSFS is the default, but I specified it anyway
  * "--dry-run" - Test only. Don't actually convert anything.
I always do this first.
  * "--no-cross-branch-commits" - Don't make single commits across multiple CVS branches. Makes history a little more consistent at the cost of more revision commits
  * "--username=svnconvert" - The SVN username to use in log messages when creating the new repository
  * "--cvs-revnums" - Store the last CVS revision of each file in a property on that file; I thought it'd be useful for history tracking
  * "--use-cvs" - Use the cvs command instead of internal code or the rcs command to retrieve information from the CVS repository. Slower but more reliable.
  * "--tempdir=/Volumes/External/private/tmp/cvs2svn-tmp" - Put temporary files in my external hard drive's tmp directory. Saved me from having my system hard disk fill up; one of the warnings for "--use-internal-co" (see below) is that it requires considerable disk space.
  * "./realroot" - The path to the CVS repository to convert.

===== Pass 1 =====

Almost instantly I thought of something - why was I using the much slower --use-cvs, since the only thing it affects is the $Log$ keyword? Derick and Jani confirmed that $Log$ is not, in fact, used in the PHP CVS, so I switched to "--use-internal-co" to use internal code. The resulting speedup of cvs2svn was considerable.

My next issue was with the sheer amount of output cvs2svn spits out, and there was a **lot**. I added "--quiet" to the commandline to slow down the flooding of my terminal window. By no means did it stop it, but it slowed it.

Pass 1 ran all the way through after that, and died with a number of errors saying things like:

  A CVS repository cannot contain both realroot/phpdoc-ja/reference/oci8/functions/OCI-Lob-writeToFile.xml,v and realroot/phpdoc-ja/reference/oci8/functions/Attic/OCI-Lob-writeToFile.xml,v;

I looked this up in the cvs2svn FAQ and found that it's a common minor corruption in CVS repositories due to various forms of repository maintenance.
There are several ways to handle this issue, some of which preserve history and some of which destroy it. I chose the most conservative: adding "--retain-conflicting-attic-files" to my commandline. The result is a few extra files in the new SVN repository, but it preserves the maximum amount of history data. I consider that one of my primary concerns during this conversion process.

I also got a very upsetting error:

  ERROR: 'realroot/phpweb/distributions/Attic/php-5.0.0-installer.exe,v' is not a valid ,v file

This seemed odd to me, so I popped the file open in [[http://macromates.com/|TextMate]], my preferred text editor, and found out that about half the file had been sliced neatly off the top. That's definitely **not** a valid ,v file. I checked out whether it was worth trying to repair the file, but Jani didn't think so, since it's in Attic, and I agreed. I moved the file out of the CVS root, and the error disappeared. I didn't delete the file, though, just in case.

When pass 1 succeeded, my full commandline was:

  ./cvs2svn --svnrepos=./svnroot --fs-type=fsfs --dry-run --no-cross-branch-commits --username=svnconvert --cvs-revnums --use-internal-co --quiet --retain-conflicting-attic-files --tempdir=/Volumes/External/private/tmp/cvs2svn-tmp ./realroot

===== Pass 2 =====

Pass 2 immediately spit out dozens of errors regarding the inability to decode CVS log messages. I wasn't entirely surprised, as some of the committers to CVS don't do it in English, but it did seem a little odd that so many files were failing. I investigated by looking at the documentation for encodings, and found the "--encoding" and "--fallback-encoding" options. Passing multiple "--encoding" options would try each encoding in sequence until one succeeded. If none succeeded, the "--fallback-encoding" would be used in lossy mode. I thought, "Cool! Now, how do I tell it to try **all** encodings?"
There turned out to be no way to do that, and the list of single encodings was very, very daunting. There are hundreds of encodings out there. Then I noticed something: the default encoding list is "ascii". Nothing else. No UTF-8, no Latin 1, nothing! That wouldn't do, so I added several "--encoding" options for ASCII, UTF-8, UTF-16, Shift-JIS, MacRoman, ISO Latin 1, and EUC-JP. Those struck me as being all the common encodings, and lo and behold, pass 2 spit out no more complaints. My guess was that Latin 1 and UTF-8 covered almost all of the issues.

I had also added "--fallback-encoding=latin_1" as a "let's fall back if we have to" measure, but removed it, worried that it was suppressing errors I'd rather have seen. I needn't have worried; even without that option, it worked great.

When pass 2 succeeded, my full commandline was:

  ./cvs2svn --svnrepos=./svnroot --fs-type=fsfs --dry-run --no-cross-branch-commits --username=svnconvert --cvs-revnums --use-internal-co --quiet --retain-conflicting-attic-files --encoding=ascii --encoding=utf_8 --encoding=utf_16 --encoding=shift_jis --encoding=mac_roman --encoding=latin_1 --encoding=euc_jp --tempdir=/Volumes/External/private/tmp/cvs2svn-tmp ./realroot

===== Pass 3 =====

Pass 3 died pretty much instantly with the very cryptic message:

  ----- pass 3 (CollateSymbolsPass) -----
  Checking for forced tags with commits...
  The following paths are not disjoint:
      Path tags/php4 contains the following other paths:
          tags/php4/CREDITS
  Please fix the above errors and restart CollateSymbolsPass

The first thing I did was a documentation search. Nothing in the docs or the FAQs. Next came a mailing list archive search. Nothing but an error that wasn't really related. I tried Googling the entire Web, but that just gave me a bunch of irrelevant results and a link back to the same mailing list article I'd already found.
I found the code that outputs the message in cvs2svn, but that wasn't any use because the text "not disjoint" is nowhere near the code that actually tells me what's going on, especially since I don't read Python!

Finally I decided to try a little simple logic. There was no literal path "php4" in the repository, but after all, the error message said "tags/", didn't it? So I hopped over to [[http://cvs.php.net/]] and checked the php4 modules. Lo and behold, there was a php4 tag, and a php4/CREDITS tag!

Next question was, why in the world did cvs2svn have an issue with that? Well, because tags are directories in SVN, you can't have a file by the name of a tag that way. The tag itself shouldn't exist, but it does, and I had to handle it. I can't really expect to do a cvs tag -D on the repository, so how to tell cvs2svn not to bother with that one useless tag? The answer: the "--exclude" option. I added "--exclude=php4/CREDITS" to the commandline and tried pass 3 again. It worked perfectly.

When pass 3 succeeded, my commandline was:

  ./cvs2svn --svnrepos=./svnroot --fs-type=fsfs --dry-run --no-cross-branch-commits --username=svnconvert --cvs-revnums --use-internal-co --quiet --retain-conflicting-attic-files --encoding=ascii --encoding=utf_8 --encoding=utf_16 --encoding=shift_jis --encoding=mac_roman --encoding=latin_1 --encoding=euc_jp --exclude=php4/CREDITS --tempdir=/Volumes/External/private/tmp/cvs2svn-tmp ./realroot

===== Passes 4-8 =====

Pass 4 worked on the first try:

Pass 5 was also a clean sweep:

Pass 6 was just a beautiful thing, went by without a hitch. Not that I expected any of these "sort" passes to be a big deal, but you never know...

Pass 7 worked without problems too, though it was a much slower pass than the last three. I finally started to feel like I was making progress. Pass 7 done, means pass 8 is up! Halfway there!

Pass 8 took something along the lines of an hour to run, but it finally finished without errors...
I hope the other phases aren't similarly insane with their timing. I'm considering taking the --quiet flag back out, and I think I will for pass 9. It's nice to know //something's// happening; I checked my top -u output twice during pass 8 to make sure it hadn't frozen up.

  ----- pass 4 (FilterSymbolsPass) -----
  Filtering out excluded symbols and summarizing items...
  Done
  ----- pass 5 (SortRevisionSummaryPass) -----
  Sorting CVS revision summaries...
  Done

When passes 4-8 succeeded, my commandline was:

  ./cvs2svn --svnrepos=./svnroot --fs-type=fsfs --dry-run --no-cross-branch-commits --username=svnconvert --cvs-revnums --use-internal-co --quiet --retain-conflicting-attic-files --encoding=ascii --encoding=utf_8 --encoding=utf_16 --encoding=shift_jis --encoding=mac_roman --encoding=latin_1 --encoding=euc_jp --exclude=php4/CREDITS --tempdir=/Volumes/External/private/tmp/cvs2svn-tmp ./realroot

===== Pass 9 =====

Woohoo! Taking --quiet out allowed me to get timing data from the script for each pass, and wow was 9 ever a long pass! Have a look:

  ----- pass 9 (RevisionTopologicalSortPass) -----
  Generating CVSRevisions in commit order...
  Done
  Time for pass9 (RevisionTopologicalSortPass): 4088 seconds.
  cvs2svn Statistics:
  ------------------
  Total CVS Files: 159442
  Total CVS Revisions: 912073
  Total CVS Branches: 160368
  Total CVS Tags: 1829773
  Total Unique Tags: 1705
  Total Unique Branches: 258
  CVS Repos Size in KB: 4034353
  First Revision Date: Wed Mar 13 10:16:01 1996
  Last Revision Date: Thu Jun 26 08:39:31 2008
  ------------------
  Timings (seconds):
  ------------------
  1997 pass1 CollectRevsPass
  18 pass2 CleanMetadataPass
  0 pass3 CollateSymbolsPass
  370 pass4 FilterSymbolsPass
  6 pass5 SortRevisionSummaryPass
  5 pass6 SortSymbolSummaryPass
  320 pass7 InitializeChangesetsPass
  3435 pass8 BreakRevisionChangesetCyclesPass
  4088 pass9 RevisionTopologicalSortPass
  4089 total

4088 seconds / 3600 seconds/hour... you don't need a calculator to realize that's around one and an eighth hours.
And check out pass 8's timing, almost as bad. Oh well, pass 10 is up next... I was going to think about adding --verbose, but on reflection I don't think I want all that.

When pass 9 succeeded, my commandline was:

  ./cvs2svn --svnrepos=./svnroot --fs-type=fsfs --dry-run --no-cross-branch-commits --username=svnconvert --cvs-revnums --use-internal-co --retain-conflicting-attic-files --encoding=ascii --encoding=utf_8 --encoding=utf_16 --encoding=shift_jis --encoding=mac_roman --encoding=latin_1 --encoding=euc_jp --exclude=php4/CREDITS --tempdir=/Volumes/External/private/tmp/cvs2svn-tmp ./realroot

===== Passes 10-16 =====

  * 10: Pass 10 was what my Warcraft friends would call "easysauce":
  * 11: Well, pass 11 was faster than 8 and 9, if not as fast as 10...
  * 12: Another one bites the dust!
  * 13: Pass 13 certainly was interesting. Generating all the SVN commits...
  * 14: Nice short easy one.
  * 15: 15 down, 1 to go!
  * 16: Pop the champagne cork!

  ----- pass 10 (BreakSymbolChangesetCyclesPass) -----
  Breaking symbol changeset dependency cycles...
  Done
  Time for pass10 (BreakSymbolChangesetCyclesPass): 181.9 seconds.
  ----- pass 11 (BreakAllChangesetCyclesPass) -----
  Breaking CVSSymbol dependency loops...
  Done
  Time for pass11 (BreakAllChangesetCyclesPass): 1039 seconds.
  ----- pass 12 (TopologicalSortPass) -----
  Generating CVSRevisions in commit order...
  Done
  Time for pass12 (TopologicalSortPass): 255.5 seconds.
  ...
  Creating Subversion r192256 (commit)
  Done
  Time for pass13 (CreateRevsPass): 512.2 seconds.
  ----- pass 14 (SortSymbolsPass) -----
  Sorting symbolic name source revisions...
  Done
  Time for pass14 (SortSymbolsPass): 9.787 seconds.
  ----- pass 15 (IndexSymbolsPass) -----
  Determining offsets for all symbolic names...
  Done.
  Time for pass15 (IndexSymbolsPass): 6.344 seconds.
  ----- pass 16 (OutputPass) -----
  Starting Subversion r192256 / 192256
  Done.
  Time for pass16 (OutputPass): 706.2 seconds.
  cvs2svn Statistics:
  ------------------
  Total CVS Files: 159442
  Total CVS Revisions: 912073
  Total CVS Branches: 160368
  Total CVS Tags: 1829773
  Total Unique Tags: 1705
  Total Unique Branches: 258
  CVS Repos Size in KB: 4034353
  Total SVN Commits: 192256
  First Revision Date: Wed Mar 13 10:16:01 1996
  Last Revision Date: Thu Jun 26 08:39:31 2008
  ------------------
  Timings (seconds):
  ------------------
  1996.9 pass1 CollectRevsPass
  18.1 pass2 CleanMetadataPass
  0.3 pass3 CollateSymbolsPass
  370.0 pass4 FilterSymbolsPass
  6.3 pass5 SortRevisionSummaryPass
  5.3 pass6 SortSymbolSummaryPass
  320.0 pass7 InitializeChangesetsPass
  3435.3 pass8 BreakRevisionChangesetCyclesPass
  4088.5 pass9 RevisionTopologicalSortPass
  181.9 pass10 BreakSymbolChangesetCyclesPass
  1038.6 pass11 BreakAllChangesetCyclesPass
  255.5 pass12 TopologicalSortPass
  512.2 pass13 CreateRevsPass
  9.8 pass14 SortSymbolsPass
  6.3 pass15 IndexSymbolsPass
  706.2 pass16 OutputPass
  706.5 total

Now, to do it all at once without --dry-run. It'll take a while. Wish me luck!

When pass 16 succeeded, my commandline was:

  ./cvs2svn --svnrepos=./svnroot --fs-type=fsfs --dry-run --no-cross-branch-commits --username=svnconvert --cvs-revnums --use-internal-co --retain-conflicting-attic-files --encoding=ascii --encoding=utf_8 --encoding=utf_16 --encoding=shift_jis --encoding=mac_roman --encoding=latin_1 --encoding=euc_jp --exclude=php4/CREDITS --tempdir=/Volumes/External/private/tmp/cvs2svn-tmp ./realroot

===== The real run =====

Well, on the first try of this, I had my temporary files directory on the external drive and the output SVN root on the internal drive. I didn't realize the SVN root was going to be larger than the CVS root, with the result that I woke up with my computer screaming "Your primary system disk is out of space." The result was a computer that took an hour to get back up to speed, thanks to Darwin's VM manager and its extremely poor handling of low-memory situations. No big deal, really.
Not the first time I've done something like that. So I tried the run again with the output root on the external drive, which has considerably more free space. A few hours later I came back to my machine to find the same thing had happened! I'd neglected to realize that the VM manager in Darwin doesn't know how to make actual use of the space it has, and cvs2svn had used up every byte of drive space by using up every byte of RAM. So I had to free up some space on the system disk. How often do I exhaust 2GB of swap space with 2GB of physical RAM? Not often! All of this was a lack of recognition of the sheer size of PHP's CVS repository. The resulting SVN repository will have 196,000 commits or more, and take up over 4GB of space, maybe 5. So, I started the run a third time, with significantly more space available for Darwin's poor brain-damaged memory manager and the cvs2svn temporary and output files. Passes 1 through 15 completed in about three hours time overall, finally, but pass 16 was something else again. Keep in mind, pass 16 is the "commit to SVN" phase, and we have 192,256 commits to do. Not only that, but cvs2svn invokes first the RCS "co" command and then the "svnadmin" command for //every single commit//! That means forking some 192256*2+1=384513 processes, on a Darwin system with a maximum of 100,000 PIDs, about 100 of which are already in use by various system processes and applications! That's gonna make my poor OS cycle its PID usage a few times. A good six hours later I was at revision 17088/192256 and realized... this was gonna take a **long, long, long** time. Alright, I thought, let's see what we can't do to make this a little faster. I remembered from the docs the --dumpfile option, which drops all the SVN data into a dumpfile that svnadmin can later load more or less all at once. That certainly would save all the svnadmin invocations, right? Right? 
Well, I wasn't going to waste six hours' worth of commits without some kind of proof that this really was gonna cut the time in half like that, so I dove into the cvs2svn source code a second time. Python is a very whitespace-dependent language... it makes me shudder, frankly. I surfed my way through what little I managed to understand of the code and finally figured out... yes, the --dumpfile option writes directly to an on-disk file instead of invoking svnadmin for every commit! Wow, what the hell is the idea of not making this the default? Sure, the dumpfile is gonna be ridiculously huge, like 10GB, but I can handle that! I canceled the hell out of the current run.

But, why have to do passes 1 through 15 over again when it's three hours' work that doesn't change in the slightest? That's why I used Control-C to break cvs2svn's run. I added the appropriate "--dumpfile" option, ditched "--svnrepos", and then tossed in "--pass=16" to make it start from the end, as it were. Since I'd C-C'd, it didn't delete the temp files, and I ended up with pass 16 restarting with the dumpfile output. Very nice, right? Sure enough, while the thing was still invoking "co" for every revision, at least the useless svnadmin slop was gone. That instantly doubled the speed of pass 16. Well worth the wasted 6 hours; I only wished I'd thought of it sooner.

Six more hours later I'd found another disk space error somewhere around revision 50000. Oops... only 8GB on my external drive wasn't enough for both the temporary files and the output file. Did some rummaging about in my filesystem and found about 40GB of old data to delete that I'd been too lazy to clean up before. Then I started pass 16... again.

Whoops. I made the mistake of thinking again. I wondered why in the world the darned thing was invoking "co" so much when I'd told it to use internal code for checkouts.
Lo and behold, I checked my commandline and somewhere along the line I'd changed it to --use-rcs because of the disk space problems! Ooooops... no wonder six hours didn't get me all the way through... So I killed the pass... again... and tried to restart it with --use-internal-co. Err... whoops some more. I had to start all 16 passes over to do that! Then again, considering the sheer cost of all that forking, it was probably a save either way, so I started it over. In much less than the three hours, poof, I was at pass 16, and I'd hit revision 60000 within three hours. **That** was better! I went to sleep at that point, and when I woke up, it was done. About time! The statistics were all laid out for me, too:

  Starting Subversion r192256 / 192256
  Done.
  Time for pass16 (OutputPass): 8455 seconds.
  cvs2svn Statistics:
  ------------------
  Total CVS Files: 159442
  Total CVS Revisions: 912073
  Total CVS Branches: 160368
  Total CVS Tags: 1829773
  Total Unique Tags: 1705
  Total Unique Branches: 258
  CVS Repos Size in KB: 4034353
  Total SVN Commits: 192256
  First Revision Date: Wed Mar 13 10:16:01 1996
  Last Revision Date: Thu Jun 26 08:39:31 2008
  ------------------
  Timings (seconds):
  ------------------
  5360 pass1 CollectRevsPass
  21 pass2 CleanMetadataPass
  0 pass3 CollateSymbolsPass
  444 pass4 FilterSymbolsPass
  15 pass5 SortRevisionSummaryPass
  9 pass6 SortSymbolSummaryPass
  391 pass7 InitializeChangesetsPass
  4190 pass8 BreakRevisionChangesetCyclesPass
  5254 pass9 RevisionTopologicalSortPass
  186 pass10 BreakSymbolChangesetCyclesPass
  658 pass11 BreakAllChangesetCyclesPass
  254 pass12 TopologicalSortPass
  524 pass13 CreateRevsPass
  9 pass14 SortSymbolsPass
  7 pass15 IndexSymbolsPass
  8455 pass16 OutputPass
  25778 total

Whew! For the lazy, 25778 seconds / 3600 seconds/hour = 7.1606 hours. Not too bad for such a giant CVS repository, all things considered. The resulting dumpfile came out to a whopping 19GB!
  $ ls -lah /Volumes/External/svnimport.dump
  -rw-r--r-- 1 gwynne admin 19G Jun 29 07:13 svnimport.dump

===== Importing the dumpfile =====

The next step: import that giant thing into a new svn repository. The commands? "svnadmin create" and "svnadmin load". I knew the latter would take a while, so I tossed a "time" at the start of it for curiosity's sake. I also threw a chown in there so Darwin wouldn't whine if I later tried to make the repository publicly accessible:

  $ cd /Volumes/External
  $ svnadmin create --fs-type=fsfs ./phpsvn
  $ sudo chown -R _svn:_svn ./phpsvn
  $ sudo time svnadmin load ./phpsvn < ./svnimport.dump

While this was running I got to thinking as I watched the output scroll by. I saw a line like this:

  * adding path : tags/RELEASE_0_5_5/pear/Net_SmartIRC/package.xml ...COPIED... done.

See the problem? That really should have been "pear/tags/RELEASE_0_5_5". Something in me knew it wasn't going to be as simple as it'd been so far, but I hadn't thought of this little caveat yet. I checked the cvs2svn FAQ and found the question regarding multiple projects in a single repository. Oops... I should've been using an options file all along. Well, live and learn... I let the dumpfile keep running, though; no sense //completely// wasting all that work, and at least it served as an example of a successful conversion, if not necessarily a correct one. The final piece of that was:

  <<< Started new transaction, based on original revision 192256
  * editing path : trunk/pecl/apc/apc_cache.c ... done.
  ------- Committed revision 192256 >>>
  30674.07 real 4941.17 user 4724.23 sys
  $ du -h -d0 ./phpsvn
  7.3G ./phpsvn
  $

30674.07 seconds is 8.521 hours. So a bit longer than it took to do the original conversion. That figures. Was it wasted effort? I wasn't sure, but it was lessons learned, and that's never a complete waste.

===== A cvs2svn options file =====

Option files in cvs2svn are written in Python.
Fortunately, they're also so heavily commented that you don't need to understand the language itself to use them. Being a programmer of a lot of C-ish languages, I was able to at least get a grip on what the code was doing anyway. It didn't matter. The example file was huge, and I mean **huge**! I'll spare you all the necessity of figuring out the equivalents to the options I'd already specified on the commandline; it was all pretty clear anyway from the file's comments. I added quite a few more little tweaks as I went, though.

  * I used "ctx.revision_recorder = InternalRevisionRecorder(compress=True)" to get compression for what was previously --use-internal-co. Uses a lot less disk space, and solves I/O binding problems.
  * I used "ctx.prune = True" to mimic the effects of "cvs update -P" across the entire SVN repository. I saw no reason not to.
  * I reduced the encodings used for log messages to UTF-8, Latin 1, and ASCII. It seemed simpler.
  * I used "ctx.symbol_info_filename = '/Volumes/External/phpsvn.syminfo.txt'" to get an output of the decisions made about symbols by cvs2svn.
  * I used "ctx.username = 'cvs2svn'", since that seemed to make more sense than my previous choice of svnconvert.
  * I told cvs2svn to use "EOLStyleFromMimeTypeSetter()," for auto-props setting, telling it to use MIME type to figure out EOL styles.
  * On advice from Jani and Derick, I used "DefaultEOLStyleSetter('native')," to tell it to use native line endings instead of binary style for unknown files.
  * I set "ctx.cross_project_commits = True" despite my previous commandline choice, based on comments in the example options file which suggested it made more sense to allow them.
  * I used "changeset_database.use_mmap_for_cvs_item_to_changeset_table = False" for paranoia's sake. A 5% speedup wasn't worth the risk of having to do it multiple times because of my computer exploding with Out of Memory errors.
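The "try each encoding in sequence, then fall back in lossy mode" behaviour described for "--encoding" and "--fallback-encoding" can be illustrated with a small standalone sketch. This is **not** cvs2svn's CVSTextDecoder; the function name ''decode_in_order'' and its signature are hypothetical, and it only assumes Python's standard codec names.

```python
def decode_in_order(raw, encodings, fallback=None):
    """Return (text, encoding_used) for the first encoding that decodes raw.

    Hypothetical helper for illustration only; it is not part of cvs2svn.
    """
    for name in encodings:
        try:
            # Strict decoding: any invalid byte raises and we move on.
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    if fallback is not None:
        # Lossy mode: undecodable bytes become U+FFFD instead of raising.
        return raw.decode(fallback, "replace"), fallback
    raise ValueError("no encoding in %r could decode the input" % (encodings,))


if __name__ == "__main__":
    sample = "é".encode("utf_8")
    print(decode_in_order(sample, ["ascii", "utf_8", "latin_1"]))
    print(decode_in_order(sample, ["latin_1", "utf_8", "ascii"]))
```

One thing the sketch makes visible: Latin 1 maps every possible byte to a character, so it never fails to decode. If cvs2svn tries encodings strictly in list order, as the documentation quoted earlier suggests, then any encoding listed after Latin 1 would never be reached, and order matters when writing the list.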
The next step was to take the list of directories in the CVS root and turn it into a list of stanzas similar to:

  run_options.add_project(
      r'/Users/gwynne/src/cvs2svn-2.1.1/realroot/TSRM',
      trunk_path='TSRM/trunk',
      branches_path='TSRM/branches',
      tags_path='TSRM/tags',
      symbol_transforms=[
          ReplaceSubstringsSymbolTransform('\\','/'),
          NormalizePathsSymbolTransform(),
          ],
      symbol_strategy_rules=[
          ] + global_symbol_strategy_rules,
      )

Whew. That was going to make for a //very// long options file where it was very easy to make copypasta errors. I needed to add a little Python code. How does one do foreach (array(//blah blah blah//) as $item) { /* etc */ } in Python? So I went to a close friend who **does** know Python. We came up with this rather handy little bit of code:

  cvsrootdir = r'/Users/gwynne/src/cvs2svn-2.1.1/realroot'
  fnames = os.listdir(cvsrootdir)
  for fname in fnames:
      pathname = os.path.join(cvsrootdir, fname)
      # One project per top-level module directory; CVSROOT is not a module.
      if os.path.isdir(pathname) and fname != r'CVSROOT':
          run_options.add_project(
              pathname,
              # The '/' keeps the layout the same as the hand-written stanza above.
              trunk_path=fname + '/trunk',
              branches_path=fname + '/branches',
              tags_path=fname + '/tags',
              symbol_transforms=[
                  ReplaceSubstringsSymbolTransform('\\','/'),
                  NormalizePathsSymbolTransform(),
                  ],
              symbol_strategy_rules=[
                  ] + global_symbol_strategy_rules,
              )

If we'd gotten this right, the result would be a whole set of projects with every CVS module that wasn't defined in modules. That was another can of worms I intended to open once I got this part right. Well, we'd gotten it mostly right.
First attempt came up with this:

  Traceback (most recent call last):
    File "./cvs2svn", line 31, in
      main(sys.argv[0], sys.argv[1:])
    File "/Users/gwynne/src/cvs2svn-2.1.1/cvs2svn_lib/main.py", line 47, in main
      run_options = RunOptions(progname, cmd_args, pass_manager)
    File "/Users/gwynne/src/cvs2svn-2.1.1/cvs2svn_lib/run_options.py", line 259, in __init__
      self.process_options_file(value)
    File "/Users/gwynne/src/cvs2svn-2.1.1/cvs2svn_lib/run_options.py", line 739, in process_options_file
      execfile(options_filename, g, l)
    File "./phpsvn.options", line 110, in
      fnames = os.listdir(cvsrootdir)
  NameError: name 'os' is not defined

The fix was easy: adding "import os" to the top of the options file. Bingo, cvs2svn ran. This was my phpsvn.options file:

  # (Be in -*- python -*- mode.)
  import re
  import os
  from cvs2svn_lib.boolean import *
  from cvs2svn_lib import config
  from cvs2svn_lib import changeset_database
  from cvs2svn_lib.common import CVSTextDecoder
  from cvs2svn_lib.log import Log
  from cvs2svn_lib.project import Project
  from cvs2svn_lib.svn_output_option import DumpfileOutputOption
  from cvs2svn_lib.svn_output_option import ExistingRepositoryOutputOption
  from cvs2svn_lib.svn_output_option import NewRepositoryOutputOption
  from cvs2svn_lib.revision_manager import NullRevisionRecorder
  from cvs2svn_lib.revision_manager import NullRevisionExcluder
  from cvs2svn_lib.rcs_revision_manager import RCSRevisionReader
  from cvs2svn_lib.cvs_revision_manager import CVSRevisionReader
  from cvs2svn_lib.checkout_internal import InternalRevisionRecorder
  from cvs2svn_lib.checkout_internal import InternalRevisionExcluder
  from cvs2svn_lib.checkout_internal import InternalRevisionReader
  from cvs2svn_lib.symbol_strategy import AllBranchRule
  from cvs2svn_lib.symbol_strategy import AllTagRule
  from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule
  from cvs2svn_lib.symbol_strategy import ExcludeRegexpStrategyRule
  from cvs2svn_lib.symbol_strategy import ForceBranchRegexpStrategyRule
  from
  cvs2svn_lib.symbol_strategy import ForceTagRegexpStrategyRule
  from cvs2svn_lib.symbol_strategy import HeuristicStrategyRule
  from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule
  from cvs2svn_lib.symbol_strategy import DefaultBasePathRule
  from cvs2svn_lib.symbol_strategy import HeuristicPreferredParentRule
  from cvs2svn_lib.symbol_strategy import SymbolHintsFileRule
  from cvs2svn_lib.symbol_transform import ReplaceSubstringsSymbolTransform
  from cvs2svn_lib.symbol_transform import RegexpSymbolTransform
  from cvs2svn_lib.symbol_transform import NormalizePathsSymbolTransform
  from cvs2svn_lib.property_setters import AutoPropsPropertySetter
  from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter
  from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter
  from cvs2svn_lib.property_setters import CVSRevisionNumberSetter
  from cvs2svn_lib.property_setters import DefaultEOLStyleSetter
  from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter
  from cvs2svn_lib.property_setters import ExecutablePropertySetter
  from cvs2svn_lib.property_setters import KeywordsPropertySetter
  from cvs2svn_lib.property_setters import MimeMapper
  from cvs2svn_lib.property_setters import SVNBinaryFileKeywordsPropertySetter
  Log().log_level = Log.VERBOSE
  ctx.output_option = DumpfileOutputOption(
      dumpfile_path=r'/Volumes/External/phpsvn.dumpfile',
      )
  ctx.dry_run = False
  ctx.revision_recorder = InternalRevisionRecorder(compress=True)
  ctx.revision_excluder = InternalRevisionExcluder()
  ctx.revision_reader = InternalRevisionReader(compress=True)
  ctx.svnadmin_executable = r'svnadmin'
  ctx.sort_executable = r'sort'
  ctx.trunk_only = False
  ctx.prune = True
  ctx.cvs_author_decoder = CVSTextDecoder(
      [
          'latin1',
          'utf8',
          'ascii',
          ],
      #fallback_encoding='ascii'
      )
  ctx.cvs_log_decoder = CVSTextDecoder(
      [
          'latin1',
          'utf8',
          'ascii',
          ],
      #fallback_encoding='ascii'
      )
  ctx.cvs_filename_decoder = CVSTextDecoder(
      [
          'latin1',
          'utf8',
          'ascii',
          ],
      #fallback_encoding='ascii'
      )
  ctx.decode_apple_single = False
  ctx.symbol_info_filename = '/Volumes/External/phpsvn.syminfo.txt'
  global_symbol_strategy_rules = [
      ExcludeRegexpStrategyRule(r'php4/CREDITS'),
      UnambiguousUsageRule(),
      BranchIfCommitsRule(),
      HeuristicStrategyRule(),
      DefaultBasePathRule(),
      HeuristicPreferredParentRule(),
      ]
  ctx.username = 'cvs2svn'
  ctx.svn_property_setters.extend([
      CVSBinaryFileEOLStyleSetter(),
      CVSBinaryFileDefaultMimeTypeSetter(),
      EOLStyleFromMimeTypeSetter(),
      DefaultEOLStyleSetter('native'),
      SVNBinaryFileKeywordsPropertySetter(),
      KeywordsPropertySetter(config.SVN_KEYWORDS_VALUE),
      ExecutablePropertySetter(),
      CVSRevisionNumberSetter(),
      ])
  ctx.tmpdir = r'/Volumes/External/private/tmp/cvs2svn-tmp'
  ctx.cross_project_commits = True
  ctx.cross_branch_commits = True
  ctx.retain_conflicting_attic_files = True
  run_options.profiling = False
  changeset_database.use_mmap_for_cvs_item_to_changeset_table = False
  cvsrootdir = r'/Users/gwynne/src/cvs2svn-2.1.1/realroot'
  fnames = os.listdir(cvsrootdir)
  for fname in fnames:
      pathname = os.path.join(cvsrootdir, fname)
      # One project per top-level module directory; CVSROOT is not a module.
      if os.path.isdir(pathname) and fname != r'CVSROOT':
          run_options.add_project(
              pathname,
              # The '/' separates the module name from trunk, branches and tags.
              trunk_path=fname + '/trunk',
              branches_path=fname + '/branches',
              tags_path=fname + '/tags',
              symbol_transforms=[
                  ReplaceSubstringsSymbolTransform('\\','/'),
                  NormalizePathsSymbolTransform(),
                  ],
              symbol_strategy_rules=[
                  ] + global_symbol_strategy_rules,
              )

I was a little bit more careful with my commandline this time; I wanted some log of what was going on while still seeing the progress. So I fell back on the trusty "tee" command:

  $ ./cvs2svn --option=./phpsvn.options | tee ./phpsvn.convert.out

Then it was time to watch what happened. Oh boy... Yep. I knew it wouldn't be that simple:

  Pass 1 complete.
  ===========================================================================
  Error summary:
  ERROR: No RCS files found under '/Users/gwynne/src/cvs2svn-2.1.1/realroot/livingtags'! Are you absolutely certain you are pointing cvs2svn at a CVS repository?
  ERROR: No RCS files found under '/Users/gwynne/src/cvs2svn-2.1.1/realroot/pear-manual'!
  Are you absolutely certain you are pointing cvs2svn at a CVS repository?
  ERROR: No RCS files found under '/Users/gwynne/src/cvs2svn-2.1.1/realroot/phpdoc-ar-only'!
  Are you absolutely certain you are pointing cvs2svn at a CVS repository?
  ERROR: No RCS files found under '/Users/gwynne/src/cvs2svn-2.1.1/realroot/phpdoc-he-only'!
  Are you absolutely certain you are pointing cvs2svn at a CVS repository?
  ERROR: No RCS files found under '/Users/gwynne/src/cvs2svn-2.1.1/realroot/phpdoc-ro-dir'!
  Are you absolutely certain you are pointing cvs2svn at a CVS repository?
  ERROR: No RCS files found under '/Users/gwynne/src/cvs2svn-2.1.1/realroot/phpdoc-ro-only'!
  Are you absolutely certain you are pointing cvs2svn at a CVS repository?
  Exited due to fatal error(s).

Well... every single one of those folders was utterly empty. Oh well. I **thought** it was kinda suspicious that there were directories named after what I //knew// were pseudo-modules. Bah. I killed the directories and tried again.
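
The empty-directory failure above only shows up after Pass 1 has spent its time on the rest of the repository, so it can be worth checking for it up front. The following is a minimal sketch, not part of the original conversion scripts: the function name ''find_projects_without_rcs_files'' is hypothetical, and it assumes that "contains RCS files" simply means "contains at least one file whose name ends in '',v''" somewhere below the top-level directory.

```python
import os


def find_projects_without_rcs_files(cvsroot, skip=("CVSROOT",)):
    """Return the top-level directories under cvsroot holding no ',v' files.

    Hypothetical helper: cvsroot is the path of the CVS repository copy and
    skip lists directory names that are deliberately not converted.
    """
    empty = []
    for name in sorted(os.listdir(cvsroot)):
        path = os.path.join(cvsroot, name)
        if not os.path.isdir(path) or name in skip:
            continue
        # RCS files end in ",v"; one hit anywhere below the directory is enough.
        has_rcs = any(
            filename.endswith(",v")
            for _dirpath, _dirnames, filenames in os.walk(path)
            for filename in filenames
        )
        if not has_rcs:
            empty.append(name)
    return empty
```

Run against the ''realroot'' copy, a helper like this would be expected to name the same pseudo-module directories that cvs2svn complained about, which could then be removed (or skipped in the options file) before starting the long run.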
Clean sweep this time around:

  cvs2svn Statistics:
  ------------------
  Total CVS Files:            159415
  Total CVS Revisions:        909522
  Total CVS Branches:         154874
  Total CVS Tags:            1835211
  Total Unique Tags:            3495
  Total Unique Branches:         489
  CVS Repos Size in KB:      4032117
  Total SVN Commits:          189058
  First Revision Date:    Wed Mar 13 10:16:01 1996
  Last Revision Date:     Thu Jun 26 08:39:31 2008
  ------------------
  Timings (seconds):
  ------------------
   1815   pass1    CollectRevsPass
     15   pass2    CleanMetadataPass
      1   pass3    CollateSymbolsPass
    741   pass4    FilterSymbolsPass
     38   pass5    SortRevisionSummaryPass
     14   pass6    SortSymbolSummaryPass
    364   pass7    InitializeChangesetsPass
   4020   pass8    BreakRevisionChangesetCyclesPass
   4226   pass9    RevisionTopologicalSortPass
    175   pass10   BreakSymbolChangesetCyclesPass
    373   pass11   BreakAllChangesetCyclesPass
    256   pass12   TopologicalSortPass
    592   pass13   CreateRevsPass
     10   pass14   SortSymbolsPass
      9   pass15   IndexSymbolsPass
  13529   pass16   OutputPass
  26179   total

26179 seconds = 7.272 hours. Not bad, just slightly over the time for the one-project run, most of it in OutputPass. The astute will notice most of the saved time is in BreakRevisionChangesetCyclesPass and RevisionTopologicalSortPass. That makes a lot of sense, since cvs2svn no longer had to break nonsensical dependencies between projects that weren't actually related.

  $ ls -lah /Volumes/External/phpsvn.*
  -rw-r--r--  1 gwynne  admin    19G Jun 30 19:05 phpsvn.dumpfile
  -rw-r--r--  1 gwynne  admin   1.9M Jun 30 12:19 phpsvn.syminfo.txt
  $ ls -lah /Users/gwynne/src/cvs2svn-2.1.1/phpsvn.convert.out
  -rw-r--r--  1 gwynne  staff   189M Jun 30 19:05 phpsvn.convert.out

189MB just for the logfile. Crazy. Oh well. Time to import the dumpfile.
  $ svnadmin create --fs-type=fsfs /Volumes/External/phpsvn
  $ sudo chown -R _svn:_svn /Volumes/External/phpsvn
  $ sudo time svnadmin load /Volumes/External/phpsvn < /Volumes/External/phpsvn.dumpfile | tee ./phpsvn.load.out

And the result:

  ------- Committed revision 189058 >>>
  
  44243.87 real      5250.85 user      5715.10 sys

44243.87 seconds = 12.29 hours. That could be because my CPU was under heavy load from other things, though, and I expect that's the exact reason. Oh well. Now we have a half-working repository.

===== Modules and externals =====

Well, cvs2svn doesn't handle translating CVSROOT/modules into externals definitions, so now I have to set those up manually. This becomes a bit more complex, because Subversion's externals support is not a drop-in replacement for modules. SVN 1.5 adds sparse checkouts, which ease the burden a little, but not enough. The topic requires discussion on methods of implementation. Please see the RFC I created at [[rfc:svnexternals]].

===== Checking out the repository =====

After a few days I got tired of waiting for people to comment. I suppose it was reasonable for them to think a proof of concept conversion didn't need mass discussion. For the moment I decided I'd go with merging the ZendEngine2 and Zend modules. Even in CVS it's questionable why the modules were split; in SVN it's completely ridiculous. Would I be able to just drop ZendEngine2's trunk on top of Zend and go *pfft* on ZendEngine2? Oops, no; the histories of the two modules are woven together in very strange ways. They have a lot of version tags in common and a lot not in common. Why, when ZendEngine2 wasn't even in use for PHP 4? Oh well. Out of sight, out of mind.

I sure as hell wasn't expecting anyone to try to convert an existing working copy from CVS to SVN after a real repository change, so it was okay to do things that would utterly trash modifications made in local copies, including changing around directory structure.
The question becomes one of updating scripts that depend (foolishly) on these directory names.

The first step was to get a checkout to play with:

  $ svn checkout svn://phpsvn.gwynne.dyndns.org/ phpsvn-co

Yep. //Every single module//, with every tag and every branch. I went for a drink. This was gonna take a while...

Of course it wasn't going to be that simple. SVN belched this one out at me:

  svn: Failed to add directory 'phpsvn-co/pear/branches/start/Selenium/tests': an unversioned directory of the same name already exists

It was fairly obvious what that meant, at least to me. There was a .svn directory checked into the repository. I confirmed with an svn ls:

  $ svn ls svn://phpsvn.gwynne.dyndns.org/pear/branches/start/Selenium/tests
  .svn/
  SeleniumTest.php
  events/
  html/
  $

Someone checked an SVN working copy into CVS. That was a pretty strange thing to do, but it's a showstopper when switching to SVN! Fortunately, this wasn't CVS, and I was able to kill the offending directories with one command (I had to use file URLs because svnserve's configured not to allow any write access):

  $ sudo svn rm -m "[SVN CONVERSION] Removing .svn directories that break SVN checkout." \
      file:///Volumes/External/phpsvn/pear/branches/start/Selenium/tests/.svn \
      file:///Volumes/External/phpsvn/pear/branches/start/Selenium/tests/events/.svn \
      file:///Volumes/External/phpsvn/pear/branches/start/Selenium/tests/html/.svn \
      file:///Volumes/External/phpsvn/pear/branches/start/Selenium/docs/.svn \
      file:///Volumes/External/phpsvn/pear/branches/start/Selenium/examples/.svn \
      file:///Volumes/External/phpsvn/pear/branches/start/Selenium/.svn
  
  Committed revision 189059.
  $

I then re-ran the checkout command, knowing SVN would resume from where it left off now that it was able. Or not. It spit the same error back at me again. Rather than assume I'd failed to fix the problem with a logical solution, I thought maybe SVN was still trying to check out from the r189058 it had started with.
That made a lot of sense. Easy solution, rm -Rf phpsvn-co and start over. Slower, but a good guarantee that people who try to do the same don't get messed up anyway.

It couldn't be that simple. There were more .svn directories floating about, in pear/branches/start/Testing_Selenium. I got rid of those the same way. And the next error was nasty...

  svn: Failed to add directory 'phpsvn-co/pear/branches/Townnews': a versioned directory of the same name already exists

What? Huh? Why? I ran an ls to find out...

  $ svn ls svn://phpsvn.gwynne.dyndns.org/pear/branches
  ...
  TOWNNEWS/
  Townnews/
  ...
  $

A case-insensitive filesystem made that impossible to use in a checkout. Ohhhhh boy...

It was about this time I looked at the rest of the branches directory in pear and realized something very annoying... Two more lines from that list:

  Tree_0_3_0/
  XML_TRANSFORMER_1_1/

Those are symbols belonging to individual packages, all lumped into one shared directory. That means every PEAR subdirectory needs its own branches/tags/trunk subdirectory. All the way back to cvs2svn and our options file, then. No point continuing to work with an incorrect repository. Were there any other top-level modules like that? Yep, pecl. I dove into cvs2svn's options file and tweaked it.
It took a while to figure out what I was doing in Python, but I finally came up with this code fragment:

  def recurse_dir(rootdir, modprefix, exceptions, deepens, run_options, xforms, rules):
      global os, recurse_dir
      fnames = os.listdir(rootdir)
      for fname in fnames:
          pathname = os.path.join(rootdir, fname)
          if os.path.isdir(pathname) and fname not in exceptions and fname not in deepens:
              # Ordinary module: one project with its own trunk/branches/tags
              run_options.add_project(
                  pathname,
                  trunk_path=modprefix + fname + '/trunk',
                  branches_path=modprefix + fname + '/branches',
                  tags_path=modprefix + fname + '/tags',
                  symbol_transforms=xforms,
                  symbol_strategy_rules=[] + rules,
              )
          elif os.path.isdir(pathname) and fname in deepens:
              # Container module (pear, pecl): every subdirectory becomes a project
              recurse_dir(os.path.join(rootdir, fname), fname + '/', [], [], run_options, xforms, rules)
  
  recurse_dir(r'/Users/gwynne/src/cvs2svn-2.1.1/realroot', '', ['CVSROOT'], ['pear', 'pecl'], run_options,
      [ReplaceSubstringsSymbolTransform('\\','/'), NormalizePathsSymbolTransform()], global_symbol_strategy_rules)

And started the cvs2svn run over. As one can see from the code above, it was a real mess dealing with the way cvs2svn calls option files, but I finally managed to fudge it to kill all the NameErrors.

I knew it wouldn't be that simple, of course. There were some dead directories in both pear and pecl to rm out of the CVS root, but that was easy enough. For the sake of record, they were:

  pear/dazuko
  pear/HTML_QuickForm_ComboBox
  pear/Net_UserAgent_Mobile_GPS
  pear/PEAR_Forum
  pear/PHP_Fork
  pear/Services_Compete
  pear/Stream_Callback
  pear/Text_CAPTCHA
  pear/XML_HTMLSax3
  pecl/cairo_wrapper
  pecl/ircg
  pecl/libextractor
  pecl/pdo_db2
  pecl/postparser
  pecl/tar

I'll spare you all the pain I went through figuring out some more options for efficiency's sake, since none of it is useful info.
Suffice it to say I ended up also changing these options:

  ctx.output_option = NewRepositoryOutputOption(
      r'/Volumes/External/phpsvn',
      fs_type='fsfs',
      )
  ctx.cross_project_commits = False
  ctx.cross_branch_commits = False

Finally, after this set of time statistics, I was ready to move on:

  cvs2svn Statistics:
  ------------------
  Total CVS Files:            159414
  Total CVS Revisions:        909490
  Total CVS Branches:         152400
  Total CVS Tags:            1837685
  Total Unique Tags:           11525
  Total Unique Branches:        1012
  CVS Repos Size in KB:      4032089
  Total SVN Commits:          271574
  First Revision Date:    Wed Mar 13 10:16:01 1996
  Last Revision Date:     Thu Jun 26 08:39:31 2008
  ------------------
  Timings (seconds):
  ------------------
   2478   pass1    CollectRevsPass
     26   pass2    CleanMetadataPass
      4   pass3    CollateSymbolsPass
    553   pass4    FilterSymbolsPass
     10   pass5    SortRevisionSummaryPass
      8   pass6    SortSymbolSummaryPass
    335   pass7    InitializeChangesetsPass
   8325   pass8    BreakRevisionChangesetCyclesPass
   8753   pass9    RevisionTopologicalSortPass
    307   pass10   BreakSymbolChangesetCyclesPass
    524   pass11   BreakAllChangesetCyclesPass
    280   pass12   TopologicalSortPass
    593   pass13   CreateRevsPass
     10   pass14   SortSymbolsPass
     11   pass15   IndexSymbolsPass
  34894   pass16   OutputPass
  57114   total

That's almost 16 hours, for those keeping score.

Anyway, the next step was to find a case-sensitive filesystem to check out to. Easy! Create a 20GB blank sparse disk image, format it HFS+ journaled **case-sensitive** and checkout to there.

Well, that didn't work very well. A full checkout of just php-src with all its tags and branches is well past 20G in HFS+. Forget the entire repository. Some of the tags in there are completely ridiculous, and the branching, the naming of the tags is just awful... but I digress. I decided the most obvious thing to do was to work with smaller pieces of the repository. So I picked up a checkout of ZendEngine2 and Zend.
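
The TOWNNEWS/ versus Townnews/ clash described earlier can be spotted from an svn ls listing before any checkout is attempted on a case-insensitive filesystem. What follows is a small illustrative sketch rather than anything used in the actual conversion: ''find_case_collisions'' is a hypothetical name, and it assumes that comparing case-folded names is a good enough stand-in for how the filesystem compares them (real filesystems may also apply Unicode normalization, which this ignores).

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group entries that differ only by letter case.

    names is an iterable of directory entries as printed by 'svn ls',
    with or without the trailing slash. Returns a list of sorted groups.
    """
    groups = defaultdict(list)
    for name in names:
        # Fold case so "TOWNNEWS/" and "Townnews/" land in the same bucket.
        groups[name.rstrip("/").casefold()].append(name)
    return [sorted(group) for _key, group in sorted(groups.items()) if len(group) > 1]


if __name__ == "__main__":
    listing = ["TOWNNEWS/", "Townnews/", "Tree_0_3_0/", "XML_TRANSFORMER_1_1/"]
    for group in find_case_collisions(listing):
        print("collision:", ", ".join(group))
```

Any group reported this way would need either a rename in the repository or a case-sensitive volume for the working copy.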
===== Moving to svn.php.net =====

It was time to work on a system with slightly more capabilities than mine; I logged into cvs.php.net (also svn.php.net) and started work there. I modified the cvs2svn options file accordingly, set up a blank SVN repository next to the CVS repository, took a snapshot of the CVS repository, and ran cvs2svn over the snapshot. I didn't run into any unexpected issues, which was a pleasant surprise.

Next step was to check out each module in the SVN repository to find any problems such as that mentioned above with Selenium. A long and annoying process, but at least it's easy.

===== Doing some checkouts =====

Well, what's a cheap way to check out all the SVN modules and see whether there are problems, without overrunning the limited hard drive space of my system? Answer: Shell script! I came up with this little gem:

  #!/bin/bash
  # $1: path to the CVS repository root, $2: base URL of the SVN repository
  dirs=`ls $1`
  for dir in $dirs; do
      echo "Processing ${dir}..."
      if [ -d "$1/${dir}" ]; then
          svn co $2/"${dir}" >> ./checkout.log 2>&1
          if [ "$?" -eq 0 ]; then
              echo "Successful on ${dir}." >> ./checkout.results
          else
              echo "FAILED ON ${dir}!" >> ./checkout.results
          fi
          # Throw the working copy away again to save disk space
          rm -Rf ./"${dir}"
      fi
  done

Worked like a charm. Pointed it at the CVS and SVN repositories, and kicked it into gear. A couple hours of tail -f checkout.log scrolling later, I had the following list of failures:

  FAILED ON CVSROOT!
  FAILED ON livingtags!
  FAILED ON pear!
  FAILED ON pear-manual!
  FAILED ON phpdoc-ar-only!
  FAILED ON phpdoc-he-only!
  FAILED ON phpdoc-ro-dir!
  FAILED ON phpdoc-ro-only!
  FAILED ON phpdoc-tr-dir!
  FAILED ON zend!

Every single one of those //except// pear was a nonexistent module, empty in CVS and ignored entirely in the SVN conversion. That left the pear module. Sure enough, it was the expected failure in Selenium and Testing_Selenium, from someone who checked .svn directories into CVS for some unknown reason.
They were easily removed with a direct svn rm command:

  $ sudo -u svn \
      svn rm -m "[SVN CONVERSION] Removing .svn directories that break SVN checkout." \
      $SVNROOT/pear/Selenium/branches/shin/.svn \
      $SVNROOT/pear/Selenium/branches/shin/tests/.svn \
      $SVNROOT/pear/Selenium/branches/shin/tests/events/.svn \
      $SVNROOT/pear/Selenium/branches/shin/tests/html/.svn \
      $SVNROOT/pear/Selenium/branches/shin/docs/.svn \
      $SVNROOT/pear/Selenium/branches/shin/examples/.svn \
      $SVNROOT/pear/Selenium/tags/start/tests/.svn \
      $SVNROOT/pear/Selenium/tags/start/tests/events/.svn \
      $SVNROOT/pear/Selenium/tags/start/tests/html/.svn \
      $SVNROOT/pear/Selenium/tags/start/docs/.svn \
      $SVNROOT/pear/Selenium/tags/start/examples/.svn \
      $SVNROOT/pear/Selenium/tags/start/.svn \
      $SVNROOT/pear/Testing_Selenium/branches/shin/.svn \
      $SVNROOT/pear/Testing_Selenium/branches/shin/tests/.svn \
      $SVNROOT/pear/Testing_Selenium/branches/shin/tests/events/.svn \
      $SVNROOT/pear/Testing_Selenium/branches/shin/tests/html/.svn \
      $SVNROOT/pear/Testing_Selenium/branches/shin/docs/.svn \
      $SVNROOT/pear/Testing_Selenium/branches/shin/examples/.svn \
      $SVNROOT/pear/Testing_Selenium/tags/start/.svn \
      $SVNROOT/pear/Testing_Selenium/tags/start/tests/.svn \
      $SVNROOT/pear/Testing_Selenium/tags/start/tests/events/.svn \
      $SVNROOT/pear/Testing_Selenium/tags/start/tests/html/.svn \
      $SVNROOT/pear/Testing_Selenium/tags/start/docs/.svn \
      $SVNROOT/pear/Testing_Selenium/tags/start/examples/.svn
  
  Committed revision 279477.
  $

===== Meta-SVN! =====

About this time I realized that a lot of things related to SVN would require version control //before// the repository was ready for use! Things like all the various scripts involved in the conversion itself, all the authorization data, the commit hooks, all the fun stuff. Putting these things into CVS would result in a bit of recursive failure.
Putting them into the SVN repository I'd set up would interfere with the conversion, and besides, this was metadata, stuff that belongs in an equivalent to CVSROOT. Solution: A second SVN repository under much more restricted authorization control. I put in a request for a metasvn.php.net domain name and set up cvs.php.net's Apache to serve it from a separate repository.

Then Wez and a couple others convinced me that was a stupid idea. There wasn't //really// any reason this stuff couldn't go into CVS, other than my ornery resistance to the older and less useful system. It was about this time that I had to study Git for another project and began to wonder if maybe it wasn't better than SVN, but I'm just not into the idea of learning an entirely new system and forcing everyone else to do the same. SVN maps 90% onto CVS commands... Git maps more like 40%. SVN is a good midway step to true distributed VCS, and there are plenty of Git/SVN interface tools.

So I set up a CVS module called SVNROOT/, got karma to it, and checked in my options file along with the checkout script above. Almost immediately I got an interesting question: "Didn't we decide to use PHP instead of Python?" Yes, we did. And yes, the options file is written in Python. Unfortunately, the way cvs2svn is set up makes this necessary; it includes the options file similarly to a PHP include directive.

===== Reorganization =====

Next step: Decide on a repository structure. Ooops... lots of differing opinions on that. Well, this was getting complicated. It was time to step back and automate some of the process. So I popped open a new PHP file and came up with automation for the svn create, cvs2svn, and svn rm commands already discussed. Then I went back and added some nice command-line-y-ness to it using PEAR's Console_CommandLine (a VERY nice package, kudos to its author(s)!). The script can be viewed at [[http://cvs.php.net/viewvc.cgi/SVNROOT/run-conversion.php?view=log]].
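
To give a feel for the svn rm part of that automation, here is a rough sketch of how the list of checked-in .svn directories could be derived instead of typed by hand. It is not taken from run-conversion.php (which is PHP); it is written in Python only to match the options file, the function name ''svn_metadata_urls'' is hypothetical, and it assumes its input is the text printed by a recursive ''svn ls -R'', where directories carry a trailing slash.

```python
def svn_metadata_urls(repo_url, listing):
    """Build one URL per '.svn/' directory named in recursive 'svn ls' output.

    repo_url is the URL the listing was taken from; listing is the raw text
    of 'svn ls -R' for that URL. Only directories literally named '.svn'
    are returned, not the entries stored inside them.
    """
    base = repo_url.rstrip("/")
    urls = []
    for line in listing.splitlines():
        entry = line.strip()
        if not entry.endswith("/"):
            continue  # plain file, not a directory
        path = entry.rstrip("/")
        if path.split("/")[-1] == ".svn":
            urls.append(base + "/" + path)
    return urls


if __name__ == "__main__":
    # Made-up listing and repository URL, for illustration only.
    sample = "docs/\ndocs/.svn/\ndocs/.svn/entries\ntests/\ntests/.svn/\n.svn/\n"
    for url in svn_metadata_urls("file:///path/to/repo/pear/Selenium/tags/start", sample):
        print(url)
```

The resulting URLs could then be handed to a single svn rm invocation like the ones shown earlier, so that all removals land in one revision.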
That done, I looked back at the reorganization mess. It looked like there would in fact be a few separate repositories for things like PEAR and GTK. I needed advice on this one, so I went to the mailing list. They wanted to know, "why separate repositories?" Well, it's a matter of maintenance, really. GTK, PEAR, Zend, they all have their own little quirks in the hook scripts and really it's just simpler and more elegant for them to have their own workspaces to play in rather than all this endless special-casing in the hooks and ACLs.

So I rewrote the conversion script completely to support this premise, and contacted various people to find out what to do with the "miscellaneous" modules scattered all over the place. Turned out most of them either belonged alongside php-src or were just plain defunct! The choice was made not to convert defunct modules, since there is a plan to leave the CVS repository available in some form.

===== Hook scripts =====

At a glance it might seem that would be the end of it. But unfortunately, no. There are a lot of administrative tasks done by scripts in CVSROOT, all of which need to be ported to SVN equivalents. I decided it would be astute to make a list of what needed to be ported before actually getting into it! To do that, I grabbed a copy of CVSROOT itself and had a looksee. It turned out the following things needed conversion:

  * Access Control Lists - Replaced by the SVN authz database
  * commitinfo.pl - I couldn't quite figure out what this was for. It seemed to write the name of the committed directory to a file. A little more investigation showed it to be part of the loginfo.pl automation
  * cvswrappers - Replaced by SVN's autoprops
  * loginfo.pl - Sends the e-mails to various mailing lists when commits happen
  * modules - Replaced by svn:externals and restructuring
  * readers - Replaced by SVN's authz database

===== Available for the curious =====

Meanwhile, the converted PHP repository is now available via:

  $ svn co http://svn.php.net

This will check out all the projects in the repository; it's suggested to specify a particular module like [[http://svn.php.net/php-src/trunk]]. Don't forget about svn ls!