Converting CVS and SVN repositories to Git

I had the wonderful opportunity of converting several CVS and SVN repositories to Git. This is of course a very enjoyable process that is both quick and easy, especially when the repositories are non-standard and use a few different code pages 🙃

CVS

I initially tried using git-cvsimport, which, although easy to set up, was incredibly slow and couldn't handle the various text encodings.

I wound up using cvs2git along with git fast-import. This was much faster, and cvs2git lets you stack the text encodings that might be in use. It tries them in order, so you can prioritize them.
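To illustrate what "trying encodings in order" means, here is a minimal Python sketch of the idea. It is not cvs2git's own code, and the function name and the default encoding list are assumptions made up for this example.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Decode raw bytes with the first encoding in the list that succeeds.

    Returns a (text, encoding_name) tuple. The default encoding list is a
    hypothetical example, not something taken from cvs2git.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the input")
```

The order matters: strict encodings such as UTF-8 should come first, because a permissive one such as latin-1 accepts any byte sequence and would hide every encoding listed after it.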

SVN

I first tried using git-svn, with help from svn-migration-scripts.jar (made by Atlassian) to get a list of the authors in the SVN repo. This worked OK with standard SVN repositories, but any that were weird did not play well, and of course a number of the SVN repositories were non-standard.

Instead, I used svn-all-fast-export to do the actual SVN-to-Git migration and used svneverever to figure out the structure of the SVN repository so I could write the rules for svn-all-fast-export. I still used svn-migration-scripts.jar to create the identity-map for svn-all-fast-export. This gave much better results but was much more tedious.

To make the rules for svn-all-fast-export, I used svneverever to generate a few different outputs of the SVN repository's structure, to make sure I had all the information I needed.

svneverever example using --depth 1 on a non-standard SVN repo

(  1; 785)  /branches
                [..]
(510; 622)  /random1
                [..]
(511; 583)  /random2
                [..]
(548; 785)  /random_branches
                [..]
(555; 785)  /random_tags
                [..]
(508; 785)  /random_trunk
                [..]
(  1; 785)  /tags
                [..]
(  1; 785)  /trunk
                [..]
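When a repository has many top-level directories, it can help to turn output like the above into data before writing rules. The following is a minimal Python sketch that assumes the exact depth-1 layout shown here, a "(first; last)" revision range followed by a path. The function name is hypothetical, and deeper or differently formatted output is not handled.

```python
import re

# Matches lines such as "(  1; 785)  /branches"; the "[..]" lines are skipped.
ENTRY = re.compile(r"^\(\s*(\d+);\s*(\d+)\)\s+(/.*)$")

def parse_svneverever(text):
    """Return a list of (path, first_revision, last_revision) tuples."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            first, last, path = match.groups()
            entries.append((path, int(first), int(last)))
    return entries
```

Each tuple gives a path and the revision range in which it existed, which makes it easier to spot directories that appeared late or were removed, such as /random1 and /random2 above.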

Example rules for svn-all-fast-export

create repository git_repo_name
end repository

match /trunk/
  repository git_repo_name
  branch master
end match

match /(random1|random2)/
  repository git_repo_name
  branch \1
end match

match /branches/([^/]+)/
  repository git_repo_name
  branch \1
end match

match /tags/([^/]+)/
  repository git_repo_name
  branch tags/\1
end match

match /random_trunk/
  repository git_repo_name
  branch random
end match

match /random_branches/([^/]+)/
  repository git_repo_name
  branch \1
end match

match /random_tags/([^/]+)/
  repository git_repo_name
  branch tags/\1
end match

match /([^/]+)$
  repository git_repo_name
  branch master
  prefix \1
end match
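Before running a long export, it can be useful to sanity-check which branch a given SVN path would land on. The Python sketch below mirrors the patterns from the rules above. It assumes that rules are tried top to bottom, that the first match wins, and that patterns are anchored at the start of the path. Python's re module is also not the regex engine svn-all-fast-export uses, so treat this as an approximation. The final catch-all rule with its prefix is left out, and the names RULES and branch_for are made up for this example.

```python
import re

# (pattern, branch template) pairs in the same order as the rules file.
RULES = [
    (r"/trunk/", "master"),
    (r"/(random1|random2)/", r"\1"),
    (r"/branches/([^/]+)/", r"\1"),
    (r"/tags/([^/]+)/", r"tags/\1"),
    (r"/random_trunk/", "random"),
    (r"/random_branches/([^/]+)/", r"\1"),
    (r"/random_tags/([^/]+)/", r"tags/\1"),
]

def branch_for(path):
    """Return the branch name of the first rule matching path, or None."""
    for pattern, template in RULES:
        match = re.match(pattern, path)  # anchored at the start of the path
        if match:
            # Substitute captured groups (\1) into the branch template.
            return match.expand(template)
    return None
```

Under these assumptions, branch_for("/tags/v1.0/README") gives "tags/v1.0", and a path that no rule covers gives None, which is a hint that a rule is missing.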

One issue I ran into: when an SVN commit had no author, svn-all-fast-export would throw an error and quit. A workaround is to add (no author) = no_author <no_author@no_author> to the identity-map for svn-all-fast-export. Also, when dealing with character encoding, Linux is your friend. Trying to handle character encoding issues on Windows is much more difficult, if not effectively impossible.
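The (no author) workaround can be folded into whatever generates the identity-map. Here is a minimal Python sketch that assumes every entry uses the same "login = Name <email>" shape as the line quoted above. The function name and the sample author are hypothetical.

```python
NO_AUTHOR_KEY = "(no author)"

def render_identity_map(authors):
    """Render identity-map text from a dict of login -> (name, email).

    A "(no author)" entry is added if it is missing, so commits without an
    author still map to an identity.
    """
    entries = dict(authors)
    entries.setdefault(NO_AUTHOR_KEY, ("no_author", "no_author@no_author"))
    lines = [
        f"{login} = {name} <{email}>"
        for login, (name, email) in sorted(entries.items())
    ]
    return "\n".join(lines) + "\n"
```

For example, render_identity_map({"jdoe": ("Jane Doe", "jdoe@example.com")}) produces two lines: the (no author) entry followed by the jdoe entry.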