Project:Infrastructure/Git migration
Status
Final hosting is ready. Launch is planned for the weekend of August 8/9, 2015.
Blockers
- Infra manpower
- Ensure availability of the final history conversion host
  - Needs lots of RAM, parallel CPU and some SSD backing
  - Consider a 1-month Hetzner server bidding option
  - Consider a Rackspace OnMetal I/O node (if available by the hour/day)
  - Consider a large AWS instance by the hour
    - r3.2xlarge, m4.4xlarge, c4.8xlarge; maybe even larger?
Launch plan
Steps
Top-level items in bold are considered critical path to service migration.
- Freeze
  - No more CVS commits to gentoo-x86, ever again
  - CVS->rsync conversion frozen
- Take backups
  - Final tree snapshot
  - Final CVS history backup
  - Publish both
- Perform cleanups on final snapshot
  - Remove ChangeLog files
  - Convert to thin manifests
  - Publish cleaned snapshot as reference
- Commit fixed snapshot as initial signed commit on new history
- Allow developers to clone new repo and commit to it
- Turn on git->rsync
  - Manifests: converts thin->thick
  - ChangeLogs: (temporary) we explicitly copy the ChangeLog as-is from the final
- Review/fix all scripts for further breakages
- Perform history conversion
  - Re-introduce cleanups in history
  - The state of (history conversion + cleanups) MUST match the state of (initial commit) at this point
- Make converted history available as graft point
- Adjust git->rsync
  - Re-enable true ChangeLog generation
  - (maybe) Implement ChangeLog expiry mechanisms
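Turning on git->rsync means regenerating thick Manifests from the thin ones kept in Git. The following is a minimal Python sketch of that conversion. It assumes the Manifest2 line layout (`TYPE name size HASH value ...`) and only SHA256/SHA512. The production hash set may have included others, such as WHIRLPOOL. `thicken_manifest` and `manifest_entry` are hypothetical helpers, not the infra scripts, and the result is not GPG-signed.

```python
import hashlib
from pathlib import Path

# Assumption: only SHA256 and SHA512; the production hash set may have
# included others (such as WHIRLPOOL) that hashlib does not always provide.
HASHES = ("SHA256", "SHA512")


def manifest_entry(kind: str, name: str, data: bytes) -> str:
    """Format one Manifest2-style line: TYPE name size HASH value ..."""
    parts = [kind, name, str(len(data))]
    for algo in HASHES:
        parts += [algo, hashlib.new(algo.lower(), data).hexdigest()]
    return " ".join(parts)


def thicken_manifest(pkg_dir: str) -> str:
    """Return thick Manifest text for a package dir holding a thin Manifest."""
    pkg = Path(pkg_dir)
    manifest = pkg / "Manifest"
    lines = []
    if manifest.exists():
        # A thin Manifest only lists distfiles; keep those DIST lines verbatim.
        lines += [line for line in manifest.read_text().splitlines()
                  if line.startswith("DIST ")]
    files_dir = pkg / "files"
    if files_dir.is_dir():
        for path in files_dir.rglob("*"):
            if path.is_file():
                # AUX entries are named relative to files/.
                rel = path.relative_to(files_dir).as_posix()
                lines.append(manifest_entry("AUX", rel, path.read_bytes()))
    for path in pkg.iterdir():
        if path.is_file() and path.name != "Manifest":
            kind = "EBUILD" if path.suffix == ".ebuild" else "MISC"
            lines.append(manifest_entry(kind, path.name, path.read_bytes()))
    # A lexical sort groups entries by type (AUX, DIST, EBUILD, MISC).
    return "\n".join(sorted(lines)) + "\n"
```

The DIST lines are copied rather than recomputed because distfiles are not part of the tree, so their checksums can only come from the existing Manifest.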
Tentative dates and times
Date and time | Event |
---|---|
2015/08/08 15:00 UTC | Freeze |
2015/08/08 19:00 UTC | Git commits open for developers |
2015/08/09 01:00 UTC | Rsync live again (with delayed changelogs) |
2015/08/11 | History repo available to graft |
2015/08/12 | rsync mirrors carry up-to-date changelogs again |
Resources
- Richard Freeman (rich0)'s validation code: https://github.com/rich0/gitvalidate
- ferringb's generation code: git://pkgcore.org/git-conversion-tools
People
This list is in rough chronological order; apologies to anybody who was left out.
- Alec Warner (antarus) - did the GSoC 2006 migration tests
- Robin H. Johnson (robbat2) - infra guy, herding this project
- Nguyen Thai Ngoc Duy (pclouds) - Former Gentoo developer, wrote Git features for the migration
- Michael Haggerty - upstream cvs2svn author
- Brian Harring (ferringb) - wrote much Python to improve cvs2svn
- Michael G. Schwern - Perl hacker, fixed git-svn for SVN 1.7 support
- Rich Freeman (rich0) - validation scripts
- Patrick Lauer (patrick) - Gentoo dev, running the new 2014 migration work
Contact
For Git migration discussions, subscribe to the gentoo-scm mailing list: gentoo-scm@lists.gentoo.org
Conversion process
Goals
- Each Git commit should be mapped to one or more CVS commits
- Portage two-phase commits (commit 1: ebuilds/files/Manifest, commit 2: Manifest regenerated from $Header$ changes, optionally GPG-signed) should be mapped to a single commit
- Portage trailer data in CVS commit logs should be converted to a newline-delimited format in the Git logs
- As the validation settles, it should become possible to have CVS commits generate known Git commit IDs
- Start a list of validated commit IDs
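The two-phase commit goal above can be sketched in Python. The sketch first groups CVS file revisions into changesets by author and log message within a time window. This is broadly the idea behind cvs2svn/cvs2git changeset detection, though their heuristics are more involved. It then folds a Manifest-only changeset into the same author's previous changeset when it follows shortly after. `WINDOW`, `MANIFEST_FOLLOWUP` and all class and function names are hypothetical. This is not the conversion scripts' actual logic.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen only for illustration.
WINDOW = 300             # max seconds between revisions of one logical commit
MANIFEST_FOLLOWUP = 600  # max seconds before a Manifest-only commit is folded


@dataclass
class Revision:
    path: str
    author: str
    timestamp: int
    message: str


@dataclass
class Changeset:
    author: str
    message: str
    start: int
    end: int
    paths: list = field(default_factory=list)


def group(revisions):
    """Group CVS file revisions into changesets by (author, message) and time."""
    changesets, open_sets = [], {}
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        cur = open_sets.get(key)
        if cur is not None and rev.timestamp - cur.end <= WINDOW:
            cur.paths.append(rev.path)
            cur.end = rev.timestamp
        else:
            cur = Changeset(rev.author, rev.message, rev.timestamp,
                            rev.timestamp, [rev.path])
            changesets.append(cur)
            open_sets[key] = cur
    return changesets


def is_manifest(path):
    return path.rsplit("/", 1)[-1] == "Manifest"


def fold_manifest_commits(changesets):
    """Fold a Manifest-only changeset into the same author's previous one."""
    result, last_by_author = [], {}
    for cs in changesets:
        prev = last_by_author.get(cs.author)
        if (prev is not None and all(is_manifest(p) for p in cs.paths)
                and cs.start - prev.end <= MANIFEST_FOLLOWUP):
            for p in cs.paths:
                if p not in prev.paths:
                    prev.paths.append(p)
            prev.end = max(prev.end, cs.end)
            continue  # the second phase becomes part of the first commit
        result.append(cs)
        last_by_author[cs.author] = cs
    return result
```

Each resulting `Changeset` would then map to one Git commit. This is why a single Git commit can correspond to more than one CVS commit.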
Pseudocode
do {
    do {
        adjust conversion scripts
        do test conversion
        validate all newly converted commits
    } while (not validation passed on all commits)
    switch CVS to read only
    do final conversion
    final validation
    if (final validation passed) {
        activate Git repo for public commits
        lock CVS permanently
    } else {
        unlock CVS
    }
} while (still using CVS)
Historical migration
Here is how to generate the historical migration in git:
- Patch cvs2svn to use "/" as the separator in the date format in keywords. http://dev.gentoo.org/~rich0/gitmig/cvs2svn.patch
- Use the migration scripts at: https://github.com/gentoo/git-migration-scripts-rich0
- (provide list of dependencies for scripts)
- Obtain a tarball of the CVS root (or a squashfs image, which is preferable for cache use)
- Place or mount the CVS root in cvs-repo
- Run script.sh --fast
- From the git directory, run git bundle create <destpath> master
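The cvs2svn patch matters because CVS expands keywords such as $Header$ inside file contents. Converted files therefore presumably match the CVS checkouts byte-for-byte only when dates are rendered the same way. Below is a small Python sketch of a $Header$ line with slash-separated dates. It assumes the usual `path,v revision date time author state` layout. `header_keyword` is hypothetical and is not part of cvs2svn or the migration scripts.

```python
from datetime import datetime, timezone


def header_keyword(rcs_path: str, revision: str, when: datetime,
                   author: str, state: str = "Exp") -> str:
    """Render a CVS-style $Header$ keyword using '/' as the date separator.

    Pass a timezone-aware datetime; it is converted to UTC before formatting.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y/%m/%d %H:%M:%S")
    return f"$Header: {rcs_path} {revision} {stamp} {author} {state} $"
```

For example, a revision 1.2 committed by alice at 2009-04-01 12:30 UTC renders as `$Header: /var/cvsroot/gentoo-x86/app-misc/foo/foo-1.0.ebuild,v 1.2 2009/04/01 12:30:00 alice Exp $`. The path here is illustrative.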
Validation
Quick notes on how to test:
- Source for the validation scripts at: https://github.com/rich0/gitvalidate.git
- Clone the git bundle into a directory
- Extract the cvs root into a directory
- (uncertain - may need to set up local bind mounts or symlinks to match the path in the cvs keywords)
- Checkout the cvs gentoo-x86 module into another directory
- (uncertain - you may need to edit config files so that CVS checkouts hit the local root rather than Gentoo infra. Test before running the script, or watch it while it runs: if it isn't using close to 100% CPU, it is probably hammering the server, so stop it!)
- Use git log to obtain the hash of the last git commit
- Point TMPDIR at a location with ~10GB of space (/tmp on tmpfs may not cut it and sort will fail).
- Run gitdump/gitprocesstree.sh <path to git tree root> <head commit hash> > g
- Run cvsdump/cvsprocesstree.sh <path to gentoo-x86 in cvs root> <path to checkout of gentoo-x86> > c
- Create a table in mysql to hold the cvs output:
CREATE TABLE `cvs` (
  `key` int(11) NOT NULL AUTO_INCREMENT,
  `filename` varchar(500) COLLATE utf8_bin NOT NULL,
  `type` varchar(5) COLLATE utf8_bin NOT NULL,
  `hash` varchar(50) COLLATE utf8_bin NOT NULL,
  `timestamp` int(11) NOT NULL,
  `author` varchar(200) COLLATE utf8_bin NOT NULL,
  `message` text COLLATE utf8_bin NOT NULL,
  `revision` varchar(10) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`key`),
  KEY `filename` (`filename`(255),`hash`),
  KEY `hash` (`hash`)
) ENGINE=MyISAM AUTO_INCREMENT=3132434 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
- Create a table in mysql to hold the git output:
CREATE TABLE `git` (
  `key` int(11) NOT NULL AUTO_INCREMENT,
  `filename` varchar(500) COLLATE utf8_bin NOT NULL,
  `type` varchar(5) COLLATE utf8_bin NOT NULL,
  `hash` varchar(50) COLLATE utf8_bin NOT NULL,
  `timestamp` int(11) NOT NULL,
  `author` varchar(200) COLLATE utf8_bin NOT NULL,
  `message` text COLLATE utf8_bin NOT NULL,
  `commit` varchar(50) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`key`),
  KEY `filename` (`filename`(255),`hash`),
  KEY `hash` (`hash`)
) ENGINE=MyISAM AUTO_INCREMENT=3030211 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
- Define the base64 handling procedures found at http://stackoverflow.com/questions/358500/base64-encode-in-mysql
- Load the data into the tables:
load data local infile 'c' into table cvs fields terminated by ',' lines terminated by '\n'
  (filename,type,hash,timestamp,author,message,revision);
load data local infile 'g' into table git fields terminated by ',' lines terminated by '\n'
  (filename,type,hash,timestamp,author,message,commit);
- Process the data into several tables:
create table onlycvs ENGINE = MYISAM
  select cvs.* from `cvs` left join `git` as g on cvs.hash=g.hash
  where g.hash is null;
create table onlygit ENGINE = MYISAM
  select g.* from `git` as g left join `cvs` on cvs.hash=g.hash
  where cvs.hash is null;
delete from onlycvs where revision="1.1.1.1";
delete from onlycvs where filename like "%Manifest%";
delete from onlygit where filename like "%Manifest%";
create table baddate ENGINE = MYISAM
  select c.*, g.commit from `cvs` as c
  join `git` as g on (g.hash=c.hash and g.filename=c.filename)
  where abs(c.timestamp - g.timestamp) > 60*60;
create table badmessage ENGINE = MYISAM
  select c.*, g.author as gauthor, g.commit, g.message as gmessage
  from `cvs` as c
  join `git` as g on (g.hash=c.hash and g.filename=c.filename)
  where c.message <> g.message
    and g.filename not like "%Manifest%"
    and abs(c.timestamp - g.timestamp) < 60*60;
UPDATE `badmessage` SET
  `author`=BASE64_DECODE(`author`),
  `gauthor`=BASE64_DECODE(`gauthor`),
  `message`=BASE64_DECODE(`message`),
  `gmessage`=BASE64_DECODE(`gmessage`);
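The joins above can be tried without a MySQL server. The sketch below replays the onlycvs/onlygit/baddate/badmessage checks in an in-memory SQLite database. It assumes the seven comma-separated fields named in the `load data` statements, with base64-encoded author and message fields, as the BASE64_DECODE step implies. `compare`, `load` and `decode` are hypothetical helpers. Note that SQLite's default LIKE is case-insensitive for ASCII, unlike the utf8_bin collation used above.

```python
import base64
import sqlite3

HOUR = 60 * 60


def load(db, table, extra_col, rows):
    """Create a dump table and bulk-insert 7-field rows into it."""
    db.execute(f'CREATE TABLE {table} (filename TEXT, type TEXT, hash TEXT, '
               f'timestamp INTEGER, author TEXT, message TEXT, "{extra_col}" TEXT)')
    db.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def decode(value):
    """Undo the base64 encoding applied to author/message fields."""
    return base64.b64decode(value).decode("utf-8")


def compare(cvs_rows, git_rows):
    """Replay the onlycvs/onlygit/baddate/badmessage checks in SQLite."""
    db = sqlite3.connect(":memory:")
    load(db, "cvs", "revision", cvs_rows)
    load(db, "git", "commit", git_rows)
    result = {}
    # Content present in CVS but never produced by the conversion.
    result["only_cvs"] = db.execute(
        "SELECT c.filename FROM cvs AS c LEFT JOIN git AS g ON c.hash = g.hash "
        "WHERE g.hash IS NULL AND c.revision <> '1.1.1.1' "
        "AND c.filename NOT LIKE '%Manifest%'").fetchall()
    # Content in Git that CVS never had.
    result["only_git"] = db.execute(
        "SELECT g.filename FROM git AS g LEFT JOIN cvs AS c ON c.hash = g.hash "
        "WHERE c.hash IS NULL AND g.filename NOT LIKE '%Manifest%'").fetchall()
    # Same file content, but timestamps more than an hour apart.
    result["bad_date"] = db.execute(
        'SELECT c.filename, g."commit" FROM cvs AS c JOIN git AS g '
        "ON g.hash = c.hash AND g.filename = c.filename "
        "WHERE abs(c.timestamp - g.timestamp) > ?", (HOUR,)).fetchall()
    # Same file content and time, but a different commit message.
    rows = db.execute(
        "SELECT c.filename, c.message, g.message FROM cvs AS c JOIN git AS g "
        "ON g.hash = c.hash AND g.filename = c.filename "
        "WHERE c.message <> g.message AND g.filename NOT LIKE '%Manifest%' "
        "AND abs(c.timestamp - g.timestamp) < ?", (HOUR,)).fetchall()
    result["bad_message"] = [(f, decode(cm), decode(gm)) for f, cm, gm in rows]
    db.close()
    return result
```

Decoding only the mismatching messages mirrors the final UPDATE above: the base64 form is what gets compared, and only the rows worth reading are made human-readable.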
History
2006
- The first major work on VCS migration was done as a GSoC 2006 project (Project:Infrastructure/Git_migration/GSoC2006) by Antarus.
- At this point Git was mostly too resource-intensive for serious consideration, and was slower than CVS.
- Conversion takes more than 7 days.
- Decision to stay on CVS
2007
2008
2009
- April:
- Converting a recent CVS copy - Item 1: mailmap fun
- Converting a recent CVS copy - Item 2: statistics
- Conversion time: 18.5 hours
- June:
- Progress summary, 2009/06/01
- Conversion time: 9 hours
- Bug in cvs2svn/cvs2git causes lines of files to be lost
- ExternalBlobGenerator module created by the upstream author (originally closed-source and non-public): improves pass1 from 36204 seconds to 1598 seconds
- October: Gentoo meeting at the GSoC Mentor Summit
- All Gentoo developers present held a meeting; one of the major topics was blockers and plans for the Git migration.
- Shawn Pearce (one of the major Git developers and author of the Repo tool) also attended.
- Decision between a monolithic repo, per-category repos, or per-package repos: the monolithic repo wins.
2010
- ferringb takes on Python improvements with snakeoil and Unladen Swallow
- Gentoo SCM conversion status report, 2010/01/27
- Conversion time: 110 minutes
- Commit Signing & Sparse Trees identified as requirements
2011
- August:
- Re: [gentoo-dev] Progress on cvs->git migration (status report)
- Unresolved items: commit signing, thin Manifests, merge policies
- September:
- Portage gets thin Manifest support
- October:
2012
- May-July:
- Bug #418431 (git-svn is broken with SVN 1.7 and can corrupt data) causes trouble for Git work, since part of the migration process at this time relies heavily on the cvs2svn codebase
- October:
- Email [gentoo-scm] Fwd: [gentoo-dev] CIA replacement on 2012/10/01 by rich0.
- Bug #333531: portage migration to git (tracker bug)
- Outstanding items: pre-upload hook, git2rsync scripts, validation, documentation
- Email [gentoo-scm] CVS -> git, list of where non-infra folk can contribute on 2012/10/01 by ferringb
- Lays out the many tasks well
- http://git.stuge.se/?p=portage.git;a=commitdiff;h=thickandthin mentioned for merging, still not done?
2013
2014
- February: Progress made on some blockers (i.e. they were found to be obsolete)
- Bug #333531: portage migration to git (tracker bug)
- Major outstanding items:
- Wait for jk/pack-bitmap to land in a Git release (pack-bitmap landed in the Git 2 release)
- Enforce GPG commit signing
- Get gitolite to log to syslog
- March: GLEP 63 - Minimum requirement and a recommended set of GPG key management policies for the Gentoo Linux distribution.
- May: Gentoo Keys: a tool that manages GPG key validation/updates and performs multiple "health" checks on GPG keys
- October: Regular test migrations are happening, based on the 2014/09/15 snapshot.
See also
- Grafting converted CVS commits to the new gentoo.git repository - Instructions for grafting (converted) historical CVS commits onto gentoo.git history. Helpful if you want to look through the full Git log.
This article is issued from Gentoo. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.