svn-archive
svn-fast-export
*.pyc
.dotest

hg-fast-export.(sh|py) - mercurial to git converter using git-fast-import
=========================================================================

Legal
-----

Most hg-* scripts are licensed under the [MIT license]
(http://www.opensource.org/licenses/mit-license.php) and were written
by Rocco Rutte <pdmef@gmx.net> with hints and help from the git list and
\#mercurial on freenode. hg-reset.py is licensed under GPLv2 since it
copies some code from the mercurial sources.

The current maintainer is Frej Drejhammar <frej.drejhammar@gmail.com>.

Usage
-----

Using hg-fast-export is quite simple for a mercurial repository <repo>:

```
mkdir repo-git # or whatever
cd repo-git
git init
hg-fast-export.sh -r <repo>
```

Please note that hg-fast-export does not automatically check out the
newly imported repository. You probably want to follow up the import
with a `git checkout`-command.

Incremental imports to track hg repos are supported, too.

Using hg-reset is also quite simple. Within a git repository that was
hg-fast-export'ed from mercurial, running

```
hg-reset.sh -R <revision>
```

will give hints on which branches need adjustment for starting over
again.

When a mercurial repository does not use utf-8 for encoding author
strings and commit messages the `-e <encoding>` command line option
can be used to force fast-export to convert incoming meta data from
<encoding> to utf-8. This encoding option is also applied to file names.

In some locales Mercurial uses different encodings for commit messages
and file names. In that case, you can use the `--fe <encoding>` command line
option, which overrides the -e option for file names.

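The interaction between `-e` and `--fe` can be illustrated with a small sketch. This is not the converter's code (which targets Python 2); it is a Python 3 approximation, and the helper names `to_utf8` and `pick_filename_encoding` are hypothetical.

```python
def to_utf8(raw, encoding=""):
    """Re-encode raw bytes from `encoding` to UTF-8.

    With no encoding given, the bytes are passed through unchanged.
    """
    if not encoding:
        return raw
    return raw.decode(encoding).encode("utf-8")


def pick_filename_encoding(encoding="", fn_encoding=""):
    """Return the encoding to assume for file names.

    --fe (fn_encoding) wins when given; otherwise -e (encoding) also
    applies to file names.
    """
    return fn_encoding or encoding


# An author string stored as Latin-1 in the Mercurial repository.
author = "J\u00f6rg <joerg@example.com>".encode("latin-1")
converted = to_utf8(author, "latin-1")
```

The design point is that file names fall back to the `-e` encoding, so `--fe` is only needed when the two actually differ.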
As mercurial appears to be much less picky about the syntax of the
author information than git, an author mapping file can be given to
hg-fast-export to fix up malformed author strings. The file is
specified using the -A option. The file should contain lines of the
form `FromAuthor=ToAuthor`. The example authors.map below will
translate `User <garbage<user@example.com>` to `User <user@example.com>`.

```
-- Start of authors.map --
User <garbage<user@example.com>=User <user@example.com>
-- End of authors.map --
```

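As a rough illustration of how such a mapping file is read, the sketch below parses `FromAuthor=ToAuthor` lines using the same regular expression that `load_mapping` in hg-fast-export.py uses. It is a Python 3 approximation that works on a string instead of a file; the name `parse_mapping` is hypothetical. It assumes the `-- Start/End of authors.map --` markers in the example only delimit the listing and are not part of the file.

```python
import re

# Same pattern as load_mapping: everything before the first '=' is the key.
MAPPING_LINE = re.compile(r"^([^=]+)[ ]*=[ ]*(.+)$")


def parse_mapping(text):
    """Parse 'From=To' lines, skipping blank lines and '#' comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = MAPPING_LINE.match(line)
        if m is None:
            # Lines without '=' are ignored here; the real script warns.
            continue
        mapping[m.group(1).strip()] = m.group(2).strip()
    return mapping


authors = parse_mapping(
    "# fix a malformed author string\n"
    "User <garbage<user@example.com>=User <user@example.com>\n"
)
```

Because the key stops at the first `=`, a source author string that itself contains `=` cannot be expressed in this format.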
Tag and Branch Naming
---------------------

As Git and Mercurial differ in what is a valid branch and tag
name, the -B and -T options allow a mapping file to be specified to
rename branches and tags (respectively). The syntax of the mapping
file is the same as for the author mapping.

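Names that are not covered by a mapping file still pass through a built-in clean-up step. The following is a Python 3 sketch modelled on `sanitize_name` in hg-fast-export.py, without the warning output. It assumes a non-empty name, and unlike the original it tolerates empty path components.

```python
import re


def sanitize_name(name, mapping=None):
    """Roughly follow git-check-ref-format(1): apply the mapping first,
    then replace characters git refuses in ref names with '_'."""
    n = (mapping or {}).get(name, name)
    # '[', space, '~', '^', ':', '?', backslash, '*' and '..' become '_'.
    n = re.sub(r"([\[ ~^:?\\*]|\.\.)", "_", n)
    # A ref may not end in '/' or '.'.
    if n[-1] in ("/", "."):
        n = n[:-1] + "_"
    # No path component may start with '.'.
    n = "/".join(
        "_" + part[1:] if part.startswith(".") else part
        for part in n.split("/")
    )
    # Collapse runs of underscores.
    return re.sub(r"_+", "_", n)


renamed = sanitize_name("feature branch")
```

A mapping entry takes effect before the clean-up, so a mapped name is still sanitized if it contains characters git rejects.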
Notes/Limitations
-----------------

hg-fast-export supports multiple branches but only named branches with
exactly one head each. Otherwise commits to the tip of these heads
within the branch will get flattened into merge commits.

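To check up front whether a repository meets the one-head-per-branch requirement, the idea can be sketched as follows. This is a simplified Python 3 illustration on plain data, not the Mercurial API: the input format, a list of `(revision, branch)` pairs for all repository heads, and the function name are assumptions.

```python
from collections import defaultdict


def branches_with_multiple_heads(heads):
    """Group heads by branch name and return the branches that have
    more than one head, mapped to their sorted head revisions."""
    by_branch = defaultdict(list)
    for rev, branch in heads:
        by_branch[branch].append(rev)
    return {
        branch: sorted(revs)
        for branch, revs in by_branch.items()
        if len(revs) > 1
    }


# Hypothetical repository: 'default' has two heads, 'stable' has one.
problems = branches_with_multiple_heads(
    [(10, "default"), (12, "stable"), (15, "default")]
)
```

An empty result means every named branch has exactly one head.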
As each git-fast-import run creates a new pack file, it may be
required to repack the repository quite often for incremental imports
(especially when importing a small number of changesets per
incremental import).

Because of the way the hg API and remote access protocol are designed,
it is not possible to use hg-fast-export on remote repositories
(http/ssh). First clone the repository, then convert it.

Design
------

hg-fast-export.py was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: it just feeds what it
finds into git-fast-import. This also implies that it heavily relies
on strictly linear ordering of changesets from hg, i.e. its
append-only storage model so that changesets hg-fast-export already
saw never get modified.

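The single-pass idea can be sketched with toy data: each changeset is turned into a git-fast-import `commit` command as soon as it is seen, and hg revision `n` is referred to by mark `:n+1`, so parents always point to marks that were emitted earlier. This is a Python 3 illustration only. The dictionary layout and the name `export_stream` are assumptions, and file contents, `reset` commands, branch-name sanitizing and incremental state are left out.

```python
def export_stream(changesets):
    """Emit one fast-import commit per changeset, in revision order.

    `changesets` is a list indexed by hg revision number; each entry
    has 'branch', 'user', 'time', 'tz', 'desc' and 'parents' (a list
    of parent revision numbers).
    """
    out = []
    for rev, cs in enumerate(changesets):
        # Append-only ordering: a parent must have been seen already.
        for parent in cs["parents"]:
            if parent >= rev:
                raise ValueError(
                    "parent r%d is not older than r%d" % (parent, rev)
                )
        message = (cs["desc"] + "\n").encode("utf-8")
        out.append("commit refs/heads/%s" % cs["branch"])
        out.append("mark :%d" % (rev + 1))
        out.append("committer %s %d %s" % (cs["user"], cs["time"], cs["tz"]))
        # The byte count includes the newline written after the message.
        out.append("data %d" % len(message))
        out.append(cs["desc"])
        for i, parent in enumerate(cs["parents"]):
            keyword = "from" if i == 0 else "merge"
            out.append("%s :%d" % (keyword, parent + 1))
        out.append("")
    return "\n".join(out) + "\n"


stream = export_stream([
    {"branch": "master", "user": "A <a@example.com>", "time": 0,
     "tz": "+0000", "desc": "first", "parents": []},
    {"branch": "master", "user": "A <a@example.com>", "time": 60,
     "tz": "+0000", "desc": "second", "parents": [0]},
])
```

Since marks are derived from revision numbers, nothing has to be looked up or rewritten after a commit has been emitted. That only works as long as already exported changesets never change.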
Submitting Patches
------------------

Please use the issue-tracker at github
https://github.com/frej/fast-export to report bugs and submit
patches.

#!/usr/bin/env python

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

from mercurial import node
from hg2git import setup_repo,fixup_user,get_branch,get_changeset
from hg2git import load_cache,save_cache,get_git_sha1,set_default_branch,set_origin_name
from optparse import OptionParser
import re
import sys
import os

if sys.platform == "win32":
  # On Windows, sys.stdout is initially opened in text mode, which means that
  # when a LF (\n) character is written to sys.stdout, it will be converted
  # into CRLF (\r\n).  That makes git blow up, so use this platform-specific
  # code to change the mode of sys.stdout to binary.
  import msvcrt
  msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

# silly regex to catch Signed-off-by lines in log message
sob_re=re.compile('^Signed-[Oo]ff-[Bb]y: (.+)$')
# insert 'checkpoint' command after this many commits or none at all if 0
cfg_checkpoint_count=0
# write some progress message every this many file contents written
cfg_export_boundary=1000

def gitmode(flags):
  return 'l' in flags and '120000' or 'x' in flags and '100755' or '100644'

def wr_no_nl(msg=''):
  if msg:
    sys.stdout.write(msg)

def wr(msg=''):
  wr_no_nl(msg)
  sys.stdout.write('\n')
  #map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n'))

def checkpoint(count):
  count=count+1
  if cfg_checkpoint_count>0 and count%cfg_checkpoint_count==0:
    sys.stderr.write("Checkpoint after %d commits\n" % count)
    wr('checkpoint')
    wr()
  return count

def revnum_to_revref(rev, old_marks):
  """Convert an hg revnum to a git-fast-import rev reference (an SHA1
  or a mark)"""
  return old_marks.get(rev) or ':%d' % (rev+1)

def file_mismatch(f1,f2):
  """See if two revisions of a file are not equal."""
  return node.hex(f1)!=node.hex(f2)

def split_dict(dleft,dright,l=[],c=[],r=[],match=file_mismatch):
  """Loop over our repository and find all changed and missing files."""
  for left in dleft.keys():
    right=dright.get(left,None)
    if right==None:
      # we have the file but our parent hasn't: add to left set
      l.append(left)
    elif match(dleft[left],right) or gitmode(dleft.flags(left))!=gitmode(dright.flags(left)):
      # we have it but checksums mismatch: add to center set
      c.append(left)
  for right in dright.keys():
    left=dleft.get(right,None)
    if left==None:
      # if parent has file but we don't: add to right set
      r.append(right)
    # change is already handled when comparing child against parent
  return l,c,r

def get_filechanges(repo,revision,parents,mleft):
  """Given some repository and revision, find all changed/deleted files."""
  l,c,r=[],[],[]
  for p in parents:
    if p<0: continue
    mright=repo.changectx(p).manifest()
    l,c,r=split_dict(mleft,mright,l,c,r)
  l.sort()
  c.sort()
  r.sort()
  return l,c,r

def get_author(logmessage,committer,authors):
  """As git distinguishes between author and committer of a patch, try to
  extract author by detecting Signed-off-by lines.

  This walks from the end of the log message towards the top skipping
  empty lines. Upon the first non-empty line, it walks all Signed-off-by
  lines upwards to find the first one. For that (if found), it extracts
  authorship information the usual way (authors table, cleaning, etc.)

  If no Signed-off-by line is found, this defaults to the committer.

  This may sound stupid (and it somehow is), but in log messages we
  accidentally may have lines in the middle starting with
  "Signed-off-by: foo" and thus matching our detection regex. Prevent
  that."""

  loglines=logmessage.split('\n')
  i=len(loglines)
  # from tail walk to top skipping empty lines
  while i>=0:
    i-=1
    if len(loglines[i].strip())==0: continue
    break
  if i>=0:
    # walk further upwards to find first sob line, store in 'first'
    first=None
    while i>=0:
      m=sob_re.match(loglines[i])
      if m==None: break
      first=m
      i-=1
    # if the last non-empty line matches our Signed-Off-by regex: extract username
    if first!=None:
      r=fixup_user(first.group(1),authors)
      return r
  return committer

def export_file_contents(ctx,manifest,files,hgtags,encoding=''):
  count=0
  max=len(files)
  for file in files:
    # Skip .hgtags files. They only get us in trouble.
    if not hgtags and file == ".hgtags":
      sys.stderr.write('Skip %s\n' % (file))
      continue
    d=ctx.filectx(file).data()
    if encoding:
      filename=file.decode(encoding).encode('utf8')
    else:
      filename=file
    wr('M %s inline %s' % (gitmode(manifest.flags(file)),
                           strip_leading_slash(filename)))
    wr('data %d' % len(d)) # had some trouble with size()
    wr(d)
    count+=1
    if count%cfg_export_boundary==0:
      sys.stderr.write('Exported %d/%d files\n' % (count,max))
  if max>cfg_export_boundary:
    sys.stderr.write('Exported %d/%d files\n' % (count,max))

def sanitize_name(name,what="branch", mapping={}):
  """Sanitize input roughly according to git-check-ref-format(1)"""

  # NOTE: Do not update this transform to work around
  # incompatibilities on your platform. If you change it and it starts
  # modifying names which previously were not touched it will break
  # preexisting setups which are doing incremental imports.
  #
  # Use the -B and -T options to mangle branch and tag names
  # instead. If you have a source repository where this is too much
  # work to do manually, write a tool that does it for you.

  def dot(name):
    if name[0] == '.': return '_'+name[1:]
    return name

  n=mapping.get(name,name)
  p=re.compile('([[ ~^:?\\\\*]|\.\.)')
  n=p.sub('_', n)
  if n[-1] in ('/', '.'): n=n[:-1]+'_'
  n='/'.join(map(dot,n.split('/')))
  p=re.compile('_+')
  n=p.sub('_', n)

  if n!=name:
    sys.stderr.write('Warning: sanitized %s [%s] to [%s]\n' % (what,name,n))
  return n

def strip_leading_slash(filename):
  if filename[0] == '/':
    return filename[1:]
  return filename

def export_commit(ui,repo,revision,old_marks,max,count,authors,
                  branchesmap,sob,brmap,hgtags,encoding='',fn_encoding=''):
  def get_branchname(name):
    if brmap.has_key(name):
      return brmap[name]
    n=sanitize_name(name, "branch", branchesmap)
    brmap[name]=n
    return n

  (revnode,_,user,(time,timezone),files,desc,branch,_)=get_changeset(ui,repo,revision,authors,encoding)

  branch=get_branchname(branch)

  parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]

  if len(parents)==0 and revision != 0:
    wr('reset refs/heads/%s' % branch)

  wr('commit refs/heads/%s' % branch)
  wr('mark :%d' % (revision+1))
  if sob:
    wr('author %s %d %s' % (get_author(desc,user,authors),time,timezone))
  wr('committer %s %d %s' % (user,time,timezone))
  wr('data %d' % (len(desc)+1)) # +1 for the newline that wr(desc) appends
  wr(desc)
  wr()

  ctx=repo.changectx(str(revision))
  man=ctx.manifest()
  added,changed,removed,type=[],[],[],''

  if len(parents) == 0:
    # first revision: feed in full manifest
    added=man.keys()
    added.sort()
    type='full'
  else:
    wr('from %s' % revnum_to_revref(parents[0], old_marks))
    if len(parents) == 1:
      # later non-merge revision: feed in changed manifest
      # if we have exactly one parent, just take the changes from the
      # manifest without expensively comparing checksums
      f=repo.status(repo.lookup(parents[0]),revnode)[:3]
      added,changed,removed=f[1],f[0],f[2]
      type='simple delta'
    else: # a merge with two parents
      wr('merge %s' % revnum_to_revref(parents[1], old_marks))
      # later merge revision: feed in changed manifest
      # for many files comparing checksums is expensive so only do it for
      # merges where we really need it due to hg's revlog logic
      added,changed,removed=get_filechanges(repo,revision,parents,man)
      type='thorough delta'

  sys.stderr.write('%s: Exporting %s revision %d/%d with %d/%d/%d added/changed/removed files\n' %
      (branch,type,revision+1,max,len(added),len(changed),len(removed)))

  if fn_encoding:
    removed=[r.decode(fn_encoding).encode('utf8') for r in removed]

  removed=[strip_leading_slash(x) for x in removed]

  map(lambda r: wr('D %s' % r),removed)
  export_file_contents(ctx,man,added,hgtags,fn_encoding)
  export_file_contents(ctx,man,changed,hgtags,fn_encoding)
  wr()

  return checkpoint(count)

def export_note(ui,repo,revision,count,authors,encoding,is_first):
  (revnode,_,user,(time,timezone),_,_,_,_)=get_changeset(ui,repo,revision,authors,encoding)

  parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]

  wr('commit refs/notes/hg')
  wr('committer %s %d %s' % (user,time,timezone))
  wr('data 0')
  if is_first:
    wr('from refs/notes/hg^0')
  wr('N inline :%d' % (revision+1))
  hg_hash=repo.changectx(str(revision)).hex()
  wr('data %d' % (len(hg_hash)))
  wr_no_nl(hg_hash)
  wr()
  return checkpoint(count)

def export_tags(ui,repo,old_marks,mapping_cache,count,authors,tagsmap):
  l=repo.tagslist()
  for tag,node in l:
    # Remap the tag name
    tag=sanitize_name(tag,"tag",tagsmap)
    # ignore latest revision
    if tag=='tip': continue
    # ignore tags to nodes that are missing (ie, 'in the future')
    if node.encode('hex_codec') not in mapping_cache:
      sys.stderr.write('Tag %s refers to unseen node %s\n' % (tag, node.encode('hex_codec')))
      continue

    rev=int(mapping_cache[node.encode('hex_codec')])

    ref=revnum_to_revref(rev, old_marks)
    if ref==None:
      sys.stderr.write('Failed to find reference for creating tag'
          ' %s at r%d\n' % (tag,rev))
      continue
    sys.stderr.write('Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag,rev,ref))
    wr('reset refs/tags/%s' % tag)
    wr('from %s' % ref)
    wr()
    count=checkpoint(count)
  return count

def load_mapping(name, filename):
  cache={}
  if not os.path.exists(filename):
    sys.stderr.write('Could not open mapping file [%s]\n' % (filename))
    return cache
  f=open(filename,'r')
  l=0
  a=0
  lre=re.compile('^([^=]+)[ ]*=[ ]*(.+)$')
  for line in f.readlines():
    l+=1
    line=line.strip()
    if line=='' or line[0]=='#':
      continue
    m=lre.match(line)
    if m==None:
      sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
      continue
    # put key:value in cache, key without ^:
    cache[m.group(1).strip()]=m.group(2).strip()
    a+=1
  f.close()
  sys.stderr.write('Loaded %d %s\n' % (a, name))
  return cache

def branchtip(repo, heads):
  '''return the tipmost branch head in heads'''
  tip = heads[-1]
  for h in reversed(heads):
    if 'close' not in repo.changelog.read(h)[5]:
      tip = h
      break
  return tip

def verify_heads(ui,repo,cache,force,branchesmap):
  branches={}
  for bn, heads in repo.branchmap().iteritems():
    branches[bn] = branchtip(repo, heads)
  l=[(-repo.changelog.rev(n), n, t) for t, n in branches.items()]
  l.sort()

  # get list of hg's branches to verify, don't take all git has
  for _,_,b in l:
    b=get_branch(b)
    sanitized_name=sanitize_name(b,"branch",branchesmap)
    sha1=get_git_sha1(sanitized_name)
    c=cache.get(sanitized_name)
    if sha1!=c:
      sys.stderr.write('Error: Branch [%s] modified outside hg-fast-export:'
        '\n%s (repo) != %s (cache)\n' % (b,sha1,c))
      if not force: return False

  # verify that branch has exactly one head
  t={}
  for h in repo.heads():
    (_,_,_,_,_,_,branch,_)=get_changeset(ui,repo,h)
    if t.get(branch,False):
      sys.stderr.write('Error: repository has at least one unnamed head: hg r%s\n' %
          repo.changelog.rev(h))
      if not force: return False
    t[branch]=True

  return True

def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
           authors={},branchesmap={},tagsmap={},
           sob=False,force=False,hgtags=False,notes=False,encoding='',fn_encoding=''):
  def check_cache(filename, contents):
    if len(contents) == 0:
      sys.stderr.write('Warning: %s does not contain any data, this will probably make an incremental import fail\n' % filename)

  _max=int(m)

  old_marks=load_cache(marksfile,lambda s: int(s)-1)
  mapping_cache=load_cache(mappingfile)
  heads_cache=load_cache(headsfile)
  state_cache=load_cache(tipfile)

  if len(state_cache) != 0:
    for (name, data) in [(marksfile, old_marks),
                         (mappingfile, mapping_cache),
                         (headsfile, heads_cache)]:
      check_cache(name, data)

  ui,repo=setup_repo(repourl)

  if not verify_heads(ui,repo,heads_cache,force,branchesmap):
    return 1

  try:
    tip=repo.changelog.count()
  except AttributeError:
    tip=len(repo)

  min=int(state_cache.get('tip',0))
  max=_max
  if _max<0 or max>tip:
    max=tip

  for rev in range(0,max):
    (revnode,_,_,_,_,_,_,_)=get_changeset(ui,repo,rev,authors)
    mapping_cache[revnode.encode('hex_codec')] = str(rev)

  c=0
  brmap={}
  for rev in range(min,max):
    c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
                    sob,brmap,hgtags,encoding,fn_encoding)
  if notes:
    for rev in range(min,max):
      c=export_note(ui,repo,rev,c,authors, encoding, rev == min and min != 0)

  state_cache['tip']=max
  state_cache['repo']=repourl
  save_cache(tipfile,state_cache)
  save_cache(mappingfile,mapping_cache)

  c=export_tags(ui,repo,old_marks,mapping_cache,c,authors,tagsmap)

  sys.stderr.write('Issued %d commands\n' % c)

  return 0

if __name__=='__main__':
  def bail(parser,opt):
    sys.stderr.write('Error: No %s option given\n' % opt)
    parser.print_help()
    sys.exit(2)

  parser=OptionParser()

  parser.add_option("-m","--max",type="int",dest="max",
      help="Maximum hg revision to import")
  parser.add_option("--mapping",dest="mappingfile",
      help="File to read last run's hg-to-git SHA1 mapping")
  parser.add_option("--marks",dest="marksfile",
      help="File to read git-fast-import's marks from")
  parser.add_option("--heads",dest="headsfile",
      help="File to read last run's git heads from")
  parser.add_option("--status",dest="statusfile",
      help="File to read status from")
  parser.add_option("-r","--repo",dest="repourl",
      help="URL of repo to import")
  parser.add_option("-s",action="store_true",dest="sob",
      default=False,help="Enable parsing Signed-off-by lines")
  parser.add_option("--hgtags",action="store_true",dest="hgtags",
      default=False,help="Enable exporting .hgtags files")
  parser.add_option("-A","--authors",dest="authorfile",
      help="Read authormap from AUTHORFILE")
  parser.add_option("-B","--branches",dest="branchesfile",
      help="Read branch map from BRANCHESFILE")
  parser.add_option("-T","--tags",dest="tagsfile",
      help="Read tags map from TAGSFILE")
  parser.add_option("-f","--force",action="store_true",dest="force",
      default=False,help="Ignore validation errors by force")
  parser.add_option("-M","--default-branch",dest="default_branch",
      help="Set the default branch")
  parser.add_option("-o","--origin",dest="origin_name",
      help="use <name> as namespace to track upstream")
  parser.add_option("--hg-hash",action="store_true",dest="notes",
      default=False,help="Annotate commits with the hg hash as git notes in the hg namespace")
  parser.add_option("-e",dest="encoding",
      help="Assume commit and author strings retrieved from Mercurial are encoded in <encoding>")
  parser.add_option("--fe",dest="fn_encoding",
      help="Assume file names from Mercurial are encoded in <filename_encoding>")

  (options,args)=parser.parse_args()

  m=-1
  if options.max!=None: m=options.max

  if options.marksfile==None: bail(parser,'--marks')
  if options.mappingfile==None: bail(parser,'--mapping')
  if options.headsfile==None: bail(parser,'--heads')
  if options.statusfile==None: bail(parser,'--status')
  if options.repourl==None: bail(parser,'--repo')

  a={}
  if options.authorfile!=None:
    a=load_mapping('authors', options.authorfile)

  b={}
  if options.branchesfile!=None:
    b=load_mapping('branches', options.branchesfile)

  t={}
  if options.tagsfile!=None:
    t=load_mapping('tags', options.tagsfile)

  if options.default_branch!=None:
    set_default_branch(options.default_branch)

  if options.origin_name!=None:
    set_origin_name(options.origin_name)

  encoding=''
  if options.encoding!=None:
    encoding=options.encoding

  fn_encoding=encoding
  if options.fn_encoding!=None:
    fn_encoding=options.fn_encoding

  sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
                  options.headsfile, options.statusfile,
                  authors=a,branchesmap=b,tagsmap=t,
                  sob=options.sob,force=options.force,hgtags=options.hgtags,
                  notes=options.notes,encoding=encoding,fn_encoding=fn_encoding))

#!/bin/sh

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

ROOT="$(dirname "$(which "$0")")"
REPO=""
PFX="hg2git"
SFX_MAPPING="mapping"
SFX_MARKS="marks"
SFX_HEADS="heads"
SFX_STATE="state"
GFI_OPTS=""
PYTHON=${PYTHON:-python}

USAGE="[--quiet] [-r <repo>] [--force] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>] [--fe <filename_encoding>]"
LONG_USAGE="Import hg repository <repo> up to either tip or <max>
If <repo> is omitted, use last hg repository as obtained from state file,
GIT_DIR/$PFX-$SFX_STATE by default.

Note: The argument order matters.

Options:
	--quiet   Passed to git-fast-import(1)
	-r <repo> Mercurial repository to import
	--force   Ignore validation errors when converting, and pass --force
	          to git-fast-import(1)
	-m <max>  Maximum revision to import
	-s        Enable parsing Signed-off-by lines
	--hgtags  Enable exporting .hgtags files
	-A <file> Read author map from file
	          (Same as in git-svnimport(1) and git-cvsimport(1))
	-B <file> Read branch map from file
	-T <file> Read tags map from file
	-M <name> Set the default branch name (defaults to 'master')
	-o <name> Use <name> as branch namespace to track upstream (eg 'origin')
	--hg-hash Annotate commits with the hg hash as git notes in the
	          hg namespace.
	-e <encoding> Assume commit and author strings retrieved from
	              Mercurial are encoded in <encoding>
	--fe <filename_encoding> Assume filenames from Mercurial are encoded
	                         in <filename_encoding>
"
case "$1" in
    -h|--help)
      echo "usage: $(basename "$0") $USAGE"
      echo ""
      echo "$LONG_USAGE"
      exit 0
esac

# Use { ...; } rather than ( ... ) so that "exit 1" ends the script itself
# instead of only leaving a subshell.
IS_BARE=$(git rev-parse --is-bare-repository) \
    || { echo "Could not find git repo" ; exit 1 ; }
if test "z$IS_BARE" != ztrue; then
   # This is not a bare repo, cd to the toplevel
   TOPLEVEL=$(git rev-parse --show-toplevel) \
       || { echo "Could not find git repo toplevel" ; exit 1 ; }
   cd "$TOPLEVEL" || exit 1
fi
GIT_DIR=$(git rev-parse --git-dir) || { echo "Could not find git repo" ; exit 1 ; }

while case "$#" in 0) break ;; esac
do
  case "$1" in
    -r|--r|--re|--rep|--repo)
      shift
      REPO="$1"
      ;;
    --q|--qu|--qui|--quie|--quiet)
      GFI_OPTS="$GFI_OPTS --quiet"
      ;;
    --force)
      # pass --force to git-fast-import and hg-fast-export.py
      GFI_OPTS="$GFI_OPTS --force"
      break
      ;;
    -*)
      # pass any other options down to hg-fast-export.py
      break
      ;;
    *)
      break
      ;;
  esac
  shift
done

# for convenience: get default repo from state file
if [ x"$REPO" = x -a -f "$GIT_DIR/$PFX-$SFX_STATE" ] ; then
  REPO="`grep '^:repo ' "$GIT_DIR/$PFX-$SFX_STATE" | cut -d ' ' -f 2`"
  echo "Using last hg repository \"$REPO\""
fi

if [  -z "$REPO" ]; then
    echo "no repo given, use -r flag"
    exit 1
fi

# make sure we have a marks cache
if [ ! -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
  touch "$GIT_DIR/$PFX-$SFX_MARKS"
fi

# cleanup on exit
trap 'rm -f "$GIT_DIR/$PFX-$SFX_MARKS.old" "$GIT_DIR/$PFX-$SFX_MARKS.tmp"' 0

_err1=
_err2=
exec 3>&1
{ read -r _err1 || :; read -r _err2 || :; } <<-EOT
$(
  exec 4>&3 3>&1 1>&4 4>&-
  {
    _e1=0
    GIT_DIR="$GIT_DIR" $PYTHON "$ROOT/hg-fast-export.py" \
      --repo "$REPO" \
      --marks "$GIT_DIR/$PFX-$SFX_MARKS" \
      --mapping "$GIT_DIR/$PFX-$SFX_MAPPING" \
      --heads "$GIT_DIR/$PFX-$SFX_HEADS" \
      --status "$GIT_DIR/$PFX-$SFX_STATE" \
      "$@" 3>&- || _e1=$?
    echo $_e1 >&3
  } | \
  {
    _e2=0
    git fast-import $GFI_OPTS --export-marks="$GIT_DIR/$PFX-$SFX_MARKS.tmp" 3>&- || _e2=$?
    echo $_e2 >&3
  }
)
EOT
exec 3>&-
[ "$_err1" = 0 -a "$_err2" = 0 ] || exit 1

# move recent marks cache out of the way...
if [ -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
  mv "$GIT_DIR/$PFX-$SFX_MARKS" "$GIT_DIR/$PFX-$SFX_MARKS.old"
else
  touch "$GIT_DIR/$PFX-$SFX_MARKS.old"
fi

# ...to create a new merged one
cat "$GIT_DIR/$PFX-$SFX_MARKS.old" "$GIT_DIR/$PFX-$SFX_MARKS.tmp" \
| uniq > "$GIT_DIR/$PFX-$SFX_MARKS"

# save SHA1s of current heads for incremental imports
# and connectivity (plus sanity checking)
for head in `git branch | sed 's#^..##'` ; do
  id="`git rev-parse refs/heads/$head`"
  echo ":$head $id"
done > "$GIT_DIR/$PFX-$SFX_HEADS"

# check diff with color:
# ( for i in `find . -type f | grep -v '\.git'` ; do diff -u $i $REPO/$i ; done | cdiff ) | less -r

#!/usr/bin/env python

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: GPLv2

from mercurial import node
from hg2git import setup_repo,load_cache,get_changeset,get_git_sha1
from optparse import OptionParser
import sys

def heads(ui,repo,start=None,stop=None,max=None):
  # this is copied from mercurial/revlog.py and differs only in
  # accepting a max argument for xrange(startrev+1,...) defaulting
  # to the original repo.changelog.count()
  if start is None:
    start = node.nullid
  if stop is None:
    stop = []
  if max is None:
    max = repo.changelog.count()
  stoprevs = dict.fromkeys([repo.changelog.rev(n) for n in stop])
  startrev = repo.changelog.rev(start)
  reachable = {startrev: 1}
  heads = {startrev: 1}

  parentrevs = repo.changelog.parentrevs
  for r in xrange(startrev + 1, max):
    for p in parentrevs(r):
      if p in reachable:
        if r not in stoprevs:
          reachable[r] = 1
        heads[r] = 1
      if p in heads and p not in stoprevs:
        del heads[p]

  return [(repo.changelog.node(r),str(r)) for r in heads]

def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
  h=heads(ui,repo,max=max)
  stale=dict.fromkeys(heads_cache)
  changed=[]
  unchanged=[]
  for node,rev in h:
    _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
    del stale[branch]
    git_sha1=get_git_sha1(branch)
    cache_sha1=marks_cache.get(str(int(rev)+1))
    if git_sha1!=None and git_sha1==cache_sha1:
      unchanged.append([branch,cache_sha1,rev,desc.split('\n')[0],user])
    else:
      changed.append([branch,cache_sha1,rev,desc.split('\n')[0],user])
  changed.sort()
  unchanged.sort()
  return stale,changed,unchanged

def get_tags(ui,repo,marks_cache,mapping_cache,max):
  l=repo.tagslist()
  good,bad=[],[]
  for tag,node in l:
    if tag=='tip': continue
    rev=int(mapping_cache[node.encode('hex_codec')])
    cache_sha1=marks_cache.get(str(int(rev)+1))
    _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
    if int(rev)>int(max):
      bad.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user])
    else:
      good.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user])
  good.sort()
  bad.sort()
  return good,bad

def mangle_mark(mark):
  return str(int(mark)-1)

if __name__=='__main__':
841
  def bail(parser,opt):
842
    sys.stderr.write('Error: No option %s given\n' % opt)
843
    parser.print_help()
844
    sys.exit(2)
845
846
  parser=OptionParser()
847
848
  parser.add_option("--marks",dest="marksfile",
849
      help="File to read git-fast-import's marks from")
850
  parser.add_option("--mapping",dest="mappingfile",
851
      help="File to read last run's hg-to-git SHA1 mapping")
852
  parser.add_option("--heads",dest="headsfile",
853
      help="File to read last run's git heads from")
854
  parser.add_option("--status",dest="statusfile",
855
      help="File to read status from")
856
  parser.add_option("-r","--repo",dest="repourl",
857
      help="URL of repo to import")
858
  parser.add_option("-R","--revision",type=int,dest="revision",
859
      help="Revision to reset to")
860
861
  (options,args)=parser.parse_args()
862
863
  if options.marksfile==None: bail(parser,'--marks option')
864
  if options.mappingfile==None: bail(parser,'--mapping option')
865
  if options.headsfile==None: bail(parser,'--heads option')
866
  if options.statusfile==None: bail(parser,'--status option')
867
  if options.repourl==None: bail(parser,'--repo option')
868
  if options.revision==None: bail(parser,'-R/--revision')

  heads_cache=load_cache(options.headsfile)
  marks_cache=load_cache(options.marksfile,mangle_mark)
  state_cache=load_cache(options.statusfile)
  mapping_cache = load_cache(options.mappingfile)

  l=int(state_cache.get('tip',options.revision))
  if options.revision+1>l:
    sys.stderr.write('Revision is beyond last revision imported: %d>%d\n' % (options.revision,l))
    sys.exit(1)

  ui,repo=setup_repo(options.repourl)

  stale,changed,unchanged=get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,options.revision+1)
  good,bad=get_tags(ui,repo,marks_cache,mapping_cache,options.revision+1)

  print "Possibly stale branches:"
  map(lambda b: sys.stdout.write('\t%s\n' % b),stale.keys())

  print "Possibly stale tags:"
  map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),bad)

  print "Unchanged branches:"
  map(lambda b: sys.stdout.write('\t%s (r%s)\n' % (b[0],b[2])),unchanged)

  print "Unchanged tags:"
  map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),good)

  print "Reset branches in '%s' to:" % options.headsfile
  map(lambda b: sys.stdout.write('\t:%s %s\n\t\t(r%s: %s: %s)\n' % (b[0],b[1],b[2],b[4],b[3])),changed)

  print "Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision)
#!/bin/sh

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

# quote $0 so that the script also works from a path containing spaces
ROOT="$(dirname "$0")"
REPO=""
PFX="hg2git"
SFX_MARKS="marks"
SFX_MAPPING="mapping"
SFX_HEADS="heads"
SFX_STATE="state"
QUIET=""
PYTHON=${PYTHON:-python}

USAGE="[-r <repo>] -R <rev>"
LONG_USAGE="Print SHA1s of latest changes per branch up to <rev> useful
to reset import and restart at <rev>.
If <repo> is omitted, use last hg repository as obtained from state file,
GIT_DIR/$PFX-$SFX_STATE by default.

Options:
	-R	Hg revision to reset to
	-r	Mercurial repository to use
"

# use { ...; } instead of ( ... ) for the error handlers: an exit inside
# a subshell would only leave the subshell, not this script
IS_BARE=$(git rev-parse --is-bare-repository) \
    || { echo "Could not find git repo" ; exit 1 ; }
if test "z$IS_BARE" != ztrue; then
   # This is not a bare repo, cd to the toplevel
   TOPLEVEL=$(git rev-parse --show-toplevel) \
       || { echo "Could not find git repo toplevel" ; exit 1 ; }
   cd "$TOPLEVEL" || exit 1
fi
GIT_DIR=$(git rev-parse --git-dir) || { echo "Could not find git repo" ; exit 1 ; }

while case "$#" in 0) break ;; esac
do
  case "$1" in
    -r|--r|--re|--rep|--repo)
      shift
      REPO="$1"
      ;;
    -*)
      # pass any other options down to hg-reset.py
      break
      ;;
    *)
      break
      ;;
  esac
  shift
done

# for convenience: get default repo from state file
if [ x"$REPO" = x -a -f "$GIT_DIR/$PFX-$SFX_STATE" ] ; then
  REPO="`grep '^:repo ' "$GIT_DIR/$PFX-$SFX_STATE" | cut -d ' ' -f 2`"
  echo "Using last hg repository \"$REPO\""
fi

# make sure we have a marks cache
if [ ! -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
  touch "$GIT_DIR/$PFX-$SFX_MARKS"
fi

GIT_DIR="$GIT_DIR" $PYTHON "$ROOT/hg-reset.py" \
  --repo "$REPO" \
  --marks "$GIT_DIR/$PFX-$SFX_MARKS" \
  --mapping "$GIT_DIR/$PFX-$SFX_MAPPING" \
  --heads "$GIT_DIR/$PFX-$SFX_HEADS" \
  --status "$GIT_DIR/$PFX-$SFX_STATE" \
  "$@"

exit $?
#!/usr/bin/env python

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

from mercurial import hg,util,ui,templatefilters
import re
import os
import sys
import subprocess

# default git branch name
cfg_master='master'
# default origin name
origin_name=''
# silly regex to see if user field has email address
user_re=re.compile('([^<]+) (<[^>]*>)$')
# silly regex to clean out user names
user_clean_re=re.compile('^["]([^"]+)["]$')

def set_default_branch(name):
  global cfg_master
  cfg_master = name

def set_origin_name(name):
  global origin_name
  origin_name = name

def setup_repo(url):
  try:
    myui=ui.ui(interactive=False)
  except TypeError:
    myui=ui.ui()
    myui.setconfig('ui', 'interactive', 'off')
  return myui,hg.repository(myui,url)

def fixup_user(user,authors):
  user=user.strip("\"")
  if authors!=None:
    # if we have an authors table, try to get mapping
    # by defaulting to the current value of 'user'
    user=authors.get(user,user)
  name,mail,m='','',user_re.match(user)
  if m==None:
    # if we don't have 'Name <mail>' syntax, extract name
    # and mail from hg helpers. this seems to work pretty well.
    # if email doesn't contain @, replace it with devnull@localhost
    name=templatefilters.person(user)
    mail='<%s>' % util.email(user)
    if '@' not in mail:
      mail = '<devnull@localhost>'
  else:
    # if we have 'Name <mail>' syntax, everything is fine :)
    name,mail=m.group(1),m.group(2)

  # remove any silly quoting from username
  m2=user_clean_re.match(name)
  if m2!=None:
    name=m2.group(1)
  return '%s %s' % (name,mail)
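
To see what the two regular expressions in `fixup_user` accept, the following standalone sketch reproduces only the `Name <mail>` branch. It targets Python 3 and does not import Mercurial. `split_user` is a hypothetical helper name, and the fallback for strings without `Name <mail>` syntax is an assumption: it always uses `devnull@localhost` instead of calling the hg helpers.

```python
import re

# same patterns as user_re and user_clean_re in the listing
user_re = re.compile('([^<]+) (<[^>]*>)$')
user_clean_re = re.compile('^["]([^"]+)["]$')

def split_user(user):
    """Return 'Name <mail>' for a raw user string (hypothetical helper)."""
    user = user.strip('"')
    m = user_re.match(user)
    if m is None:
        # assumption: no hg helpers available, use a placeholder address
        name, mail = user, '<devnull@localhost>'
    else:
        name, mail = m.group(1), m.group(2)
    # drop quotes that wrap the whole name
    m2 = user_clean_re.match(name)
    if m2 is not None:
        name = m2.group(1)
    return '%s %s' % (name, mail)

print(split_user('Jane Doe <jane@example.org>'))
print(split_user('"jdoe"'))
```

The example addresses are made up for illustration only.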

def get_branch(name):
  # 'HEAD' is the result of a bug in mutt's cvs->hg conversion,
  # other CVS imports may need it, too
  if name=='HEAD' or name=='default' or name=='':
    name=cfg_master
  if origin_name:
    return origin_name + '/' + name
  return name

def get_changeset(ui,repo,revision,authors={},encoding=''):
  node=repo.lookup(revision)
  (manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node)
  if encoding:
    user=user.decode(encoding).encode('utf8')
    desc=desc.decode(encoding).encode('utf8')
  # hg stores the offset in seconds west of UTC, git expects +HHMM/-HHMM;
  # work on the absolute value so that floor division does not round
  # negative half-hour offsets (e.g. -0330) to the wrong hour
  sign='-' if timezone>0 else '+'
  tz='%s%02d%02d' % (sign,abs(timezone)//3600,(abs(timezone)%3600)//60)
  branch=get_branch(extra.get('branch','master'))
  return (node,manifest,fixup_user(user,authors),(time,tz),files,desc,branch,extra)
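
The timezone conversion in `get_changeset` can be checked in isolation. The sketch below is a standalone Python 3 illustration, not part of the converter. `format_tz` is a hypothetical name, and the sample offsets are chosen only to cover zero, whole-hour and half-hour cases on both sides of UTC. It formats the absolute value and adds the sign separately, because floor division of a negative number rounds away from zero.

```python
def format_tz(offset_west):
    """Format an offset in seconds west of UTC as +HHMM/-HHMM."""
    # west of UTC means the local time is behind UTC, hence the minus sign
    sign = '-' if offset_west > 0 else '+'
    hours, rest = divmod(abs(offset_west), 3600)
    return '%s%02d%02d' % (sign, hours, rest // 60)

for offset in (0, -7200, 18000, -19800, 12600):
    print(offset, format_tz(offset))
```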

def mangle_key(key):
  return key

def load_cache(filename,get_key=mangle_key):
  cache={}
  if not os.path.exists(filename):
    return cache
  f=open(filename,'r')
  l=0
  for line in f.readlines():
    l+=1
    fields=line.split(' ')
    if fields==None or not len(fields)==2 or fields[0][0]!=':':
      sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
      continue
    # put key:value in cache, key without ^:
    cache[get_key(fields[0][1:])]=fields[1].split('\n')[0]
  f.close()
  return cache
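
The cache files read by `load_cache` hold one `:key value` pair per line. As a rough illustration of that format, the Python 3 sketch below parses an in-memory sample instead of a file. `parse_cache` is a hypothetical name, the sample values are placeholders rather than real SHA1s, and the `get_key` callback mirrors how a marks file is loaded with a function that subtracts one from each mark.

```python
import io

def parse_cache(lines, get_key=lambda key: key):
    """Parse ':key value' lines into a dict, skipping malformed lines."""
    cache = {}
    for line in lines:
        fields = line.split(' ')
        if len(fields) != 2 or not fields[0].startswith(':'):
            continue  # not exactly two fields or no leading ':'
        # store the key without the leading ':' and the value without newline
        cache[get_key(fields[0][1:])] = fields[1].split('\n')[0]
    return cache

# placeholder content; a real marks file maps marks to git SHA1s
sample = io.StringIO(':1 aaa\n:2 bbb\nbroken line here\n')
marks = parse_cache(sample, get_key=lambda mark: str(int(mark) - 1))
print(marks)
```

Unlike `load_cache`, this sketch drops malformed lines silently instead of writing a warning to stderr.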

def save_cache(filename,cache):
  f=open(filename,'w+')
  map(lambda x: f.write(':%s %s\n' % (str(x),str(cache.get(x)))),cache.keys())
  f.close()

def get_git_sha1(name,type='heads'):
  try:
    # use git-rev-parse to support packed refs
    ref="refs/%s/%s" % (type,name)
    l=subprocess.check_output(["git", "rev-parse", "--verify", "--quiet", ref])
    if l == None or len(l) == 0:
      return None
    return l[0:40]
  except subprocess.CalledProcessError:
    return None