root / extra / fast-export @ 1544:e9e55585ebf2

svn-archive
svn-fast-export
*.pyc
.dotest
SVN ?= /usr/local/svn
APR_INCLUDES ?= /usr/include/apr-1.0
CFLAGS += -I${APR_INCLUDES} -I${SVN}/include/subversion-1 -pipe -O2 -std=c99
LDFLAGS += -L${SVN}/lib -lsvn_fs-1 -lsvn_repos-1

all: svn-fast-export svn-archive

svn-fast-export: svn-fast-export.c
svn-archive: svn-archive.c

.PHONY: clean

clean:
	rm -rf svn-fast-export svn-archive
hg-fast-export.(sh|py) - mercurial to git converter using git-fast-import
=========================================================================

Legal
-----

Most hg-* scripts are licensed under the [MIT license]
(http://www.opensource.org/licenses/mit-license.php) and were written
by Rocco Rutte <pdmef@gmx.net> with hints and help from the git list and
\#mercurial on freenode. hg-reset.py is licensed under GPLv2 since it
copies some code from the mercurial sources.

The current maintainer is Frej Drejhammar <frej.drejhammar@gmail.com>.

Usage
-----

Using hg-fast-export is quite simple for a mercurial repository <repo>:

```
mkdir repo-git # or whatever
cd repo-git
git init
hg-fast-export.sh -r <repo>
```

Please note that hg-fast-export does not automatically check out the
newly imported repository. You probably want to follow up the import
with a `git checkout`-command.

Incremental imports to track hg repos are supported, too.

Using hg-reset is quite simple within a git repository that was
hg-fast-export'ed from mercurial:

```
hg-reset.sh -R <revision>
```

will give hints on which branches need adjustment for starting over
again.

When a mercurial repository does not use utf-8 for encoding author
strings and commit messages the `-e <encoding>` command line option
can be used to force fast-export to convert incoming meta data from
<encoding> to utf-8. This encoding option is also applied to file names.

In some locales Mercurial uses different encodings for commit messages
and file names. In that case, you can use the `--fe <encoding>` command
line option, which overrides the -e option for file names.
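
As a rough illustration of what the `-e` and `--fe` options do, the sketch
below re-encodes a byte string from a legacy encoding to utf-8. It is written
for Python 3 for readability (the scripts themselves target Python 2), and the
helper name `to_utf8` is hypothetical, not part of hg-fast-export.

```python
def to_utf8(raw, encoding=''):
    """Re-encode raw bytes from `encoding` to UTF-8.

    With no encoding given, the bytes are passed through unchanged.
    """
    if not encoding:
        return raw
    return raw.decode(encoding).encode('utf-8')


# An author name as it might be stored by a Latin-1 Mercurial setup.
latin1_author = 'J\u00fcrgen'.encode('latin-1')
print(to_utf8(latin1_author, 'latin-1'))
```

The same conversion applies to file names; a separate file name encoding only
changes which `encoding` value is passed for them.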

As mercurial appears to be much less picky about the syntax of the
author information than git, an author mapping file can be given to
hg-fast-export to fix up malformed author strings. The file is
specified using the -A option. The file should contain lines of the
form `FromAuthor=ToAuthor`. The example authors.map below will
translate `User <garbage<user@example.com>` to `User <user@example.com>`.

```
-- Start of authors.map --
User <garbage<user@example.com>=User <user@example.com>
-- End of authors.map --
```
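
To make the `FromAuthor=ToAuthor` format concrete, here is a minimal sketch of
how such a file could be parsed. It reuses the regular expression from the
mapping loader in hg-fast-export.py but is written for Python 3; the function
name `parse_mapping` is an assumption made for this example.

```python
import re

# Key, '=', value: the same pattern shape the mapping loader uses.
LINE_RE = re.compile(r'^([^=]+)[ ]*=[ ]*(.+)$')


def parse_mapping(lines):
    """Build a dict from 'From=To' lines, skipping blanks and '#' comments."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        m = LINE_RE.match(line)
        if m is None:
            continue  # malformed line: silently ignored in this sketch
        mapping[m.group(1).strip()] = m.group(2).strip()
    return mapping


authors = parse_mapping([
    '# comment lines are skipped',
    'User <garbage<user@example.com>=User <user@example.com>',
])
print(authors)
```

Because the key is everything up to the first `=`, a source name that itself
contains `=` cannot be expressed in this format.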

Tag and Branch Naming
---------------------

As Git and Mercurial differ in what is a valid branch and tag
name the -B and -T options allow a mapping file to be specified to
rename branches and tags (respectively). The syntax of the mapping
file is the same as for the author mapping.
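
Names that are not covered by a mapping file are still cleaned up before they
reach git. The sketch below is a Python 3 port of the `sanitize_name` logic in
hg-fast-export.py (roughly following git-check-ref-format(1)), combined with a
mapping lookup; the wrapper `git_name` and the sample names are hypothetical.

```python
import re


def sanitize_name(name):
    """Replace characters git does not accept in ref names with '_'."""
    def dot(part):
        # A path component must not start with '.'
        return '_' + part[1:] if part.startswith('.') else part

    n = re.sub(r'([\[ ~^:?\\*]|\.\.)', '_', name)
    if n[-1] in ('/', '.'):
        n = n[:-1] + '_'
    n = '/'.join(dot(part) for part in n.split('/'))
    return re.sub('_+', '_', n)


def git_name(name, mapping):
    """Apply an explicit mapping first, then sanitize the result."""
    return sanitize_name(mapping.get(name, name))


print(git_name('my branch', {}))
print(git_name('old name', {'old name': 'new-name'}))
```

An explicit entry in the mapping file is the way to pick a readable name when
the automatic replacement with underscores is not what you want.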

Notes/Limitations
-----------------

hg-fast-export supports multiple branches but only named branches with
exactly one head each. Otherwise commits to the tip of these heads
within the branch will get flattened into merge commits.
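
Before converting, it can help to check whether any named branch has more
than one head. The following sketch only shows the counting step: it assumes
you already have the branch name of every head (for example collected from the
output of `hg heads`), and both the function name and the sample data are
hypothetical.

```python
from collections import Counter


def branches_with_multiple_heads(head_branches):
    """Return the branch names that occur for more than one head."""
    counts = Counter(head_branches)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical repository with four heads, two of them on 'default'.
print(branches_with_multiple_heads(['default', 'stable', 'default', 'feature-x']))
```

Any branch reported here would need its heads merged (or closed) in Mercurial
to avoid the flattening described above.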

As each git-fast-import run creates a new pack file, it may be
required to repack the repository quite often for incremental imports
(especially when importing a small number of changesets per
incremental import).

Because of the way the hg API and remote access protocol are designed,
it is not possible to use hg-fast-export on remote repositories
(http/ssh). First clone the repository, then convert it.

Design
------

hg-fast-export.py was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: it just feeds what it
finds into git-fast-import. This also implies that it heavily relies
on strictly linear ordering of changesets from hg, i.e. its
append-only storage model so that changesets hg-fast-export already
saw never get modified.
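
To give an idea of what "feeding into git-fast-import" looks like, the sketch
below builds the header of one `commit` command for a single hg revision. It
is a simplified Python 3 illustration modelled on `export_commit` in
hg-fast-export.py, not the script's actual code: the function name, argument
list and sample values are assumptions, and file changes are left out.

```python
def commit_stanza(rev, branch, committer, when, message, parent_rev=None):
    """Build the header of a fast-import 'commit' command for hg revision `rev`.

    The mark is rev+1 because hg revision numbers start at 0 while
    fast-import marks must be at least 1.
    """
    lines = [
        'commit refs/heads/%s' % branch,
        'mark :%d' % (rev + 1),
        'committer %s %s' % (committer, when),
        # 'data' takes a byte count; the +1 covers the newline after the message
        'data %d' % (len(message.encode('utf-8')) + 1),
        message,
    ]
    if parent_rev is not None:
        # The parent is referenced by its mark, so no second pass is needed.
        lines.append('from :%d' % (parent_rev + 1))
    return '\n'.join(lines) + '\n'


print(commit_stanza(1, 'master', 'A U Thor <author@example.com>',
                    '1200000000 +0000', 'second commit', parent_rev=0))
```

Since every parent has a lower revision number than its child, each parent
mark has already been defined by the time a commit refers to it, which is what
makes the single pass possible.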

Submitting Patches
------------------

Please use the issue-tracker at github
https://github.com/frej/fast-export to report bugs and submit
patches.
#!/usr/bin/env python

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

from mercurial import node
from hg2git import setup_repo,fixup_user,get_branch,get_changeset
from hg2git import load_cache,save_cache,get_git_sha1,set_default_branch,set_origin_name
from optparse import OptionParser
import re
import sys
import os

if sys.platform == "win32":
  # On Windows, sys.stdout is initially opened in text mode, which means that
  # when a LF (\n) character is written to sys.stdout, it will be converted
  # into CRLF (\r\n).  That makes git blow up, so use this platform-specific
  # code to change the mode of sys.stdout to binary.
  import msvcrt
  msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

# silly regex to catch Signed-off-by lines in log message
sob_re=re.compile('^Signed-[Oo]ff-[Bb]y: (.+)$')
# insert 'checkpoint' command after this many commits or none at all if 0
cfg_checkpoint_count=0
# write some progress message every this many file contents written
cfg_export_boundary=1000

def gitmode(flags):
  return 'l' in flags and '120000' or 'x' in flags and '100755' or '100644'

def wr_no_nl(msg=''):
  if msg:
    sys.stdout.write(msg)

def wr(msg=''):
  wr_no_nl(msg)
  sys.stdout.write('\n')
  #map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n'))

def checkpoint(count):
  count=count+1
  if cfg_checkpoint_count>0 and count%cfg_checkpoint_count==0:
    sys.stderr.write("Checkpoint after %d commits\n" % count)
    wr('checkpoint')
    wr()
  return count

def revnum_to_revref(rev, old_marks):
  """Convert an hg revnum to a git-fast-import rev reference (an SHA1
  or a mark)"""
  return old_marks.get(rev) or ':%d' % (rev+1)

def file_mismatch(f1,f2):
  """See if two revisions of a file are not equal."""
  return node.hex(f1)!=node.hex(f2)

def split_dict(dleft,dright,l=[],c=[],r=[],match=file_mismatch):
  """Loop over our repository and find all changed and missing files."""
  for left in dleft.keys():
    right=dright.get(left,None)
    if right==None:
      # we have the file but our parent hasn't: add to left set
      l.append(left)
    elif match(dleft[left],right) or gitmode(dleft.flags(left))!=gitmode(dright.flags(left)):
      # we have it but checksums mismatch: add to center set
      c.append(left)
  for right in dright.keys():
    left=dleft.get(right,None)
    if left==None:
      # if parent has file but we don't: add to right set
      r.append(right)
    # change is already handled when comparing child against parent
  return l,c,r

def get_filechanges(repo,revision,parents,mleft):
  """Given some repository and revision, find all changed/deleted files."""
  l,c,r=[],[],[]
  for p in parents:
    if p<0: continue
    mright=repo.changectx(p).manifest()
    l,c,r=split_dict(mleft,mright,l,c,r)
  l.sort()
  c.sort()
  r.sort()
  return l,c,r

def get_author(logmessage,committer,authors):
  """As git distinguishes between author and committer of a patch, try to
  extract author by detecting Signed-off-by lines.

  This walks from the end of the log message towards the top skipping
  empty lines. Upon the first non-empty line, it walks all Signed-off-by
  lines upwards to find the first one. For that (if found), it extracts
  authorship information the usual way (authors table, cleaning, etc.)

  If no Signed-off-by line is found, this defaults to the committer.

  This may sound stupid (and it somehow is), but in log messages we
  accidentally may have lines in the middle starting with
  "Signed-off-by: foo" and thus matching our detection regex. Prevent
  that."""

  loglines=logmessage.split('\n')
  i=len(loglines)
  # from tail walk to top skipping empty lines
  while i>=0:
    i-=1
    if len(loglines[i].strip())==0: continue
    break
  if i>=0:
    # walk further upwards to find first sob line, store in 'first'
    first=None
    while i>=0:
      m=sob_re.match(loglines[i])
      if m==None: break
      first=m
      i-=1
    # if the last non-empty line matches our Signed-Off-by regex: extract username
    if first!=None:
      r=fixup_user(first.group(1),authors)
      return r
  return committer

def export_file_contents(ctx,manifest,files,hgtags,encoding=''):
  count=0
  max=len(files)
  for file in files:
    # Skip .hgtags files. They only get us in trouble.
    if not hgtags and file == ".hgtags":
      sys.stderr.write('Skip %s\n' % (file))
      continue
    d=ctx.filectx(file).data()
    if encoding:
      filename=file.decode(encoding).encode('utf8')
    else:
      filename=file
    wr('M %s inline %s' % (gitmode(manifest.flags(file)),
                           strip_leading_slash(filename)))
    wr('data %d' % len(d)) # had some trouble with size()
    wr(d)
    count+=1
    if count%cfg_export_boundary==0:
      sys.stderr.write('Exported %d/%d files\n' % (count,max))
  if max>cfg_export_boundary:
    sys.stderr.write('Exported %d/%d files\n' % (count,max))

def sanitize_name(name,what="branch"):
  """Sanitize input roughly according to git-check-ref-format(1)"""

  def dot(name):
    if name[0] == '.': return '_'+name[1:]
    return name

  n=name
  p=re.compile('([[ ~^:?\\\\*]|\.\.)')
  n=p.sub('_', n)
  if n[-1] in ('/', '.'): n=n[:-1]+'_'
  n='/'.join(map(dot,n.split('/')))
  p=re.compile('_+')
  n=p.sub('_', n)

  if n!=name:
    sys.stderr.write('Warning: sanitized %s [%s] to [%s]\n' % (what,name,n))
  return n

def strip_leading_slash(filename):
  if filename[0] == '/':
    return filename[1:]
  return filename

def export_commit(ui,repo,revision,old_marks,max,count,authors,
                  branchesmap,sob,brmap,hgtags,notes,encoding='',fn_encoding=''):
  def get_branchname(name):
    if brmap.has_key(name):
      return brmap[name]
    n=sanitize_name(branchesmap.get(name,name))
    brmap[name]=n
    return n

  (revnode,_,user,(time,timezone),files,desc,branch,_)=get_changeset(ui,repo,revision,authors,encoding)

  branch=get_branchname(branch)

  parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]

  if len(parents)==0 and revision != 0:
    wr('reset refs/heads/%s' % branch)

  wr('commit refs/heads/%s' % branch)
  wr('mark :%d' % (revision+1))
  if sob:
    wr('author %s %d %s' % (get_author(desc,user,authors),time,timezone))
  wr('committer %s %d %s' % (user,time,timezone))
  wr('data %d' % (len(desc)+1)) # wtf?
  wr(desc)
  wr()

  ctx=repo.changectx(str(revision))
  man=ctx.manifest()
  added,changed,removed,type=[],[],[],''

  if len(parents) == 0:
    # first revision: feed in full manifest
    added=man.keys()
    added.sort()
    type='full'
  else:
    wr('from %s' % revnum_to_revref(parents[0], old_marks))
    if len(parents) == 1:
      # later non-merge revision: feed in changed manifest
      # if we have exactly one parent, just take the changes from the
      # manifest without expensively comparing checksums
      f=repo.status(repo.lookup(parents[0]),revnode)[:3]
      added,changed,removed=f[1],f[0],f[2]
      type='simple delta'
    else: # a merge with two parents
      wr('merge %s' % revnum_to_revref(parents[1], old_marks))
      # later merge revision: feed in changed manifest
      # for many files comparing checksums is expensive so only do it for
      # merges where we really need it due to hg's revlog logic
      added,changed,removed=get_filechanges(repo,revision,parents,man)
      type='thorough delta'

  sys.stderr.write('%s: Exporting %s revision %d/%d with %d/%d/%d added/changed/removed files\n' %
      (branch,type,revision+1,max,len(added),len(changed),len(removed)))

  if fn_encoding:
    removed=[r.decode(fn_encoding).encode('utf8') for r in removed]

  removed=[strip_leading_slash(x) for x in removed]

  map(lambda r: wr('D %s' % r),removed)
  export_file_contents(ctx,man,added,hgtags,fn_encoding)
  export_file_contents(ctx,man,changed,hgtags,fn_encoding)
  wr()

  count=checkpoint(count)
  count=generate_note(user,time,timezone,revision,ctx,count,notes)
  return count

def generate_note(user,time,timezone,revision,ctx,count,notes):
  if not notes:
    return count
  wr('commit refs/notes/hg')
  wr('committer %s %d %s' % (user,time,timezone))
  wr('data 0')
  wr('N inline :%d' % (revision+1))
  hg_hash=ctx.hex()
  wr('data %d' % (len(hg_hash)))
  wr_no_nl(hg_hash)
  wr()
  return checkpoint(count)

def export_tags(ui,repo,old_marks,mapping_cache,count,authors,tagsmap):
  l=repo.tagslist()
  for tag,node in l:
    # Remap the tag name
    tag=sanitize_name(tagsmap.get(tag,tag),"tag")
    # ignore latest revision
    if tag=='tip': continue
    # ignore tags to nodes that are missing (ie, 'in the future')
    if node.encode('hex_codec') not in mapping_cache:
      sys.stderr.write('Tag %s refers to unseen node %s\n' % (tag, node.encode('hex_codec')))
      continue

    rev=int(mapping_cache[node.encode('hex_codec')])

    ref=revnum_to_revref(rev, old_marks)
    if ref==None:
      sys.stderr.write('Failed to find reference for creating tag'
          ' %s at r%d\n' % (tag,rev))
      continue
    sys.stderr.write('Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag,rev,ref))
    wr('reset refs/tags/%s' % tag)
    wr('from %s' % ref)
    wr()
    count=checkpoint(count)
  return count

def load_mapping(name, filename):
  cache={}
  if not os.path.exists(filename):
    return cache
  f=open(filename,'r')
  l=0
  a=0
  lre=re.compile('^([^=]+)[ ]*=[ ]*(.+)$')
  for line in f.readlines():
    l+=1
    line=line.strip()
    if line=='' or line[0]=='#':
      continue
    m=lre.match(line)
    if m==None:
      sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
      continue
    # put key:value in cache, key without ^:
    cache[m.group(1).strip()]=m.group(2).strip()
    a+=1
  f.close()
  sys.stderr.write('Loaded %d %s\n' % (a, name))
  return cache

def branchtip(repo, heads):
  '''return the tipmost branch head in heads'''
  tip = heads[-1]
  for h in reversed(heads):
    if 'close' not in repo.changelog.read(h)[5]:
      tip = h
      break
  return tip

def verify_heads(ui,repo,cache,force):
  branches={}
  for bn, heads in repo.branchmap().iteritems():
    branches[bn] = branchtip(repo, heads)
  l=[(-repo.changelog.rev(n), n, t) for t, n in branches.items()]
  l.sort()

  # get list of hg's branches to verify, don't take all git has
  for _,_,b in l:
    b=get_branch(b)
    sha1=get_git_sha1(b)
    c=cache.get(b)
    if sha1!=c:
      sys.stderr.write('Error: Branch [%s] modified outside hg-fast-export:'
        '\n%s (repo) != %s (cache)\n' % (b,sha1,c))
      if not force: return False

  # verify that branch has exactly one head
  t={}
  for h in repo.heads():
    (_,_,_,_,_,_,branch,_)=get_changeset(ui,repo,h)
    if t.get(branch,False):
      sys.stderr.write('Error: repository has at least one unnamed head: hg r%s\n' %
          repo.changelog.rev(h))
      if not force: return False
    t[branch]=True

  return True

def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
           authors={},branchesmap={},tagsmap={},
           sob=False,force=False,hgtags=False,notes=False,encoding='',fn_encoding=''):
  _max=int(m)

  old_marks=load_cache(marksfile,lambda s: int(s)-1)
  mapping_cache=load_cache(mappingfile)
  heads_cache=load_cache(headsfile)
  state_cache=load_cache(tipfile)

  ui,repo=setup_repo(repourl)

  if not verify_heads(ui,repo,heads_cache,force):
    return 1

  try:
    tip=repo.changelog.count()
  except AttributeError:
    tip=len(repo)

  min=int(state_cache.get('tip',0))
  max=_max
  if _max<0 or max>tip:
    max=tip

  for rev in range(0,max):
    (revnode,_,_,_,_,_,_,_)=get_changeset(ui,repo,rev,authors)
    mapping_cache[revnode.encode('hex_codec')] = str(rev)


  c=0
  brmap={}
  for rev in range(min,max):
    c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
                    sob,brmap,hgtags,notes,encoding,fn_encoding)

  state_cache['tip']=max
  state_cache['repo']=repourl
  save_cache(tipfile,state_cache)
  save_cache(mappingfile,mapping_cache)

  c=export_tags(ui,repo,old_marks,mapping_cache,c,authors,tagsmap)

  sys.stderr.write('Issued %d commands\n' % c)

  return 0

if __name__=='__main__':
  def bail(parser,opt):
    sys.stderr.write('Error: No %s option given\n' % opt)
    parser.print_help()
    sys.exit(2)

  parser=OptionParser()

  parser.add_option("-m","--max",type="int",dest="max",
      help="Maximum hg revision to import")
  parser.add_option("--mapping",dest="mappingfile",
      help="File to read last run's hg-to-git SHA1 mapping")
  parser.add_option("--marks",dest="marksfile",
      help="File to read git-fast-import's marks from")
  parser.add_option("--heads",dest="headsfile",
      help="File to read last run's git heads from")
  parser.add_option("--status",dest="statusfile",
      help="File to read status from")
  parser.add_option("-r","--repo",dest="repourl",
      help="URL of repo to import")
  parser.add_option("-s",action="store_true",dest="sob",
      default=False,help="Enable parsing Signed-off-by lines")
  parser.add_option("--hgtags",action="store_true",dest="hgtags",
      default=False,help="Enable exporting .hgtags files")
  parser.add_option("-A","--authors",dest="authorfile",
      help="Read authormap from AUTHORFILE")
  parser.add_option("-B","--branches",dest="branchesfile",
      help="Read branch map from BRANCHESFILE")
  parser.add_option("-T","--tags",dest="tagsfile",
      help="Read tags map from TAGSFILE")
  parser.add_option("-f","--force",action="store_true",dest="force",
      default=False,help="Ignore validation errors by force")
  parser.add_option("-M","--default-branch",dest="default_branch",
      help="Set the default branch")
  parser.add_option("-o","--origin",dest="origin_name",
      help="use <name> as namespace to track upstream")
  parser.add_option("--hg-hash",action="store_true",dest="notes",
      default=False,help="Annotate commits with the hg hash as git notes in the hg namespace")
  parser.add_option("-e",dest="encoding",
      help="Assume commit and author strings retrieved from Mercurial are encoded in <encoding>")
  parser.add_option("--fe",dest="fn_encoding",
      help="Assume file names from Mercurial are encoded in <filename_encoding>")

  (options,args)=parser.parse_args()

  m=-1
  if options.max!=None: m=options.max

  if options.marksfile==None: bail(parser,'--marks')
  if options.mappingfile==None: bail(parser,'--mapping')
  if options.headsfile==None: bail(parser,'--heads')
  if options.statusfile==None: bail(parser,'--status')
  if options.repourl==None: bail(parser,'--repo')

  a={}
  if options.authorfile!=None:
    a=load_mapping('authors', options.authorfile)

  b={}
  if options.branchesfile!=None:
    b=load_mapping('branches', options.branchesfile)

  t={}
  if options.tagsfile!=None:
    t=load_mapping('tags', options.tagsfile)

  if options.default_branch!=None:
    set_default_branch(options.default_branch)

  if options.origin_name!=None:
    set_origin_name(options.origin_name)

  encoding=''
  if options.encoding!=None:
    encoding=options.encoding

  fn_encoding=encoding
  if options.fn_encoding!=None:
    fn_encoding=options.fn_encoding

  sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
                  options.headsfile, options.statusfile,
                  authors=a,branchesmap=b,tagsmap=t,
                  sob=options.sob,force=options.force,hgtags=options.hgtags,
                  notes=options.notes,encoding=encoding,fn_encoding=fn_encoding))
#!/bin/sh

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

ROOT="$(dirname "$(which "$0")")"
REPO=""
PFX="hg2git"
SFX_MAPPING="mapping"
SFX_MARKS="marks"
SFX_HEADS="heads"
SFX_STATE="state"
GFI_OPTS=""
PYTHON=${PYTHON:-python}

USAGE="[--quiet] [-r <repo>] [--force] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>] [--fe <filename_encoding>]"
LONG_USAGE="Import hg repository <repo> up to either tip or <max>
If <repo> is omitted, use last hg repository as obtained from state file,
GIT_DIR/$PFX-$SFX_STATE by default.

Note: The argument order matters.

Options:
	--quiet   Passed to git-fast-import(1)
	-r <repo> Mercurial repository to import
	--force   Ignore validation errors when converting, and pass --force
	          to git-fast-import(1)
	-m <max>  Maximum revision to import
	-s        Enable parsing Signed-off-by lines
	--hgtags  Enable exporting .hgtags files
	-A <file> Read author map from file
	          (Same as in git-svnimport(1) and git-cvsimport(1))
	-B <file> Read branch map from file
	-T <file> Read tags map from file
	-M <name> Set the default branch name (defaults to 'master')
	-o <name> Use <name> as branch namespace to track upstream (eg 'origin')
	--hg-hash Annotate commits with the hg hash as git notes in the
	          hg namespace.
	-e <encoding> Assume commit and author strings retrieved from
	              Mercurial are encoded in <encoding>
	--fe <filename_encoding> Assume filenames from Mercurial are encoded
	                         in <filename_encoding>
"
case "$1" in
    -h|--help)
      echo "usage: $(basename "$0") $USAGE"
      echo ""
      echo "$LONG_USAGE"
      exit 0
esac
. "$(git --exec-path)/git-sh-setup"
cd_to_toplevel

while case "$#" in 0) break ;; esac
do
  case "$1" in
    -r|--r|--re|--rep|--repo)
      shift
      REPO="$1"
      ;;
    --q|--qu|--qui|--quie|--quiet)
      GFI_OPTS="$GFI_OPTS --quiet"
      ;;
    --force)
      # pass --force to git-fast-import and hg-fast-export.py
      GFI_OPTS="$GFI_OPTS --force"
      break
      ;;
    -*)
      # pass any other options down to hg2git.py
      break
      ;;
    *)
      break
      ;;
  esac
  shift
done

# for convenience: get default repo from state file
if [ x"$REPO" = x -a -f "$GIT_DIR/$PFX-$SFX_STATE" ] ; then
  REPO="`grep '^:repo ' "$GIT_DIR/$PFX-$SFX_STATE" | cut -d ' ' -f 2`"
  echo "Using last hg repository \"$REPO\""
fi

if [  -z "$REPO" ]; then
    echo "no repo given, use -r flag"
    exit 1
fi

# make sure we have a marks cache
if [ ! -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
  touch "$GIT_DIR/$PFX-$SFX_MARKS"
fi

# cleanup on exit
trap 'rm -f "$GIT_DIR/$PFX-$SFX_MARKS.old" "$GIT_DIR/$PFX-$SFX_MARKS.tmp"' 0

_err1=
_err2=
exec 3>&1
{ read -r _err1 || :; read -r _err2 || :; } <<-EOT
$(
  exec 4>&3 3>&1 1>&4 4>&-
  {
    _e1=0
    GIT_DIR="$GIT_DIR" $PYTHON "$ROOT/hg-fast-export.py" \
      --repo "$REPO" \
      --marks "$GIT_DIR/$PFX-$SFX_MARKS" \
      --mapping "$GIT_DIR/$PFX-$SFX_MAPPING" \
      --heads "$GIT_DIR/$PFX-$SFX_HEADS" \
      --status "$GIT_DIR/$PFX-$SFX_STATE" \
      "$@" 3>&- || _e1=$?
    echo $_e1 >&3
  } | \
  {
    _e2=0
    git fast-import $GFI_OPTS --export-marks="$GIT_DIR/$PFX-$SFX_MARKS.tmp" 3>&- || _e2=$?
    echo $_e2 >&3
  }
)
EOT
exec 3>&-
[ "$_err1" = 0 -a "$_err2" = 0 ] || exit 1

# move recent marks cache out of the way...
if [ -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
  mv "$GIT_DIR/$PFX-$SFX_MARKS" "$GIT_DIR/$PFX-$SFX_MARKS.old"
else
  touch "$GIT_DIR/$PFX-$SFX_MARKS.old"
fi

# ...to create a new merged one
cat "$GIT_DIR/$PFX-$SFX_MARKS.old" "$GIT_DIR/$PFX-$SFX_MARKS.tmp" \
| uniq > "$GIT_DIR/$PFX-$SFX_MARKS"

# save SHA1s of current heads for incremental imports
# and connectivity (plus sanity checking)
for head in `git branch | sed 's#^..##'` ; do
  id="`git rev-parse refs/heads/$head`"
  echo ":$head $id"
done > "$GIT_DIR/$PFX-$SFX_HEADS"

# check diff with color:
# ( for i in `find . -type f | grep -v '\.git'` ; do diff -u $i $REPO/$i ; done | cdiff ) | less -r
#!/usr/bin/env python

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: GPLv2

from mercurial import node
from hg2git import setup_repo,load_cache,get_changeset,get_git_sha1
from optparse import OptionParser
import sys

def heads(ui,repo,start=None,stop=None,max=None):
  # this is copied from mercurial/revlog.py and differs only in
  # accepting a max argument for xrange(startrev+1,...) defaulting
  # to the original repo.changelog.count()
  if start is None:
    start = node.nullid
  if stop is None:
    stop = []
  if max is None:
    max = repo.changelog.count()
  stoprevs = dict.fromkeys([repo.changelog.rev(n) for n in stop])
  startrev = repo.changelog.rev(start)
  reachable = {startrev: 1}
  heads = {startrev: 1}

  parentrevs = repo.changelog.parentrevs
  for r in xrange(startrev + 1, max):
    for p in parentrevs(r):
      if p in reachable:
        if r not in stoprevs:
          reachable[r] = 1
        heads[r] = 1
      if p in heads and p not in stoprevs:
        del heads[p]

  return [(repo.changelog.node(r),str(r)) for r in heads]

def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
  h=heads(ui,repo,max=max)
  stale=dict.fromkeys(heads_cache)
  changed=[]
  unchanged=[]
  for node,rev in h:
    _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
    del stale[branch]
    git_sha1=get_git_sha1(branch)
    cache_sha1=marks_cache.get(str(int(rev)+1))
    if git_sha1!=None and git_sha1==cache_sha1:
      unchanged.append([branch,cache_sha1,rev,desc.split('\n')[0],user])
    else:
      changed.append([branch,cache_sha1,rev,desc.split('\n')[0],user])
  changed.sort()
  unchanged.sort()
  return stale,changed,unchanged

def get_tags(ui,repo,marks_cache,mapping_cache,max):
  l=repo.tagslist()
  good,bad=[],[]
  for tag,node in l:
    if tag=='tip': continue
    rev=int(mapping_cache[node.encode('hex_codec')])
    cache_sha1=marks_cache.get(str(int(rev)+1))
    _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
    if int(rev)>int(max):
      bad.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user])
    else:
      good.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user])
  good.sort()
  bad.sort()
  return good,bad

def mangle_mark(mark):
  return str(int(mark)-1)

if __name__=='__main__':
  def bail(parser,opt):
    sys.stderr.write('Error: No %s given\n' % opt)
    parser.print_help()
    sys.exit(2)

  parser=OptionParser()

  parser.add_option("--marks",dest="marksfile",
      help="File to read git-fast-import's marks from")
  parser.add_option("--mapping",dest="mappingfile",
      help="File to read last run's hg-to-git SHA1 mapping")
  parser.add_option("--heads",dest="headsfile",
      help="File to read last run's git heads from")
  parser.add_option("--status",dest="statusfile",
      help="File to read status from")
  parser.add_option("-r","--repo",dest="repourl",
      help="URL of repo to import")
  parser.add_option("-R","--revision",type=int,dest="revision",
      help="Revision to reset to")

  (options,args)=parser.parse_args()

  if options.marksfile==None: bail(parser,'--marks option')
  if options.mappingfile==None: bail(parser,'--mapping option')
  if options.headsfile==None: bail(parser,'--heads option')
  if options.statusfile==None: bail(parser,'--status option')
  if options.repourl==None: bail(parser,'--repo option')
  if options.revision==None: bail(parser,'-R/--revision option')

  heads_cache=load_cache(options.headsfile)
  marks_cache=load_cache(options.marksfile,mangle_mark)
  state_cache=load_cache(options.statusfile)
  mapping_cache = load_cache(options.mappingfile)

  l=int(state_cache.get('tip',options.revision))
  if options.revision+1>l:
    sys.stderr.write('Revision is beyond last revision imported: %d>%d\n' % (options.revision,l))
    sys.exit(1)

  ui,repo=setup_repo(options.repourl)

  stale,changed,unchanged=get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,options.revision+1)
  good,bad=get_tags(ui,repo,marks_cache,mapping_cache,options.revision+1)

  print "Possibly stale branches:"
  map(lambda b: sys.stdout.write('\t%s\n' % b),stale.keys())

  print "Possibly stale tags:"
  map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),bad)

  print "Unchanged branches:"
  map(lambda b: sys.stdout.write('\t%s (r%s)\n' % (b[0],b[2])),unchanged)

  print "Unchanged tags:"
  map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),good)

  print "Reset branches in '%s' to:" % options.headsfile
  map(lambda b: sys.stdout.write('\t:%s %s\n\t\t(r%s: %s: %s)\n' % (b[0],b[1],b[2],b[4],b[3])),changed)

  print "Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision)
#!/bin/sh
878
879
# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
880
# License: MIT <http://www.opensource.org/licenses/mit-license.php>
881
882
ROOT="`dirname $0`"
883
REPO=""
884
PFX="hg2git"
885
SFX_MARKS="marks"
886
SFX_MAPPING="mapping"
887
SFX_HEADS="heads"
888
SFX_STATE="state"
889
QUIET=""
890
PYTHON=${PYTHON:-python}
891
892
USAGE="[-r <repo>] -R <rev>"
893
LONG_USAGE="Print SHA1s of latest changes per branch up to <rev> useful
894
to reset import and restart at <rev>.
895
If <repo> is omitted, use last hg repository as obtained from state file,
896
GIT_DIR/$PFX-$SFX_STATE by default.
897
898
Options:
899
	-R	Hg revision to reset to
900
	-r	Mercurial repository to use
901
"
902
903
. "$(git --exec-path)/git-sh-setup"
904
cd_to_toplevel
905
906
while case "$#" in 0) break ;; esac
907
do
908
  case "$1" in
909
    -r|--r|--re|--rep|--repo)
910
      shift
911
      REPO="$1"
912
      ;;
913
    -*)
914
      # pass any other options down to hg2git.py
915
      break
916
      ;;
917
    *)
918
      break
919
      ;;
920
  esac
921
  shift
922
done
923
924
# for convenience: get default repo from state file
925
if [ x"$REPO" = x -a -f "$GIT_DIR/$PFX-$SFX_STATE" ] ; then
926
  REPO="`grep '^:repo ' "$GIT_DIR/$PFX-$SFX_STATE" | cut -d ' ' -f 2`"
927
  echo "Using last hg repository \"$REPO\""
928
fi
929
930
# make sure we have a marks cache
931
if [ ! -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
932
  touch "$GIT_DIR/$PFX-$SFX_MARKS"
933
fi
934
935
GIT_DIR="$GIT_DIR" $PYTHON "$ROOT/hg-reset.py" \
936
  --repo "$REPO" \
937
  --marks "$GIT_DIR/$PFX-$SFX_MARKS" \
938
  --mapping "$GIT_DIR/$PFX-$SFX_MAPPING" \
939
  --heads "$GIT_DIR/$PFX-$SFX_HEADS" \
940
  --status "$GIT_DIR/$PFX-$SFX_STATE" \
941
  "$@"
942
943
exit $?
#!/usr/bin/env python

# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

from mercurial import hg,util,ui,templatefilters
import re
import os
import sys

# default git branch name
cfg_master='master'
# default origin name
origin_name=''
# silly regex to see if user field has email address
user_re=re.compile('([^<]+) (<[^>]*>)$')
# silly regex to clean out user names
user_clean_re=re.compile('^["]([^"]+)["]$')

def set_default_branch(name):
  global cfg_master
  cfg_master = name

def set_origin_name(name):
  global origin_name
  origin_name = name

def setup_repo(url):
  try:
    myui=ui.ui(interactive=False)
  except TypeError:
    myui=ui.ui()
    myui.setconfig('ui', 'interactive', 'off')
  return myui,hg.repository(myui,url)

def fixup_user(user,authors):
  user=user.strip("\"")
  if authors!=None:
    # if we have an authors table, try to get mapping
    # by defaulting to the current value of 'user'
    user=authors.get(user,user)
  name,mail,m='','',user_re.match(user)
  if m==None:
    # if we don't have 'Name <mail>' syntax, extract name
    # and mail from hg helpers. this seems to work pretty well.
    # if email doesn't contain @, replace it with devnull@localhost
    name=templatefilters.person(user)
    mail='<%s>' % util.email(user)
    if '@' not in mail:
      mail = '<devnull@localhost>'
  else:
    # if we have 'Name <mail>' syntax, everything is fine :)
    name,mail=m.group(1),m.group(2)

  # remove any silly quoting from username
  m2=user_clean_re.match(name)
  if m2!=None:
    name=m2.group(1)
  return '%s %s' % (name,mail)

def get_branch(name):
  # 'HEAD' is the result of a bug in mutt's cvs->hg conversion,
  # other CVS imports may need it, too
  if name=='HEAD' or name=='default' or name=='':
    name=cfg_master
  if origin_name:
    return origin_name + '/' + name
  return name

def get_changeset(ui,repo,revision,authors={},encoding=''):
  node=repo.lookup(revision)
  (manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node)
  if encoding:
    user=user.decode(encoding).encode('utf8')
    desc=desc.decode(encoding).encode('utf8')
  # hg gives the offset in seconds west of UTC; git wants [+-]HHMM east of UTC.
  # Format sign and magnitude separately so that negative half-hour offsets
  # (e.g. 12600 -> -0330) are not skewed by floor division.
  tz_sign='+'
  if timezone>0:
    tz_sign='-'
  tz="%s%02d%02d" % (tz_sign,abs(timezone)//3600,(abs(timezone)%3600)//60)
  branch=get_branch(extra.get('branch','master'))
  return (node,manifest,fixup_user(user,authors),(time,tz),files,desc,branch,extra)
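
# Illustration only, not part of hg2git.py: a minimal sketch of the timezone
# conversion done in get_changeset(). It assumes the offset is an integer
# number of seconds west of UTC and that the target is the [+-]HHMM form used
# in git committer lines. The helper name format_git_tz is hypothetical.

```python
def format_git_tz(seconds_west):
    """Format an offset in seconds west of UTC as [+-]HHMM (east-positive)."""
    # Offsets west of UTC are positive in the input and negative in the output.
    sign = '-' if seconds_west > 0 else '+'
    # Work on the magnitude so half-hour zones keep their minutes intact.
    hours, minutes = divmod(abs(seconds_west) // 60, 60)
    return '%s%02d%02d' % (sign, hours, minutes)


if __name__ == '__main__':
    for offset in (0, -3600, -19800, 18000, 12600):
        print('%6d -> %s' % (offset, format_git_tz(offset)))
```

# Splitting sign and magnitude avoids relying on how integer division rounds
# negative numbers, which differs between languages.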

def mangle_key(key):
  return key

def load_cache(filename,get_key=mangle_key):
  cache={}
  if not os.path.exists(filename):
    return cache
  f=open(filename,'r')
  l=0
  for line in f.readlines():
    l+=1
    fields=line.split(' ')
    if fields==None or not len(fields)==2 or fields[0][0]!=':':
      sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
      continue
    # put key:value in cache, key without ^:
    cache[get_key(fields[0][1:])]=fields[1].split('\n')[0]
  f.close()
  return cache

def save_cache(filename,cache):
  f=open(filename,'w+')
  map(lambda x: f.write(':%s %s\n' % (str(x),str(cache.get(x)))),cache.keys())
  f.close()

def get_git_sha1(name,type='heads'):
  try:
    # use git-rev-parse to support packed refs
    cmd="git rev-parse --verify refs/%s/%s 2>%s" % (type,name,os.devnull)
    p=os.popen(cmd)
    l=p.readline()
    p.close()
    if l == None or len(l) == 0:
      return None
    return l[0:40]
  except IOError:
    return None
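
# Illustration only, not part of hg2git.py: a small round trip through the
# ':key value' line format that load_cache() and save_cache() read and write.
# The names save_pairs, load_pairs and roundtrip are hypothetical, and the
# sketch assumes keys and values contain no spaces or newlines.

```python
import os
import tempfile


def save_pairs(filename, cache):
    """Write one ':key value' record per line."""
    with open(filename, 'w') as f:
        for key in sorted(cache):
            f.write(':%s %s\n' % (key, cache[key]))


def load_pairs(filename):
    """Read ':key value' records, skipping lines that do not match."""
    cache = {}
    with open(filename) as f:
        for line in f:
            fields = line.rstrip('\n').split(' ')
            if len(fields) != 2 or not fields[0].startswith(':'):
                continue  # malformed line: not exactly ':key' plus one value
            cache[fields[0][1:]] = fields[1]
    return cache


def roundtrip(cache):
    """Save to a temporary file, load it back, and clean up."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        save_pairs(path, cache)
        return load_pairs(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(roundtrip({'tip': '42', 'repo': '/tmp/hg'}))
```

# Because records are split on single spaces, a value containing a space
# (for example a repository path) would be rejected as malformed on load.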
/*
 * svn-archive.c
 * ----------
 *  Walk through a given revision of a local Subversion repository and export
 *  all of the contents as a tarfile.
 *
 * Author: Chris Lee <clee@kde.org>
 * License: MIT <http://www.opensource.org/licenses/mit-license.php>
 */

#define _XOPEN_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <time.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

#include <apr_general.h>
#include <apr_strings.h>
#include <apr_getopt.h>
#include <apr_lib.h>

#include <svn_types.h>
#include <svn_pools.h>
#include <svn_repos.h>
#include <svn_fs.h>

#undef SVN_ERR
#define SVN_ERR(expr) SVN_INT_ERR(expr)
#define apr_sane_push(arr, contents) *(char **)apr_array_push(arr) = contents

#define TRUNK "/trunk"

static time_t archive_time;

time_t get_epoch(char *svn_date)
{
    struct tm tm = {0};
    /* svn:date looks like "2007-01-01T12:34:56.123456Z"; drop the trailing
     * 8 characters (".123456Z") and parse the rest with strptime(). */
    size_t len = strlen(svn_date);
    char *date;

    if (len < 8)
        return (time_t)-1;
    len -= 8;
    date = malloc(len + 1);
    if (date == NULL)
        return (time_t)-1;
    memcpy(date, svn_date, len);
    date[len] = '\0';
    strptime(date, "%Y-%m-%dT%H:%M:%S", &tm);
    free(date);
    /* Note: mktime() interprets tm as local time. */
    return mktime(&tm);
}

int tar_header(apr_pool_t *pool, char *path, char *node, size_t f_size)
{
    char          buf[512];
    unsigned int  i, checksum;
    svn_boolean_t is_dir;

    memset(buf, 0, sizeof(buf));

    if ((strlen(path) == 0) && (strlen(node) == 0)) {
        return 0;
    }

    if (strlen(node) == 0) {
        is_dir = 1;
    } else {
        is_dir = 0;
    }

    /* name (offset 0, 100 bytes); long paths spill into prefix (offset 345) */
    if (strlen(path) == 0) {
        strncpy(buf, apr_psprintf(pool, "%s", node), 99);
    } else if (strlen(path) + strlen(node) < 100) {
        strncpy(buf, apr_psprintf(pool, "%s/%s", path+1, node), 99);
    } else {
        fprintf(stderr, "really long file path...\n");
        strncpy(&buf[0], node, 99);
        strncpy(&buf[345], path+1, 154);
    }

    strncpy(&buf[100], apr_psprintf(pool, "%07o", (is_dir ? 0755 : 0644)), 7);  /* mode */
    strncpy(&buf[108], apr_psprintf(pool, "%07o", 1000), 7);                    /* uid */
    strncpy(&buf[116], apr_psprintf(pool, "%07o", 1000), 7);                    /* gid */
    strncpy(&buf[124], apr_psprintf(pool, "%011lo", (unsigned long)f_size), 11);       /* size */
    strncpy(&buf[136], apr_psprintf(pool, "%011lo", (unsigned long)archive_time), 11); /* mtime */
    strncpy(&buf[156], (is_dir ? "5" : "0"), 1);                                /* typeflag */
    strncpy(&buf[257], "ustar  ", 8);                                           /* magic */
    strncpy(&buf[265], "clee", 31);                                             /* uname */
    strncpy(&buf[297], "clee", 31);                                             /* gname */
    // strncpy(&buf[329], apr_psprintf(pool, "%07o", 0), 7);
    // strncpy(&buf[337], apr_psprintf(pool, "%07o", 0), 7);

    /* The checksum is computed with the checksum field itself set to spaces,
     * summing the header bytes as unsigned values. */
    strncpy(&buf[148], "        ", 8);
    checksum = 0;
    for (i = 0; i < sizeof(buf); i++) {
        checksum += (unsigned char)buf[i];
    }
    strncpy(&buf[148], apr_psprintf(pool, "%07o", checksum & 0x1fffff), 7);

    fwrite(buf, sizeof(char), sizeof(buf), stdout);

    return 0;
}

int tar_footer()
{
    /* Two all-zero 512-byte blocks mark the end of the archive. */
    char block[1024];
    memset(block, 0, sizeof(block));
    fwrite(block, sizeof(char), sizeof(block), stdout);
    return 0;
}

int dump_blob(svn_fs_root_t *root, char *prefix, char *path, char *node, apr_pool_t *pool)
{
    char           *full_path, buf[512];
    apr_size_t     len;
    svn_stream_t   *stream;
    svn_filesize_t stream_length;

    full_path = apr_psprintf(pool, "%s%s/%s", prefix, path, node);

    SVN_ERR(svn_fs_file_length(&stream_length, root, full_path, pool));
    SVN_ERR(svn_fs_file_contents(&stream, root, full_path, pool));

    tar_header(pool, path, node, stream_length);

    do {
        len = sizeof(buf);
        memset(buf, '\0', sizeof(buf));
        SVN_ERR(svn_stream_read(stream, buf, &len));
        /* Emit whole 512-byte tar blocks (the last one zero-padded), but no
         * extra all-zero block when the stream ends on a block boundary. */
        if (len > 0)
            fwrite(buf, sizeof(char), sizeof(buf), stdout);
    } while (len == sizeof(buf));

    return 0;
}

int dump_tree(svn_fs_root_t *root, char *prefix, char *path, apr_pool_t *pool)
{
    const void       *key;
    void             *val;
    char             *node, *subpath, *full_path;

    apr_pool_t       *subpool;
    apr_hash_t       *dir_entries;
    apr_hash_index_t *i;

    svn_boolean_t    is_dir;

    tar_header(pool, path, "", 0);

    SVN_ERR(svn_fs_dir_entries(&dir_entries, root, apr_psprintf(pool, "%s/%s", prefix, path), pool));

    subpool = svn_pool_create(pool);

    for (i = apr_hash_first(pool, dir_entries); i; i = apr_hash_next(i)) {
        svn_pool_clear(subpool);
        apr_hash_this(i, &key, NULL, &val);
        node = (char *)key;

        subpath = apr_psprintf(subpool, "%s/%s", path, node);
        full_path = apr_psprintf(subpool, "%s%s", prefix, subpath);

        svn_fs_is_dir(&is_dir, root, full_path, subpool);

        if (is_dir) {
            dump_tree(root, prefix, subpath, subpool);
        } else {
            dump_blob(root, prefix, path, node, subpool);
        }
    }

    svn_pool_destroy(subpool);

    return 0;
}

int crawl_filesystem(char *repos_path, char *root_path, apr_pool_t *pool)
{
    char                 *path;

    apr_hash_t           *props;
    apr_hash_index_t     *i;

    svn_repos_t          *repos;
    svn_fs_t             *fs;
    svn_string_t         *svndate;
    svn_revnum_t         youngest_rev, export_rev;
    svn_fs_root_t        *fs_root;

    SVN_ERR(svn_fs_initialize(pool));
    SVN_ERR(svn_repos_open(&repos, repos_path, pool));
    if ((fs = svn_repos_fs(repos)) == NULL)
      return -1;
    SVN_ERR(svn_fs_youngest_rev(&youngest_rev, fs, pool));

    export_rev = youngest_rev;

    SVN_ERR(svn_fs_revision_root(&fs_root, fs, export_rev, pool));
    SVN_ERR(svn_fs_revision_proplist(&props, fs, export_rev, pool));

    svndate = apr_hash_get(props, "svn:date", APR_HASH_KEY_STRING);
    archive_time = get_epoch((char *)svndate->data);

    fprintf(stderr, "Exporting archive of r%ld... \n", export_rev);

    dump_tree(fs_root, root_path, "", pool);

    tar_footer();

    fprintf(stderr, "done!\n");

    return 0;
}

int main(int argc, char *argv[])
{
    apr_pool_t           *pool;
    apr_getopt_t         *options;

    apr_getopt_option_t long_options[] = {
        { "help",     'h', 0 },
        { "prefix",   'p', 0 },
        { "basename", 'b', 0 },
        { "revision", 'r', 0 },
        { NULL,       0,   0 }
    };

    if (argc < 2) {
        fprintf(stderr, "usage: %s REPOS_PATH [prefix]\n", argv[0]);
        return -1;
    }

    if (apr_initialize() != APR_SUCCESS) {
        fprintf(stderr, "You lose at apr_initialize().\n");
        return -1;
    }

    pool = svn_pool_create(NULL);

    crawl_filesystem(argv[1], (argc == 3 ? argv[2] : TRUNK), pool);

    apr_terminate();

    return 0;
}
/*
 * svn-fast-export.c
 * ----------
 *  Walk through each revision of a local Subversion repository and export it
 *  in a stream that git-fast-import can consume.
 *
 * Author: Chris Lee <clee@kde.org>
 * License: MIT <http://www.opensource.org/licenses/mit-license.php>
 */

#define _XOPEN_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <time.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

#include <apr_lib.h>
#include <apr_getopt.h>
#include <apr_general.h>

#include <svn_fs.h>
#include <svn_repos.h>
#include <svn_pools.h>
#include <svn_types.h>

#undef SVN_ERR
#define SVN_ERR(expr) SVN_INT_ERR(expr)
#define apr_sane_push(arr, contents) *(char **)apr_array_push(arr) = contents

#define TRUNK "/trunk/"

time_t get_epoch(char *svn_date)
{
    struct tm tm = {0};
    /* svn:date looks like "2007-01-01T12:34:56.123456Z"; drop the trailing
     * 8 characters (".123456Z") and parse the rest with strptime(). */
    size_t len = strlen(svn_date);
    char *date;

    if (len < 8)
        return (time_t)-1;
    len -= 8;
    date = malloc(len + 1);
    if (date == NULL)
        return (time_t)-1;
    memcpy(date, svn_date, len);
    date[len] = '\0';
    strptime(date, "%Y-%m-%dT%H:%M:%S", &tm);
    free(date);
    /* Note: mktime() interprets tm as local time. */
    return mktime(&tm);
}

int dump_blob(svn_fs_root_t *root, char *full_path, apr_pool_t *pool)
{
    apr_size_t     len;
    svn_stream_t   *stream, *outstream;
    svn_filesize_t stream_length;

    SVN_ERR(svn_fs_file_length(&stream_length, root, full_path, pool));
    SVN_ERR(svn_fs_file_contents(&stream, root, full_path, pool));

    fprintf(stdout, "data %lu\n", (unsigned long)stream_length);
    fflush(stdout);

    SVN_ERR(svn_stream_for_stdout(&outstream, pool));
    SVN_ERR(svn_stream_copy(stream, outstream, pool));

    fprintf(stdout, "\n");
    fflush(stdout);

    return 0;
}

int export_revision(svn_revnum_t rev, svn_fs_t *fs, apr_pool_t *pool)
{
    unsigned int         mark;
    const void           *key;
    void                 *val;
    char                 *path, *file_change;
    apr_pool_t           *revpool;
    apr_hash_t           *changes, *props;
    apr_hash_index_t     *i;
    apr_array_header_t   *file_changes;
    svn_string_t         *author, *committer, *svndate, *svnlog;
    svn_boolean_t        is_dir;
    svn_fs_root_t        *fs_root;
    svn_fs_path_change_t *change;

    fprintf(stderr, "Exporting revision %ld... ", rev);

    SVN_ERR(svn_fs_revision_root(&fs_root, fs, rev, pool));
    SVN_ERR(svn_fs_paths_changed(&changes, fs_root, pool));
    SVN_ERR(svn_fs_revision_proplist(&props, fs, rev, pool));

    revpool = svn_pool_create(pool);

    file_changes = apr_array_make(pool, apr_hash_count(changes), sizeof(char *));
    mark = 1;
    for (i = apr_hash_first(pool, changes); i; i = apr_hash_next(i)) {
        svn_pool_clear(revpool);
        apr_hash_this(i, &key, NULL, &val);
        path = (char *)key;
        change = (svn_fs_path_change_t *)val;

        SVN_ERR(svn_fs_is_dir(&is_dir, fs_root, path, revpool));

        /* Only files below TRUNK are exported. */
        if (is_dir || strncmp(TRUNK, path, strlen(TRUNK))) {
            continue;
        }

        if (change->change_kind == svn_fs_path_change_delete) {
            apr_sane_push(file_changes, (char *)svn_string_createf(pool, "D %s", path + strlen(TRUNK))->data);
        } else {
            apr_sane_push(file_changes, (char *)svn_string_createf(pool, "M 644 :%u %s", mark, path + strlen(TRUNK))->data);
            fprintf(stdout, "blob\nmark :%u\n", mark++);
            dump_blob(fs_root, (char *)path, revpool);
        }
    }

    if (file_changes->nelts == 0) {
        fprintf(stderr, "skipping.\n");
        svn_pool_destroy(revpool);
        return 0;
    }

    author = apr_hash_get(props, "svn:author", APR_HASH_KEY_STRING);
    /* apr_hash_get() returns NULL when the revision has no svn:author. */
    if (author == NULL || svn_string_isempty(author))
        author = svn_string_create("nobody", pool);
    svndate = apr_hash_get(props, "svn:date", APR_HASH_KEY_STRING);
    svnlog = apr_hash_get(props, "svn:log", APR_HASH_KEY_STRING);

    fprintf(stdout, "commit refs/heads/master\n");
    fprintf(stdout, "committer %s <%s@localhost> %ld -0000\n", author->data, author->data, (long)get_epoch((char *)svndate->data));
    fprintf(stdout, "data %lu\n", (unsigned long)svnlog->len);
    fputs(svnlog->data, stdout);
    fprintf(stdout, "\n");
    fputs(apr_array_pstrcat(pool, file_changes, '\n'), stdout);
    fprintf(stdout, "\n\n");
    fflush(stdout);

    svn_pool_destroy(revpool);

    fprintf(stderr, "done!\n");

    return 0;
}

int crawl_revisions(char *repos_path)
{
    apr_pool_t   *pool, *subpool;
    svn_fs_t     *fs;
    svn_repos_t  *repos;
    svn_revnum_t youngest_rev, min_rev, max_rev, rev;

    pool = svn_pool_create(NULL);

    SVN_ERR(svn_fs_initialize(pool));
    SVN_ERR(svn_repos_open(&repos, repos_path, pool));
    if ((fs = svn_repos_fs(repos)) == NULL)
        return -1;
    SVN_ERR(svn_fs_youngest_rev(&youngest_rev, fs, pool));

    min_rev = 1;
    max_rev = youngest_rev;

    subpool = svn_pool_create(pool);
    for (rev = min_rev; rev <= max_rev; rev++) {
        svn_pool_clear(subpool);
        export_revision(rev, fs, subpool);
    }

    svn_pool_destroy(pool);

    return 0;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s REPOS_PATH\n", argv[0]);
        return -1;
    }

    if (apr_initialize() != APR_SUCCESS) {
        fprintf(stderr, "You lose at apr_initialize().\n");
        return -1;
    }

    crawl_revisions(argv[1]);

    apr_terminate();

    return 0;
}
#!/usr/bin/python
#
# svn-fast-export.py
# ----------
#  Walk through each revision of a local Subversion repository and export it
#  in a stream that git-fast-import can consume.
#
# Author: Chris Lee <clee@kde.org>
# License: MIT <http://www.opensource.org/licenses/mit-license.php>

trunk_path = '/trunk/'
branches_path = '/branches/'
tags_path = '/tags/'

first_rev = 1
final_rev = 0

import sys, os.path
from optparse import OptionParser
from time import mktime, strptime
from svn.fs import svn_fs_file_length, svn_fs_file_contents, svn_fs_is_dir, svn_fs_revision_root, svn_fs_youngest_rev, svn_fs_revision_proplist, svn_fs_paths_changed
from svn.core import svn_pool_create, svn_pool_clear, svn_pool_destroy, svn_stream_for_stdout, svn_stream_copy, svn_stream_close, run_app
from svn.repos import svn_repos_open, svn_repos_fs

ct_short = ['M', 'A', 'D', 'R', 'X']

def dump_file_blob(root, full_path, pool):
    stream_length = svn_fs_file_length(root, full_path, pool)
    stream = svn_fs_file_contents(root, full_path, pool)
    sys.stdout.write("data %s\n" % stream_length)
    sys.stdout.flush()
    ostream = svn_stream_for_stdout(pool)
    svn_stream_copy(stream, ostream, pool)
    svn_stream_close(ostream)
    sys.stdout.write("\n")


def export_revision(rev, repo, fs, pool):
    sys.stderr.write("Exporting revision %s... " % rev)

    revpool = svn_pool_create(pool)
    svn_pool_clear(revpool)

    # Open a root object representing revision 'rev'.
    root = svn_fs_revision_root(fs, rev, revpool)

    # And the list of what changed in this revision.
    changes = svn_fs_paths_changed(root, revpool)

    i = 1
    marks = {}
    file_changes = []

    for path, change_type in changes.iteritems():
        c_t = ct_short[change_type.change_kind]
        if svn_fs_is_dir(root, path, revpool):
            continue

        if not path.startswith(trunk_path):
            # We don't handle branches. Or tags. Yet.
            pass
        else:
            # Strip only the leading trunk prefix; str.replace() would also
            # remove later occurrences of trunk_path inside the path.
            rel_path = path[len(trunk_path):]
            if c_t == 'D':
                file_changes.append("D %s" % rel_path)
            else:
                marks[i] = rel_path
                file_changes.append("M 644 :%s %s" % (i, marks[i]))
                sys.stdout.write("blob\nmark :%s\n" % i)
                dump_file_blob(root, path, revpool)
                i += 1

    # Get the revision properties (author, date and log message).
    props = svn_fs_revision_proplist(fs, rev, revpool)

    # Build the committer identity, falling back to 'nobody'.
    if 'svn:author' in props:
        author = "%s <%s@localhost>" % (props['svn:author'], props['svn:author'])
    else:
        author = 'nobody <nobody@localhost>'

    if len(file_changes) == 0:
        svn_pool_destroy(revpool)
        sys.stderr.write("skipping.\n")
        return

    # Drop the fractional seconds and trailing 'Z' from svn:date.
    svndate = props['svn:date'][0:-8]
    commit_time = mktime(strptime(svndate, '%Y-%m-%dT%H:%M:%S'))
    sys.stdout.write("commit refs/heads/master\n")
    sys.stdout.write("committer %s %s -0000\n" % (author, int(commit_time)))
    sys.stdout.write("data %s\n" % len(props['svn:log']))
    sys.stdout.write(props['svn:log'])
    sys.stdout.write("\n")
    sys.stdout.write('\n'.join(file_changes))
    sys.stdout.write("\n\n")

    svn_pool_destroy(revpool)

    sys.stderr.write("done!\n")

    #if rev % 1000 == 0:
    #    sys.stderr.write("gc: %s objects\n" % len(gc.get_objects()))
    #    sleep(5)


def crawl_revisions(pool, repos_path):
    """Open the repository at REPOS_PATH, and recursively crawl all its
    revisions."""
    global final_rev

    # Open the repository at REPOS_PATH, and get a reference to its
    # versioning filesystem.
    repos_obj = svn_repos_open(repos_path, pool)
    fs_obj = svn_repos_fs(repos_obj)

    # Query the current youngest revision.
    youngest_rev = svn_fs_youngest_rev(fs_obj, pool)


    first_rev = 1
    if final_rev == 0:
        final_rev = youngest_rev
    for rev in xrange(first_rev, final_rev + 1):
        export_revision(rev, repos_obj, fs_obj, pool)


if __name__ == '__main__':
    usage = '%prog [options] REPOS_PATH'
    parser = OptionParser()
    parser.set_usage(usage)
    parser.add_option('-f', '--final-rev', help='Final revision to import',
                      dest='final_rev', metavar='FINAL_REV', type='int')
    parser.add_option('-t', '--trunk-path', help='Path in repo to /trunk',
                      dest='trunk_path', metavar='TRUNK_PATH')
    parser.add_option('-b', '--branches-path', help='Path in repo to /branches',
                      dest='branches_path', metavar='BRANCHES_PATH')
    parser.add_option('-T', '--tags-path', help='Path in repo to /tags',
                      dest='tags_path', metavar='TAGS_PATH')
    (options, args) = parser.parse_args()

    if options.trunk_path != None:
        trunk_path = options.trunk_path
    if options.branches_path != None:
        branches_path = options.branches_path
    if options.tags_path != None:
        tags_path = options.tags_path
    if options.final_rev != None:
        final_rev = options.final_rev

    if len(args) != 1:
        parser.print_help()
        sys.exit(2)

    # Canonicalize (enough for Subversion, at least) the repository path.
    repos_path = os.path.normpath(args[0])
    if repos_path == '.':
        repos_path = ''

    # Call the app-wrapper, which takes care of APR initialization/shutdown
    # and the creation and cleanup of our top-level memory pool.
    run_app(crawl_revisions, repos_path)
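
# Illustration only, not part of svn-fast-export.py: a minimal sketch of the
# stream that export_revision() writes for one revision, built as a string so
# it can be inspected without a Subversion repository. The function name
# build_commit and its parameters are hypothetical, and the sketch assumes
# ASCII content so that len() matches the byte counts 'data' requires.

```python
def build_commit(author, timestamp, message, files, ref='refs/heads/master'):
    """Return a fast-import fragment: one blob per file, then one commit.

    files is a list of (path, content) pairs.
    """
    out = []
    changes = []
    for mark, (path, content) in enumerate(files, 1):
        # Each blob gets a mark so the commit can refer to it below.
        out.append('blob\nmark :%d\ndata %d\n%s\n' % (mark, len(content), content))
        changes.append('M 644 :%d %s' % (mark, path))
    out.append('commit %s\n' % ref)
    out.append('committer %s <%s@localhost> %d -0000\n' % (author, author, timestamp))
    out.append('data %d\n%s\n' % (len(message), message))
    out.append('\n'.join(changes))
    out.append('\n\n')
    return ''.join(out)


if __name__ == '__main__':
    print(build_commit('alice', 0, 'init', [('a.txt', 'hi\n')]))
```

# Blobs are emitted before the commit that references their marks, which is
# the same ordering the script above uses.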