Integration of DSpace with other applications » History » Version 5

Steve Welburn, 2012-07-10 09:41 AM

h1. Integration of DSpace with other applications

h2. Submitting files to repository

* "Apache FTP Server":http://mina.apache.org/ftpserver/ - has hooks for processing after FTP commands, so it could provide a basic interface to the repository.
* "Storage Resource Broker (SRB) integration with DSpace":https://wiki.duraspace.org/display/DSPACE/DspaceSrbIntegration#DspaceSrbIntegration-MoreInformationabouttheProject  (including Registration to ingest data)
* "SymlinkDSpace":https://wiki.duraspace.org/display/DSPACE/SymlinkDSpace - extension that adds symlinks to large files (e.g. videos) instead of uploading them.
* "DepositMO project":http://www.eprints.org/depositmo/ - has scripts to upload directly from a "watch directory", as well as extensions to the SWORD2 protocol.
* "Some suggestions about creating Sword packages":http://dspace.2283337.n4.nabble.com/Producing-mets-xml-for-SWORD-td3285563.html
* "SwordUploader":https://code.soundsoftware.ac.uk/projects/sworduploader

h3. "DataStage":http://rdm.c4dm.eecs.qmul.ac.uk/datastage-and-dspace

To make DataStage (version 0.3rc2) work with C4DM's DSpace in the same way as SwordUploader, changes are required to the file /usr/lib/python2.6/dist-packages/datastage/dataset/sword2depositor.py.

At line 66, it should read:
<pre>
receipt = conn.create(col_iri=col.href, metadata_entry=e, suggested_identifier=dataset.identifier, in_progress=True)
</pre>

Around line 133, it should read:
<pre>
new_receipt = conn.update(dr=receipt,
                          payload=data,
                          mimetype="application/zip",
                          filename=dataset.identifier + ".zip",
                          in_progress=True,
                          packaging='http://dataflow.ox.ac.uk/package/DataBankBagIt')
</pre>

With these changes, it should be possible to upload files to DSpace *as configured at C4DM*. The modified file can be downloaded from "here":https://code.soundsoftware.ac.uk/attachments/446/sword2depositor.py
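
As a rough illustration of what the @in_progress=True@ argument means at the protocol level: SWORD2 carries it as an @In-Progress@ HTTP header, which tells the server that the deposit is not yet complete. The sketch below builds the kind of headers a SWORD2 client sends with a binary deposit. It is not code from DataStage or from the sword2 client library; the function name @sword2_deposit_headers@ and the sample filename are hypothetical, and the hex-encoded MD5 checksum is an assumption about how the checksum is formatted.

```python
import hashlib


def sword2_deposit_headers(filename, mimetype, packaging, payload,
                           in_progress=True, suggested_identifier=None):
    """Build illustrative HTTP headers for a SWORD2 binary deposit."""
    headers = {
        "Content-Type": mimetype,
        "Content-Disposition": "attachment; filename=" + filename,
        "Packaging": packaging,
        # "true" signals that further updates will follow this request.
        "In-Progress": "true" if in_progress else "false",
        # Checksum of the payload (hex digest assumed here).
        "Content-MD5": hashlib.md5(payload).hexdigest(),
    }
    if suggested_identifier is not None:
        # A suggested identifier is passed to the server as a Slug header.
        headers["Slug"] = suggested_identifier
    return headers


headers = sword2_deposit_headers(
    filename="dataset-1.zip",  # hypothetical dataset identifier + ".zip"
    mimetype="application/zip",
    packaging="http://dataflow.ox.ac.uk/package/DataBankBagIt",
    payload=b"example bytes",
)
print(headers["In-Progress"])  # prints "true"
```

With @in_progress=False@ the same helper would emit @In-Progress: false@, which signals that the client considers the deposit complete.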

Additionally, the DataStage server does not start properly in VirtualBox. To submit files, it is necessary to restart it:
<pre>
sudo datastage-server stop
sudo datastage-server start
</pre>

Also see "this blog post":http://rdm.c4dm.eecs.qmul.ac.uk/datastage-and-dspace

h2. File Conversion

* "Xena":http://xena.sourceforge.net - converts files to open formats.

h2. Metadata Sources

* "National Library of New Zealand Metadata Extractor":http://meta-extractor.sourceforge.net/ - extracts metadata from binary files and outputs it as XML.
* "Digital Record Object IDentification - DROID":http://sourceforge.net/projects/droid/files/droid/ - identifies file types and generates summary statistics. Now on version 6.
* "JSTOR/Harvard Object Validation Environment - JHove":http://hul.harvard.edu/jhove/
* "JHove2":http://www.bitbucket.org/jhove2/main - Actually uses DROID 4 for file identification.
* "Apache Tika":http://tika.apache.org/
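
To make the idea of file identification concrete, here is a minimal sketch of signature ("magic number") matching, the basic technique behind tools such as DROID and the Unix file utility. It is a simplified illustration only: the signature table is a tiny hand-picked sample, the function names are hypothetical, and real tools such as DROID rely on a far larger signature registry (PRONOM) and more elaborate matching rules.

```python
# Tiny, hand-picked sample of signatures: (leading bytes, format name).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"RIFF", "RIFF container (e.g. WAV, AVI)"),
    (b"fLaC", "FLAC audio"),
]


def identify(data):
    """Return the first format whose signature matches the start of data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path):
    """Identify a file by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(b"%PDF-1.4 example"))  # prints "PDF document"
print(identify(b"plain text"))        # prints "unknown"
```

Because only the leading bytes are inspected, this approach is cheap, but it cannot validate a file or extract features from it.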

The SCAlable Preservation Environments (SCAPE) project compared DROID, Fido, the Unix file utility, FITS and JHove2 for identifying file types.

"Downloaded from Open Planets Foundation":http://www.openplanetsfoundation.org/system/files/SCAPE_PC_WP1_identification21092011_0.pdf (Attached: attachment:SCAPE_PC_WP1_identification21092011.pdf)

bq. The main difference is that identification is only one part of JHOVE2’s functionality: it also includes feature extraction, validation and policy‐based assessment. These are all outside of the scope of this evaluation. It also means that any computational performance results cannot be directly compared with dedicated identification tools (although JHOVE2’s performance issues appear to be caused mainly by DROID 4, with JHOVE2’s native modules adding very little overhead).

h2. Extending DSpace Metadata Support

* "Semantic web extensions for DSpace":http://simile.mit.edu/ - adds _support for arbitrary schemata and metadata_ using semantic web technologies.