LibGuides: Research data management: Data Archiving and Sharing

Data Archiving and Sharing

Data archives are for the long-term preservation of digital data. Most digital storage media (optical discs, hard drives) have reliable lifetimes of only a few years. An archive ensures that data is preserved and maintained in file formats that are most likely to be usable in the future.

Data sharing is considered an important part of academic research because it encourages open inquiry into research results and conclusions and promotes data reuse and repurposing. Most archives facilitate data sharing and let owners keep control over their data without having to provide the facilities themselves.

Data Sharing Methods

Data dissemination means actively making your data accessible to others.

Data sharing is done in three ways:

Email request – interested parties email and request the dataset.
Website – researchers place datasets on their website for anyone to download.
Archiving – researchers place their dataset in an archive.

Archiving is the preferred option because most archives provide the dual services of data preservation and dissemination. Archives typically have a search function and are indexed by web search engines, which increases the chances of other researchers using and crediting your datasets and publications. Archiving also means the dataset owner does not need to maintain a website and can choose from a wide range of access controls.

If your dataset is online, an archive such as ANU Data Commons can provide permanent, stable links to it for citation purposes. Stable links greatly increase a dataset's use and exposure and help meet growing funder requirements for access to research data.

Data Sharing Principles - FAIR

ANU supports the FAIR data principles (Findable, Accessible, Interoperable, Reusable), a useful framework for thinking about sharing data in a way that will enable maximum use and reuse.

Findable
This includes assigning a persistent identifier (like a Digital Object Identifier or Handle), having rich metadata and making sure data is findable through disciplinary discovery portals (local and international).
Accessible
This includes making the data open using a standardised protocol.

If data is not open (e.g. for privacy or security concerns, or commercial interests), there should be clarity around the conditions governing access and reuse.

Interoperable
Data and metadata should use community-agreed formats, standards, languages and vocabularies, and should contain links to related information using identifiers.

Reusable
Reusable data should maintain its initial richness (e.g. it should not be reduced to only what is needed to explain the findings of one particular publication). It needs a clear, machine-readable licence and provenance information on how the data was created, as well as discipline-specific data and metadata standards that give it the rich contextual information needed for reuse.
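To make "machine-readable licence and provenance information" concrete, here is a minimal sketch of a metadata record serialised as JSON. The field names and the `build_metadata_record` helper are hypothetical and do not follow any particular schema (such as DataCite or the schema used by ANU Data Commons); the identifier is a placeholder, not a real DOI. The licence is given as an SPDX identifier (`CC-BY-4.0`) so that software, not just people, can read it.

```python
import json

# Hypothetical field names; a real archive defines its own metadata schema.
REQUIRED_FIELDS = ("identifier", "title", "creator", "licence", "provenance")


def build_metadata_record(identifier: str, title: str, creator: str,
                          licence: str, provenance: str,
                          file_format: str) -> dict:
    """Assemble a minimal metadata record and reject empty required fields."""
    record = {
        "identifier": identifier,   # persistent identifier (DOI or Handle)
        "title": title,
        "creator": creator,
        "licence": licence,         # SPDX licence identifier, e.g. "CC-BY-4.0"
        "provenance": provenance,   # how the data was created
        "format": file_format,      # ideally an open format
    }
    missing = [field for field in REQUIRED_FIELDS if not record[field]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return record


record = build_metadata_record(
    identifier="doi:10.XXXX/placeholder",   # placeholder, not a real DOI
    title="Example survey responses",
    creator="A. Researcher",
    licence="CC-BY-4.0",
    provenance="Collected via online survey; cleaned with documented scripts",
    file_format="text/csv",
)
print(json.dumps(record, indent=2))
```

Rejecting empty required fields at creation time mirrors the advice to write metadata as the data is produced rather than at the end of a project.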

The Australian Research Data Commons (ARDC) has developed a FAIR self-assessment tool that can be used to assess the 'FAIRness' of a dataset and identify ways to improve it where applicable.

Copyright and Licensing

Where copyright subsists in data, the default position is that it is owned by the creator. However, ANU generally owns the copyright in material generated by staff in the course of their employment. Researchers nonetheless own the copyright in their academic publications.

The owner is usually the creator, but some agreements may require joint ownership of data or assign ownership to the funder. Seek advice on such agreements from College Research Offices and Research Services.

Licenses grant permission for others to use the copyrighted data. Open Content Licenses are an easy way for researchers to license their data for others to use. Researchers can choose the most suitable license for their needs rather than develop a custom license themselves. The most notable open content licenses are:

Creative Commons - most popular open content licenses.
Science Commons - tailored for scientific data and publications.
GNU Free Documentation License - used by Wikipedia.

ANU Data Commons has a copyright license that depositors can use to give the archive permission to store and maintain the data, while leaving ownership of the data with the researcher.

File Formats and Standards

Before creating your data, consider which formats and standards you will use.

Using an inappropriate file format can make your life more difficult in the long run, and converting between file formats can be tricky.

It is best to use open formats as they are more likely to be readable in the future and are easier to share with others.

Some examples of open formats are:

PDF - document format
Open Document Format (ODF) - used by LibreOffice Writer (similar to Microsoft Word)
PNG, TIFF, JPEG - image formats
MPEG - audio and video formats

Access Restrictions

When data is in a final state and ready for dissemination or archiving, you should define the access restrictions on each item of data:

Unrestricted – anyone can download.
Registered – users must give their name and affiliation so the data owner can track who is using their data.
Requested – users must submit a request outlining how they will use the data.
Closed – no access (i.e., confidential data).
Metadata only – only the description (metadata) of the dataset is visible; the data itself is not available.
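The access levels above can be read as a simple decision rule. The sketch below is illustrative only: `AccessLevel`, `User`, `can_download_data` and the `request_approved` flag are hypothetical names, and it assumes that a "Requested" access request has already been reviewed and approved by the data owner. It does not describe how ANU Data Commons or any other archive actually implements access control.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    UNRESTRICTED = "unrestricted"
    REGISTERED = "registered"
    REQUESTED = "requested"
    CLOSED = "closed"
    METADATA_ONLY = "metadata_only"


@dataclass
class User:
    name: str = ""
    affiliation: str = ""
    request_approved: bool = False  # hypothetical: set once the owner approves a request


def can_download_data(level: AccessLevel, user: User) -> bool:
    """Decide whether a user may download the data files (not the metadata)."""
    if level is AccessLevel.UNRESTRICTED:
        return True
    if level is AccessLevel.REGISTERED:
        # Name and affiliation let the owner track who is using the data.
        return bool(user.name and user.affiliation)
    if level is AccessLevel.REQUESTED:
        return user.request_approved
    # CLOSED and METADATA_ONLY: the data files themselves are never released.
    return False


print(can_download_data(AccessLevel.REGISTERED, User()))  # False: anonymous user
print(can_download_data(AccessLevel.REGISTERED, User("A. Researcher", "ANU")))  # True
```

Treating "Closed" and "Metadata only" identically for downloads reflects that, in both cases, the data files stay private; the difference is only whether the dataset's description is published.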

Archiving

Archiving of final research data is encouraged and often required. Archiving takes care of dissemination, access control and security, and ensures the data will not be lost, forgotten or become unusable.

The objective of the archive is to preserve the data and – if the data owner allows it – make data available for further research. The owner of the data can specify a range of access restrictions, including an embargo so data cannot be accessed until after a specified date.
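An embargo amounts to a date comparison. The following is a minimal sketch with a hypothetical helper `embargo_lifted`; it assumes access opens on the day after the embargo date, following the wording "cannot be accessed until after a specified date". Individual archives may instead open access on the date itself.

```python
from datetime import date
from typing import Optional


def embargo_lifted(embargo_until: Optional[date], today: date) -> bool:
    """Return True if embargoed data may be accessed on `today`.

    `embargo_until` is the embargo date; None means no embargo was set.
    Access is assumed to open only on days strictly after that date.
    """
    if embargo_until is None:
        return True
    return today > embargo_until


print(embargo_lifted(date(2030, 1, 1), date(2029, 12, 31)))  # False: still embargoed
print(embargo_lifted(date(2030, 1, 1), date(2030, 1, 2)))    # True: embargo over
```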

An archive provides long-term storage of data and therefore prefers file formats that are unlikely to become obsolete. Most file formats can be converted to a suitable archiving format, but some loss in quality or distortion may occur. Most archives can perform the conversion, but it is best if the depositor does it, so that they can confirm they are happy with the result.

The time and costs associated with archiving are often underestimated. Each item of data deposited will need to have metadata written for it, which can be very time consuming. It is therefore best to write metadata as the data is created and to archive data continuously rather than leaving it until the end of the project. It is recommended that you include the costs of archiving in your grant application.