Data archives exist for the long-term preservation of digital data. Most digital storage media (optical discs, hard drives) have reliable lifetimes of only a few years. An archive ensures that data is preserved and maintained in file formats that are most likely to be useable in the future.
Data sharing is considered an important part of academic research because it encourages open inquiry into research results and conclusions and promotes data reuse and repurposing. Most archives facilitate data sharing and allow data owners to maintain control over their data without needing to provide the facilities themselves.
Data dissemination is actively making your data accessible to others. Some researchers make their datasets available via their personal or group websites.
Data sharing is done in three ways:
Archiving is the preferred option, as most archives serve the dual purpose of data preservation and dissemination. Archives usually have a search utility and are often indexed by the major web search engines, which increases the chances of other researchers using and crediting your datasets and publications. Archiving datasets also means the dataset owner does not need to maintain a website and can specify a wide range of access controls.
If your dataset is online, then including the link in your publications will greatly increase its use and exposure.
ANU supports the FAIR data principles (Findable, Accessible, Interoperable, Reusable) drafted by the FORCE11 group in 2015. The principles are a useful framework for thinking about sharing data in a way that will enable maximum use and reuse.
Translating the FAIR principles into practice will differ between disciplines; however, the guidelines below set out the broad principles:
Making data findable includes assigning a persistent identifier (like a DOI or Handle), having rich metadata to describe the data and making sure it is findable through disciplinary discovery portals (local and international).
Making data accessible may include making the data open using a standardised protocol. However, the data does not necessarily have to be open. There are sometimes good reasons why data cannot be made open, for example privacy concerns, national security or commercial interests. If it is not open, there should be clarity and transparency around the conditions governing access and reuse.
To be interoperable, the data will need to use community-agreed formats, language and vocabularies. The metadata will also need to use community-agreed standards and vocabularies, and contain links to related information using identifiers.
Reusable data should maintain its initial richness. For example, it should not be diminished for the purpose of explaining the findings in one particular publication. It needs a clear machine-readable licence and provenance information on how the data was formed. It should also follow discipline-specific data and metadata standards, which give it the rich contextual information that allows for reuse.
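As a rough illustration of the four guidelines above, a dataset's metadata record can be checked for the elements they mention. The sketch below is hypothetical: the field names, the grouping in `FAIR_FIELDS` and the `missing_fair_fields` helper are assumptions made for this example, not a schema used by any particular repository, and the identifier is a placeholder rather than a real DOI.

```python
# Hypothetical field names, grouped by the FAIR guideline they support.
FAIR_FIELDS = {
    "findable": ["identifier", "title", "description"],
    "accessible": ["access_conditions"],
    "interoperable": ["file_format", "related_identifiers"],
    "reusable": ["licence", "provenance"],
}


def missing_fair_fields(record):
    """Return {guideline: [field names]} for fields that are absent or empty."""
    gaps = {}
    for guideline, fields in FAIR_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[guideline] = missing
    return gaps


example_record = {
    "identifier": "doi:10.1234/example",  # placeholder, not a real DOI
    "title": "Example survey dataset",
    "description": "Responses collected for an illustrative project.",
    "access_conditions": "Open",
    "file_format": "text/csv",
    "licence": "CC-BY-4.0",
}

# Reports the fields still to be written before deposit.
print(missing_fair_fields(example_record))
```

A check like this only confirms that the fields are present. Whether the metadata is rich enough for reuse still depends on the standards of the discipline.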
The owner of any original data holds copyright over that data from the time the data is created. In general, the ANU owns the copyright of material generated by staff in the course of their employment. The researcher, however, owns the copyright in academic publications.
The owner is usually the creator, but some agreements may require joint ownership of data or assign ownership to the funder. Advice should be sought from College Research Offices and Research Services on agreements.
Licences grant permission for others to use the copyrighted data. Open content licences are an easy way for researchers to license their data for others to use. Researchers can choose the most suitable licence for their needs rather than develop a custom licence themselves. The most notable open content licences are:
The ANU’s institutional repository has a copyright licence that depositors can use to give the archive permission to store and maintain the data, whilst leaving ownership of the data with the researcher.
Before creating data, consider which formats and standards to use, as converting between file formats is sometimes difficult. An inappropriate file format will also make the data harder to work with in the long run.
Where possible, it is best to use open formats, as they are more likely to be readable in the future and are easier to share with others. It is usually safe to use a proprietary format if it is very widespread, as free programs will most likely exist to read it.
Some examples of open formats are:
When data is in a final state and ready for dissemination or archiving, you should define the access restrictions on each item of data:
Archiving of final research data is encouraged and in some cases required. Archiving your data ensures that it will not be lost, forgotten, or become unusable because it is stored in legacy file formats or on legacy storage media. Archiving also takes care of dissemination, access control and security.
Archives generally accept only final state data. The objective of the archive is to preserve the data and, if the data owner allows it, make the data available for further research. The owner of the data can specify a range of access restrictions, although each archive will use different terminology. It is also possible to embargo data so that it cannot be accessed until after a specified date. This is often done to give the data creators time to publish their results before making their data public.
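The interaction between an access restriction and an embargo can be sketched as a simple check. This is an illustration only: the access level names ("open", "restricted") and the `is_publicly_accessible` function are assumptions made for this example, and each archive will use its own terminology and rules.

```python
from datetime import date


def is_publicly_accessible(access_level, embargo_until=None, today=None):
    """Return True if an item may be released to the public on `today`.

    access_level  -- hypothetical label; only "open" items are ever released
    embargo_until -- last day of the embargo, or None if there is no embargo
    """
    today = today or date.today()
    if access_level != "open":
        return False
    # Embargoed data becomes available only after the specified date.
    if embargo_until is not None and today <= embargo_until:
        return False
    return True


# An open item embargoed until the end of 2030, checked before and after.
embargo_end = date(2030, 12, 31)
print(is_publicly_accessible("open", embargo_end, today=date(2030, 6, 1)))
print(is_publicly_accessible("open", embargo_end, today=date(2031, 1, 1)))
```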
An archive provides long-term storage of data and therefore prefers file formats that are unlikely to become obsolete. Most file formats can be converted to a suitable archiving format, but some loss in quality (for example, in images or audio) or distortion (for example, when converting PowerPoint to PDF) may occur. Most archives are able to perform the conversion, but it is best if the depositor does it to ensure that they are happy with the result.
The time and costs associated with archiving are often underestimated. Each item of data deposited will need to have metadata written for it, which will be very time-consuming if your data consists of several hundred images that were taken some years ago. It is therefore best to write metadata as the data is created and to archive data continuously rather than leaving it until the end of the project. It is recommended that you include the costs of archiving in your grant application.
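One way to write metadata as the data is created is to save a small description file alongside each data file. The sketch below assumes a JSON "sidecar" file; the `write_sidecar` function, the `.meta.json` suffix and the field names are hypothetical choices for illustration, and the archive you deposit with may require a different metadata format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_path, title, creator, description):
    """Write a JSON metadata file next to `data_path` and return its path."""
    data_path = Path(data_path)
    record = {
        "file": data_path.name,
        "title": title,
        "creator": creator,
        "description": description,
        # Time the metadata was written, in UTC.
        "recorded": datetime.now(timezone.utc).isoformat(),
    }
    # e.g. photo_001.tif -> photo_001.tif.meta.json (hypothetical naming)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar("photo_001.tif", "Site A, quadrat 3", "A. Researcher", "Vegetation survey photograph")` at the time each image is saved would leave a description beside every file, ready to be adapted to the archive's metadata form at deposit.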