Research data management

Data Organization

Data organization is about working more efficiently with data.

This page covers four areas of data organization:

  • File Transfers and Remote Access
  • File Synchronization
  • Collaboration
  • Revision Control

Each section describes the standard methods and then suggests some automated and more efficient alternatives. Keep in mind that the alternatives often require some configuration and familiarisation with the software. If the standard methods are adequate for your needs, then it is best to continue using them.

File Transfers and Remote Access

Researchers often need to share primary data and preliminary results with collaborators, or may wish to transfer data to another computer system (e.g. from university to home). 

The most common method for transferring files is by email attachment, but mail systems limit the size of files that can be sent. Removable storage media (USB drives, CDs, etc.) can hold larger files, but they require the researcher to physically carry the data from one place to another.

Large files are usually transferred using FTP (File Transfer Protocol), which allows the user to upload as well as download files. An FTP client (e.g. FTP Explorer) is used to connect and transfer files, and access can be restricted by username and password.
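
For researchers who would rather script a transfer than use a graphical client, the following is a minimal sketch using Python's standard ftplib module. The function names and parameters (host, user, password, file names) are illustrative assumptions, not details of any ANU service, and the sketch assumes the server accepts plain FTP on the default port.

```python
from ftplib import FTP
from pathlib import Path


def upload_file(host: str, user: str, password: str, local_path: str) -> None:
    """Upload one local file into the FTP server's current directory."""
    local = Path(local_path)
    with FTP(host) as ftp:  # connects on the default FTP port (21)
        ftp.login(user=user, passwd=password)  # access restricted by username/password
        with local.open("rb") as handle:
            # STOR stores the file on the server under the same name
            ftp.storbinary(f"STOR {local.name}", handle)


def download_file(host: str, user: str, password: str,
                  remote_name: str, local_path: str) -> None:
    """Download one file from the FTP server's current directory."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with open(local_path, "wb") as handle:
            # RETR streams the remote file; each chunk is written to disk
            ftp.retrbinary(f"RETR {remote_name}", handle.write)


# Hypothetical usage (host and credentials are placeholders):
# upload_file("ftp.example.org", "username", "secret", "results.csv")
# download_file("ftp.example.org", "username", "secret", "results.csv", "copy.csv")
```

The same module also provides an FTP_TLS class for servers that support encrypted connections.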

To assist good data management, ANU provides Homedrive, a central file store on which each member of the University is allocated space for personal files (4.5GB for students and staff). Homedrive is accessible over the local area network from any Information Commons computer, or over the Internet via https://myfiles.anu.edu.au/

ANU-provided Web applications, such as Alliance, allow data to be accessed and sometimes modified with just a web browser.

File Synchronization

Researchers often work on multiple computers and simply copy files back and forth between them. This method has a number of drawbacks:

  • It is time consuming to manually copy files.
  • You have multiple copies of data and you can easily lose track of which copy is the latest version.
  • If both copies have been modified, then it is easy to overwrite some changes without knowing.

If you synchronize regularly or have lots of files to synchronize, then you should consider using file synchronization software, which has the following advantages:

  • It is faster.
  • It requires less effort.
  • It automatically detects when both copies of a file have been modified and lets the user choose which one to keep.
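
As a rough illustration of how this kind of conflict detection can work, the sketch below compares the two copies of a file against a record of its content at the last synchronization. It is not a description of how any program listed on this page is implemented. The names file_digest, classify and last_synced are hypothetical, and the sketch treats a deleted file as just another change.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def file_digest(path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def classify(name: str, dir_a: Path, dir_b: Path,
             last_synced: Dict[str, str]) -> str:
    """Report which copy of a file changed since the last synchronization.

    last_synced maps file names to the digest recorded when the two
    directories were last known to be identical.
    """
    digest_a = file_digest(dir_a / name)
    digest_b = file_digest(dir_b / name)
    baseline = last_synced.get(name)

    if digest_a == digest_b:
        return "in sync"          # both copies are identical
    if digest_a == baseline:
        return "changed in B"     # A is untouched, so B holds the newer version
    if digest_b == baseline:
        return "changed in A"     # B is untouched, so A holds the newer version
    return "conflict"             # both copies differ from the baseline
```

Only the "conflict" case needs a decision from the user. In the other cases the changed copy can safely replace the unchanged one.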

File synchronization programs:

  • WinSCP is primarily for SSH and FTP transfers, but can also synchronize data.
  • rsync is an open-source utility for incremental file transfer and synchronization. It is cross-platform and can be used to generate 'snapshots' and regular backups.
  • Dropbox provides 2GB of data storage for free and also provides good user management tools to support collaborative work.

Collaboration

Many research projects are carried out collaboratively.

When collaborating on simple work and with a small number of collaborators, data transfer is usually done by email, USB or a shared network drive.

For more complex tasks, or for projects with many collaborators, it is worth considering collaborative software tools such as the ANU-provided Alliance, which offers forums, chat rooms, calendars, and more.

Revision Control

When data is constantly being edited, especially by multiple users, it is a good idea to implement some form of version control to keep track of changes. This can be as simple as appending a number or date to the end of the file name after each major edit. For example:

  • Journal_v1.0.tex, Journal_v1.2.tex
  • Journal_Feb12.tex, Journal_May5.tex
  • Journal_Feb12_John_DRAFT_WithSallysEdits_NewDiagram.tex

Such conventions are good for simple work but quickly become unmanageable when you have multiple authors or make lots of edits.
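
For the simple case, a small helper can keep the numbering consistent. The sketch below assumes one particular convention, a whole-number suffix of the form _vN before the extension (e.g. Journal_v1.tex), which is a simplification of the examples above. The function name next_version_name is hypothetical.

```python
import re
from pathlib import Path


def next_version_name(filename: str) -> str:
    """Return the file name with its trailing _vN number increased by one.

    A name without a _vN suffix is treated as unversioned and becomes _v1.
    """
    path = Path(filename)
    match = re.fullmatch(r"(.*)_v(\d+)", path.stem)
    if match is None:
        return f"{path.stem}_v1{path.suffix}"
    base, number = match.groups()
    return f"{base}_v{int(number) + 1}{path.suffix}"


print(next_version_name("Journal.tex"))     # Journal_v1.tex
print(next_version_name("Journal_v1.tex"))  # Journal_v2.tex
```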

The alternative is to use revision (or version) control software. Version control software provides access control, a collaborative work environment, synchronization between home/office/laptop computers, and a degree of data safety.

Such programs offer several advantages:

  • The software requires you to input a description of the changes made, which makes it easier to pick up where you left off and for collaborators to see what you are doing.
  • You can revert to a previous version if you make a mistake.
  • You can easily compare two versions to help you find errors.
  • It implicitly provides synchronization and is good for resolving conflicting changes.
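
To give a sense of what comparing two versions looks like, the sketch below uses Python's standard difflib module to produce a line-by-line comparison of two versions of a text. This only illustrates the idea. Version control programs provide their own comparison commands, and the function name show_changes and its labels are hypothetical.

```python
import difflib


def show_changes(old_text: str, new_text: str,
                 old_label: str = "previous", new_label: str = "current") -> str:
    """Return a unified diff describing how new_text differs from old_text."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(diff)


old = "Introduction\nMethods\nResults\n"
new = "Introduction\nMethods and Materials\nResults\n"
# Removed lines are prefixed with "-", added lines with "+"
print(show_changes(old, new))
```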

While version control software is in some cases harder to set up, it provides more advanced version tracking. A distributed version control system like Bazaar can be used with Alliance to collaboratively manage documents and data.

TortoiseSVN uses the Subversion version control system. It integrates with Windows Explorer, making it one of the easiest version control systems to use.

Such tools make it easier for multiple users to access and edit the latest version of a document without conflicting with other people’s changes. The entire history is stored, making it easier for users to see any changes made and to revert to an older version if needed.

Responsible Officer: University Librarian/Page Contact: Library Systems & Web Coordinator