Showing posts from March, 2018

Share and backup data sets with Dat

If you work in genomics, you'll know that sharing large data sets is hard. For instance, our group has shared data with our collaborators in a number of ways:

- DVDs, hard drives and flash drives
- FTP
- Hightail
- Google Drive links
- Amazon links
- SCP/PSCP
- rsync

But none of these are ideal, because data sets change over time and none of the above methods is suited to updating a file tree with changes. When changes occur, things quickly become a mess of files that are either redundant or missing entirely, and copied files can become corrupted. What we need is a type of version control for data sets. That's the goal of dat.

So now I'll take you through a simple example of sharing a data set using dat.

# Install instructions for Ubuntu 16.04
$ sudo npm cache clean -f
$ sudo npm install -g n
$ sudo n stable
$ sudo npm install -g dat

# Files I'm sharing on PC 1: DGE table and 3 genelists (3.4 MB)
$ tree
.
├── Aza_DESeq_wCounts.tsv
└── list
    ├── Aza_D