Chapter 4 Project Management

There are many variants on this design, feel free to modify to suit your needs. However, the basic structure of what will be presented will help you in the long run.

Each project should have its own directory. Note I will use directory and folder interchangeably. Your project directory should contain a README file, which explains the purpose of the project, and what you have done. Within the directory, there should be multiple sub folders. First is data, where you will store the raw data for the project. The data in this folder should never be written to or modified. I also will have a reference directory, where I will store the reference genome, transcriptome, or other data obtained from external databases. It is good to store this data with the project, so that you can be assured you will always have access to the same version of the data. This may not be practical for large databases (eg blast nr), so take care to note database versions in your scripts. You should have a folder, traditionally called src, which contains all the scripts and source code you write for the project. All external software should be installed in a folder called bin. More on this in Chapter 7.

For each analysis you do, or each tool you run, there should be an output directory for it which contains all the output from the analysis. It may be useful to further subdivide each of these folders into more subfolders, with different sorts of data.

Also in this top level directory you should have you .git folder. This will be generated when you use git (Chapter 6). Note that all you really want under version control is the README and the src directory. There are no versions for the raw data, and all the derived data should be coming directly from a script.

4.0.1 Directory Setup

Name File/Folder Use
README.md File An overview of what the project is, and a history of what you’ve done
meta File Metadata about the samples
reads Folder All the sequence files
Data Folder Other data needed for the project, such as reference genomes or annotation files
src Folder All the scripts written for the project
Tool_1 Folder Output from running tool 1
Tool_2 Folder Output from running tool 2
project.Rproj File If you are doing the project as an R project, this file should be in the base directory

Good info here (Wilson et al. 2017)

Describe Folder layout, naming conventions, other things from good enough practices. link to that too.

Bibliography

Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” PLOS Computational Biology 13 (6). Public Library of Science: 1–20. doi:10.1371/journal.pcbi.1005510.