The .gitignore

Module Learning Objectives

By the end of this module, you will be able to:

  • Create a .gitignore in a repository using a GitHub template
  • Explain the purpose of a .gitignore in collaborative projects
  • Evaluate the content you think might warrant adding to the .gitignore

Overview

The purpose of the .gitignore is evident in its name: anything listed in it is ignored by Git. This is useful if you want to keep data in your local copy of a repository but don’t want to risk sharing that data by committing it to a GitHub repository you plan to make public at some point. The file typically lives at the top level of a repository and can be customized however is most useful to you and your collaborators.

Creating a .gitignore

When you first create a repository you will have the opportunity to select a template .gitignore based on the coding language you plan on using. In the “New Repository” process, scroll down to just below where you can choose to add a README.

Underneath the title “Add .gitignore” there is a dropdown menu for various templates that starts at “None”. If you click this dropdown menu you can type in keywords that will help you identify the template that you want to use.

Unfortunately for R users, typing just “R” returns all templates that contain the letter ‘r’ so you will need to scroll a bit to find the option that is actually for the R Statistical Environment. Once you find the option you want, click it and the template dropdown should change from “.gitignore template: None” to “.gitignore template: <your pick>”.

If you create a repository without choosing a .gitignore template, don’t worry! You can always create one later. When you clone a repository using RStudio, a .gitignore file is automatically created when the RStudio project is set up on your local computer. If you clone via the command line instead, running touch .gitignore in the repository’s top-level folder creates the necessary file.
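
For the command-line route, a minimal sketch is shown below. It assumes a Unix-style shell (macOS Terminal, Git Bash, etc.) and uses a throwaway folder made with mktemp as a stand-in for your cloned repository; in practice you would run only the touch line from inside your own repository’s top-level folder.

```shell
# Stand-in for a freshly cloned repository (hypothetical; use your own repo folder)
repo_dir="$(mktemp -d)"
cd "$repo_dir"

# Create an empty .gitignore (contents are left alone if the file already exists)
touch .gitignore

# Names starting with "." are hidden by default, so list with -a to confirm it is there
ls -a
```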

Interacting with the .gitignore

Once you have one, you can open the .gitignore by clicking it in the “Files” pane of RStudio.

After you click it you’ll be looking at a file that looks very similar to any other file in RStudio.

Now let’s add something to it! As you can see from the Git pane of the above image (top right), after cloning our new repository, the only file Git is flagging as untracked is the .Rproj file created whenever you make a new RStudio project.

If we add *.Rproj to our .gitignore, the Git pane will show that the only change is that lines have been added to the .gitignore. The .Rproj file with the double yellow question marks next to it is gone!
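
The same change can be checked from the command line. The sketch below is illustrative only: it builds a scratch repository with git init rather than cloning a real one, and demo.Rproj is a hypothetical file name standing in for your project file.

```shell
# Scratch repository standing in for a fresh clone (hypothetical setup)
scratch_repo="$(mktemp -d)"
cd "$scratch_repo"
git init --quiet

# Simulate the project file RStudio would create
touch demo.Rproj

# Before editing the .gitignore, Git flags the project file as untracked
git status --porcelain   # prints: ?? demo.Rproj

# Append a pattern that matches any file ending in .Rproj
printf '%s\n' '*.Rproj' >> .gitignore

# Now the only untracked file is the .gitignore itself
git status --porcelain   # prints: ?? .gitignore
```

The "??" in this output is the command-line counterpart of the double yellow question marks in RStudio’s Git pane.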

Once you’ve edited the .gitignore it is generally good practice to push those changes as soon as possible. If you flag a file (or folder) to be ignored while someone else is depending on it, waiting to push that change risks a serious conflict. Alternatively, a collaborator might commit a file you want ignored, which is frustrating to remove (see “Ignoring Previously Tracked Content” below).

Only the commit is pictured above, but pushing & pulling works in the same way here as it does elsewhere.

Our Recommendations

There are many different opinions on what should go into a .gitignore but we have a few suggestions that might prove helpful (or are at least worth considering).

  1. Use the GitHub .gitignore template for your chosen programming environment

    • There are a lot of small files that typical users don’t care about that your project will accumulate over its lifecycle. If you don’t flag these in the .gitignore it can become difficult to sort through your repository
  2. Add .DS_Store

    • Macs create a “Desktop Services Store” file (or “.DS_Store” for short) every time you open a folder. This file is invisible in your file manager but can be committed. This file has no practical value in your project and a separate one exists in every subfolder so tracking them can quickly clutter a repository if you use subfolders
  3. Other Considerations

    • Some people recommend adding everything your script creates to the .gitignore. The theory is that anyone who wants to see those outputs only needs to run your script(s) to create their own versions
    • Another way of thinking about this is to create a dedicated folder to store products in (e.g., “data”, “exports”, etc.) and then add that folder to the .gitignore. This means you don’t need to worry about adding specific files to the .gitignore so long as all the files live in a folder you’ve already designated as something for Git to not track
  4. Do you have another idea for what you typically add to a .gitignore?
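
To make recommendations 2 and 3 more concrete, here is a small sketch of what those .gitignore entries could look like and one way to check that they work. The folder names data/ and exports/ come from the examples above; the file names, the scripts folder, and the scratch repository are hypothetical.

```shell
# Scratch repository for the demonstration (hypothetical setup)
scratch_repo="$(mktemp -d)"
cd "$scratch_repo"
git init --quiet

# Write the entries into the .gitignore
cat > .gitignore <<'EOF'
# macOS folder metadata, ignored in every subfolder
.DS_Store
# Folders for script outputs; the trailing slash limits the match to folders
data/
exports/
EOF

# Simulate files we do not want Git to track
mkdir -p data exports scripts
touch .DS_Store scripts/.DS_Store data/raw.csv exports/figure.png

# Only the .gitignore itself should be reported as untracked
git status --porcelain   # prints: ?? .gitignore
```

Because the whole folder is ignored, any new file saved into data/ or exports/ later is skipped without further edits to the .gitignore.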

Should You Ignore the .Rproj File?

If you are an RStudio user, you could choose to add the R project file to your .gitignore. However, this is a somewhat thorny decision with strong arguments on both sides. We’ll try to summarize some of the big considerations below to help inform your decision.

Option A: Commit the .Rproj

There are (in our opinion) three strong arguments for committing the .Rproj:

  1. Ease of Access

    • For non-R users, it can be really helpful to have just one file that they need to double click in order to open and interact with the whole R project. By committing the .Rproj, you make the post-clone start up instructions much simpler (i.e., “click the file ending in ‘.Rproj’ and you’re good to go”)
  2. Facilitate Cloning Outside of RStudio

    • While our workshop has exclusively covered cloning a repo through RStudio, it is absolutely possible to clone without using RStudio. If someone goes this route, and the .Rproj is not already included in the repo, they will be left without one. So, after the clone they would then have an extra step to create their own .Rproj if they wanted to use RStudio to interact with the code
    • Note that if everyone on your team is cloning via RStudio, this is not really an issue because cloning via RStudio creates a new .Rproj if one does not already exist
  3. Enforce Project Settings

    • Fundamentally, the .Rproj is a list of settings that you’re using for a specific project in RStudio. If you wanted to create some defaults of those settings for anyone who cloned your repository, you could commit your .Rproj with your preferred settings
    • That said, any member of your team who changed those settings would have the option to commit the corresponding change to the .Rproj file and the next time you pull you would have their settings instead

Option B: Ignore the .Rproj

We feel there are some serious reasons for ignoring the .Rproj as well:

  1. Risk of Deletion

    • Say you decide to commit your .Rproj, what happens if someone deletes it and pushes that deletion? This can happen if team members clone their version of the repo with a different name resulting in two local .Rproj files where one is redundant from that team member’s perspective
    • If a .Rproj is committed and then subsequently deleted, everyone who pulls that change will have the file deleted from their local copy too. This results in an abrupt error that can be difficult to diagnose if you don’t know about this risk beforehand. The simplest fix is often for affected users to re-clone the repository, though the deleted file can also be restored from the Git history
    • So, if you add the .Rproj to the .gitignore preemptively, you can avoid the risk of this happening at some later stage; you could also add it as “*.Rproj” so regardless of the name of the file it will be ignored if the file type is .Rproj
  2. Loss of Flexibility

    • When you commit your .Rproj you are “locking in” the name of your project. If your preliminary data exploration and analyses bring new hypotheses and goals to light you may want to change your project name to better match this new framing. If you had previously committed your .Rproj, it can be a hassle to change its name and push this change
    • Alternatively, you may want to change your GitHub repository name after publication to share keywords with your final manuscript title or the journal name, making the repository easier for your readers to find. Technically you could change the GitHub repository name without changing the corresponding .Rproj filename, but the mismatch may confuse readers who clone your code
  3. Preserve Team Members’ Local Filing Decisions

    • Most working group members are involved in a staggering variety of pursuits in addition to their role in a synthesis working group. People tend to have their own filing systems in their computers to help them quickly and easily navigate among their different projects
    • If you commit the .Rproj, you force everyone who clones your repository to use your naming convention which can be a hurdle (albeit likely a minor one) as they juggle their responsibilities

Decision

Ultimately, this is up to you and your team to decide! Hopefully our outlined reasons for and against adding the .Rproj to the .gitignore help inform this decision and we are happy to discuss this more if you have follow up questions!

Ignoring Previously Tracked Content

Imagine a situation where you commit a data file to your GitHub repository and push that commit. Now your data file is tracked by Git and every time you alter that file RStudio’s “Git” pane will notify you that the file changed by placing the blue “M” next to the file’s name. Let’s say that eventually you decide that you want to (1) remove the file from your GitHub repository and (2) stop Git from tracking future changes to the file. If this is the case, you’ll need to do the following:

  1. Either (A) delete the file or (B) move it out of the repository’s folder

    • Either option will register as “deleting” the file from Git’s perspective
  2. Commit the deletion

  3. Push that change

  4. Add the name of that file to your .gitignore

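Once you’ve done these steps, even if you put the file back in your repository, Git won’t track its addition or any changes to it over time. Note that the file still exists in the repository’s earlier commits, so these steps stop future tracking but do not erase what was already pushed.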
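
The steps above can be sketched on the command line as follows. This is an illustration under stated assumptions, not the only way to do it: the scratch repository, the demo identity, and the file name survey.csv are all hypothetical, and the push in step 3 is left as a comment because the scratch repository has no remote.

```shell
# Scratch repository with a demo identity so commits work (hypothetical setup)
scratch_repo="$(mktemp -d)"
cd "$scratch_repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Starting point: a data file that has already been committed
echo "site,value" > survey.csv
git add survey.csv
git commit --quiet -m "Add data file"

# Step 1: move the file out of the repository's folder
backup_dir="$(mktemp -d)"
mv survey.csv "$backup_dir/"

# Step 2: commit the deletion
git add --all
git commit --quiet -m "Remove data file"

# Step 3: push that change (run 'git push' here in a real repository)

# Step 4: add the file name to the .gitignore and commit that edit
echo "survey.csv" >> .gitignore
git add .gitignore
git commit --quiet -m "Ignore data file"

# Put the file back: Git no longer reports it
mv "$backup_dir/survey.csv" .
git status --porcelain   # prints nothing
```

Git also provides git rm --cached <file>, which stops tracking a file while leaving your local copy in place; it can stand in for steps 1 and 2 if you would rather not move the file.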