View as a slideshow.

Introduction

I’m going to start class today by drawing on the whiteboard to try to illustrate the key concepts behind Git, the tool that we’ll be using to collaborate on our group projects. By the time we’re done, the following terms and definitions should make sense to you:

  • Commit: save the current state of your tracked files (locally). Do this frequently as you work. Write useful messages.
  • Push: send all of your changes (“commits”) somewhere else. You’ll use push to send your source code changes to GitHub to share them with your group.
  • Pull: Grab new changes (“commits”) from somewhere else. You’ll use this to grab the changes your group members have pushed to GitHub.

Now would be an excellent time to ask questions if you’re confused about anything!

We’ll first walk through the steps necessary to get your projects setup on GitHub and in RStudio. Then we’ll walk through an example workflow for committing and sharing changes to a project.


Setup

We’ll do all of the steps that follow today in class. You only have to do them ONCE. We’ll cover the day-to-day workflow with Git in the next section.


Step 1: pick a project name

Confer with your group members to pick a short, informative, name to use for your project. Use “-” instead of spaces (ex. “our-cool-project”), and only alpha-numeric characters. Pay attention to capitalization because it WILL MATTER.

In all of the steps below replace PROJECTNAME with your actual project name (please don’t actually use all caps).


Step 2: create a GitHub account

You will have just gotten an invitation to join our class’s GitHub Organization in your email. Click the link in the email or go to the page above and join the organization.


Step 3: create a new repository on GitHub

This step should be done by ONE PERSON per group.

In this step, we’ll create a repository to hold the source code for your Shiny app on GitHub, under our class’s organization. Login to GitHub and go to the WL-Biol185-ShinyProjects organization page.

Then:

  1. Click on the green New button
  2. For “Repository name” use your PROJECTNAME
  3. Enter a short, useful description; include information about the data that will be featured. You can update this later.
  4. Choose “Public”
  5. CHECK “Initialize this repository with a README”
  6. Drop down “Add .gitignore” and choose “R” (duh)
  7. (Optional) Drop down “Add a license” and pick a license. Hit the little “i” for GitHub’s suggestions. I usually use “GNU General Public License”; if you don’t care about people not sharing changes to your work choose “MIT”.
  8. Go to Settings, then select Collaborators and add your other group members to the github project.


Step 4: setup a new RStudio project

This step should be done by EVERYONE in your group.

  1. On your group’s repository page on GitHub, click the green “Clone or Download” button
  2. Then click the blue “Use HTTPS”* link and then the “Copy to Clipboard” button. The text you copied should start with “https://” and end with “.git”.
  3. In RStudio from the menu bar select “File” -> “New Project…”
  4. Choose “Version Control”, then “Git”, then paste in the URL you copied from GitHub.
  5. (Optional) You can change the name or location of the new directory that will be created to hold the project files. But you probably don’t want to.
  6. Hit “Create”

You should see the same set of files listed on your “Files” tab that you had in your GitHub repository.

From here on out you need to be careful about which project you are in on RStudio. When you want to work on your Shiny app, switch into this project from the Project menu (upper right). When you want to sketch out class work, or work on assignments, make sure you are in a different project or no project!


Step 5: setup Git identity

This step should be done by EVERYONE in your group.

In RStudio from the menu bar select “Tools” -> “Shell…”

In the shell enter these two commands (with your actual email and name; duh.)

git config --global user.email "your-email@mail.wlu.edu"
git config --global user.name  "My Name"


Step 6: setup .gitignore

From the files pane, open up your “.gitignore” file and add the following line:

*.Rproj*

The will prevent you or your groupmates from including RStudio configuration files in the code that you push to GitHub.

Step 7: test out the configuration

This step should be done by EVERYONE in your group.

Let’s test to make sure everything is working so far.

  1. Open up “README.md”
  2. Make a change; any change. Save it.
  3. Next to your “History”" tab, you should have a “Git” tab.
  4. There should be a blue “M” (for modified) next to “README.md”
  5. Click the check box to stage this change, then Commit
  6. In the “Commit message” box enter something like “testing git setup”
  7. Hit “Commit”. You should see a message about one file changing.

Hopefully that all worked and nothing caught fire.

(Optional) Step 8: Setup SSH key-pairs

If you get tired of having to enter your github account and password each time you push changes (and are feel adventurous), you can setup what’s called an SSH key-pair with github that will allow you to connect securely. It’s a two step process:

  1. Generate a key pair
  2. Add the key to your github account

All of the “Bash” commands shown in these tutorials can be entered using the Shell in RStudio (Tools menu).

Anatomy of a Shiny App

Now that we’ve got local projects setup on each of your accounts, and GitHub repositories for you to use to collaborate, let’s create the two required files for a full-fledged Shiny web application.

From the File menu choose “New File” -> “R Script”. Enter this example code:

library(shiny)

# Define UI for application that draws a histogram
fluidPage(
  
  # Application title
  titlePanel("Old Faithful Geyser Data"),
  
  # Sidebar with a slider input for number of bins 
  sidebarLayout(
    sidebarPanel(
       sliderInput("bins",
                   "Number of bins:",
                   min = 1,
                   max = 50,
                   value = 30)
    ),
    
    # Show a plot of the generated distribution
    mainPanel(
       plotOutput("distPlot")
    )
  )
)

…and save it as “ui.R”. This code describes the layout of the input and output widgets on the webpage.

Create a new script file as above and enter this example code:

library(shiny)

# Define server logic required to draw a histogram
function(input, output) {
   
  output$distPlot <- renderPlot({
    
    # generate bins based on input$bins from ui.R
    x    <- faithful[, 2] 
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    
    # draw the histogram with the specified number of bins
    hist(x, breaks = bins, col = 'darkgray', border = 'white')
    
  })
  
}

…and save it as “server.R”. This code connects inputs to outputs.

You’ll see we need a bit more structure to describe standalone Shiny apps than we did when we were embedding Shiny widgets in R Markdown documents. In particular, we need to explicitly connect render* functions to specific outputs in the UI.

Notice that:

  • In ui.R sliderInput('bins', ...) hooks this slider up to input$bins in server.R
  • In ui.R plotOutput("distPlot") hooks this plot in the layout up to output$distPlot in server.R

To get ideas for what you might want to include in your Shiny apps:

  • See the Layout section of the Shiny Gallery.
  • The Articles section of the Shiny website has extensive documentation of all Shiny features. Check out the “Extend Shiny” sidebar for links to packages that add to Shiny; ShinyDashboard is particularly nice for complex layouts.
  • Find example Shiny apps that I’ve built for classes or that students have built in the past.

Git & GitHub Workflow

We’ll walk through the steps of making changes to your apps in R Studio, staging and commiting those changes, and then pushing your changes up to GitHub to share with your group.

This initial pass is going to be a little bit out-of-order because there isn’t anything new to pull from GitHub yet; we’ll go over the normal order for your workflow at the end.

Make some changes, stage files

Let’s make some example changes to your app files. Assign each person in your group to do one of the following:

  1. Add a new actionButton after the sliderInput in ui.R
  2. Change the max number of bins to 100 for the sliderInput in ui.R
  3. (if there are three of you) Add a new comment line starting with # somewhere in ui.R

On your Git panel, you should all see “?” next to the new files we created above. That’s because we haven’t yet told Git to track these files. Click the checkbox under “Staged” to add them to the list of files that Git tracks. The “?” became an “A” for Add. Also check “Staged” for any other files that have been changed.

When you hit “Commit” and then select “ui.R” you’ll each see the change you made highlighted.

Push your changes to GitHub

Commiting changes only creates a local “save point” in your code. If you want to share all of your changes (the series of commits you’ve made) you’ll need to Push them to GitHub. Note the message at the top of the “Git” panel in RStudio: you’ve made two local commits that makes your local repository (branch) be “ahead” of the GitHub repository “origin/master” by 2 commits.

Pick one person in your group to go first. Hit the “Push” button in the Commit window or on the “Git” pane.

Pick the second person in your group to go; Hit “Push”.

You’ll see a message like:

To https://github.com/WL-Biol185-ShinyProjects/sample-project.git
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'https://github.com/WL-Biol185-ShinyProjects/sample-project.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

If you take a moment to digest this somewhat cryptic message, you’ll see that the problem is that person #1 pushed changes to the project that you don’t have. Rather than erasing their work and replacing it with yours Git is going to help you merge the two versions (“branches”) of the project.

Following the message’s advice, you should click “Pull” to download person #1’s files.

Depending on how you each edited your files one of two things might happen:

  1. Automerge succeeds without any help from you; this works when Git’s merging algorithm can be pretty sure it knows how to merge the files without loosing anything.

  2. You see a message like this:

From https://github.com/WL-Biol185-ShinyProjects/sample-project
   bcba6e0..d1b8b44  master     -> origin/master
Auto-merging ui.R
CONFLICT (content): Merge conflict in ui.R
Automatic merge failed; fix conflicts and then commit the result.

This message is telling us that Git’s merge algorithm can’t automatically merge your changes with the changes on GitHub, so a human (you) is going to have to get involved. If this happens you’ll also notice big orange “U”’s next to the afflicted files on the “Git” pane.

When you open up a file where there’s a conflict from a merge, you’ll see something like this at each place in the file where you need to make an edit:

<<<<<<< HEAD
                   max = 100,
                   value = 30), sliderInput("blarg"),
=======
                   max = 300,
                   value = 30)
>>>>>>> d1b8b44153b3817104302a5d04458f25e61260bd

Obviously the stuff git just added isn’t valid R code! That’s a good thing. Your local copy of your app won’t run until you’ve attended to fixing all of the conflicts you need to.

Your version of the code is above the ======= and their version of the code is below it. You’ll need to pick which one you want or write a new block that integrates the two. You obviously also need to delete the <<<<<<< HEAD, =======, and >>>>>>> bigscarynumber lines.

Do you believe me now that it will be important for everyone in your group to write well formatted, easy to read, code?

Once you’ve fixed the conflicts, stage the files, make a new comit (with a message like “fixed your stupid conflicts, person #1”), and then you should be able to push!

Real work-flow

So, you’ve probably guess that it’s a good idea to start with a Pull, to get your group member’s changes from GitHub, before you make modifications. So the pattern that you’ll generally want to follow is:

  1. Pull
  2. Change stuff
  3. Stage changed files
  4. Commit changes
  5. Repeat steps 3-5 for every “task”; write meaningful messages. You’ll thank yourself for be diciplined about this!
  6. Once a set of tasks is “complete” Push commits back to GitHub.

In general, you shouldn’t Push a set of changes that leaves your project in a broken state. Finish the feature/bug fix/major task that you’re working on before commiting.

Publishing to the Shiny Server

When you’re ready to publish a live version of your app in a publically available location, you can run:

bio185::publishApp("PROJECTNAME")

That will fetch the current version of your code from GitHub and update the app linked from our course project’s page:

https://rna.wlu.edu/bio185/projects-page.html

Avoiding Conflicts (with code & people)

So fixing merge conflicts is annoying, but in a good way: you won’t accidentally throw out someone else’s work. It’s also less annoying than having to hand integrate lots of people’s edits to a Word document, because Git merge is generally smart enough to only make you change what can’t be fixed automatically.

Fortunately, there are some simple things you can do to avoid lots of merge conflicts, that also happen to be good project management practices for coding on a team (which is nearly always the case in the “real world”“).

Clear file structure design (ahead of time)

Git can usually automerge if changes occure in two different files or in two different places within a file. So if you take the time with your group to map out the structure of your project, in terms of what features will be implemented in which files, you can save yourselves a lot of headaches. Here are a couple of practical examples.

Within a script file: Map out your goals using comments. For example, my ui.R script above might have started it’s life looking like this:

# Load the packages we'll need

# Define UI for an application that draws a histogram
  
  # Application title
  
  # Sidebar with a slider input for number of bins 

  # Show a plot of the generated distribution

No need to write any implementation code; just start by sketching out your goals.

Use multiple script files: you aren’t limited to putting ALL of your code in ui.R and server.R; in fact on big projects that’d be a terrible idea! You can use the source() function to load any script file from another.

Consider this refactoring of our ui.R script:

library(shiny)

# Define UI for application that draws a histogram
fluidPage(
  
  # Application title
  titlePanel("Old Faithful Geyser Data"),
  
  # Sidebar with a slider input for number of bins 
  sidebarLayout(
    sidebarPanel(
       sliderInput("bins",
                   "Number of bins:",
                   min = 1,
                   max = 300,
                   value = 30)
    ),
    
    # Show a plot of the generated distribution
       mainPanel(
       plotOutput("distPlot")
    )
  )
)

Here we’ll split out the code that makes the major UI elements (sidebar and main panel) into their own script files:

sidebar.R:

library(shiny)

sidebar <- sidebarPanel(
             sliderInput("bins",
                         "Number of bins:",
                         min = 1,
                         max = 300,
                         value = 30)
           )

main-panel.R:

library(shiny)

main_panel <-  mainPanel(plotOutput("distPlot"))

ui.R:

library(shiny)
source("sidebar.R")
source("main-panel.R")

# Define UI for application that draws a histogram
fluidPage(
  
  # Application title
  titlePanel("Old Faithful Geyser Data"),
  
  # Sidebar with a slider input for number of bins 
  sidebarLayout(sidebar, main_panel)
  
)

Clear goals; clear ownership

In a couple of weeks your group will pitch your “big picture” features for the Shiny app that you’re going to build (source data, visualizations, analyses, etc.). The basic workflow on a software project that takes a feature from dream to reality is:

  1. Design it: you need to know exactly what you’re trying to build before you can start coding
  2. Implement it: write the code you think will work
  3. Fix all of the bugs
  4. (most of the time) Realize your design is flawed/incomplete/impossible and go back to step 1.

Each of these steps can, in turn, be broken down into a set of tasks. On a collaborative project, it’s essential that everyone own their own set of tasks. We’re going to use a tool on GitHub to help us keep track of tasks on these projects: Issues.

Open up your group’s repository page on GitHub and click on the “Issues” tab. Create a new Issue like “Project Design Goals.” Assign everyone in your group to it.

You’ll see that issues give you a nice conversation thread. Use them to track each persons’s set of tasks on the project and bugs that you find in other people’s tasks. These text blocks support markdown markup, let you reference commits, insert code, make check lists and also use twitter style “mentions.” For example, if you’d like to get my attention you can add "@whitwort" to an issue comment.

Once you’ve finished with a task/fixed a bug/etc. you can “Close issue”. The record of that work will remain, but not be shown by default. I’ll use your commit history and participation in the discussion of Issues as evidence of your contribution to your group’s project when grading at the end of the term.

Project Requirements

Data

As I’ve mentioned in class, we’re looking for data sets in the neighborhood of 10^5 - 10^9 data points. If there are lots of variables for each observation that might mean thousands of observations; if there are few, perhaps millions. We want big(ish) data that let you flex your Data Scientist muscles, but that are still small enough to fit into memory.

You must document the source for your data in your “README.md” file. If the license on the data set allows you to redistribute it, you should include it in your repository (as always, I recommend a data/ subfolder to keep things tidy). If not you’ll need to provide information about how you downloaded it (URL, etc).

You must also document all data wrangling and tidying steps that aren’t part of your Shiny app in (a) RMarkdown document(s). Include both the .Rmd source files and .html (knit) files in your repository. If you’d like to include that html in your web app itself, see the Shiny includeHTML function.

Motivation

At a minimum your “README.md” file should also document why you choose to focus on the data you did, why we should care about it, and why your app is awesome. It’d be great if you also included this text in your app itself.

Ambition

After your project pitches next week, I’ll offer advice to help your group calibrate goals. But err on the side of ambitious. You can always cut features that aren’t going to work out down the road. I’d like to hear what you’re optimistic you can include.

Final product

It must run by 5pm on Friday of Finals week (as published on the RNA Shiny server).

It’d be nice to include a link to the projects hosting page on RNA in your README.md:

https://rna.wlu.edu/bio185/projects-page.html