To update your project with the data and `.rmd` file for this exercise, run:

bio297::start()

Download the file `johnsonlab.xlsx` from the `data/` subfolder and take a quick look at it in Excel.

I've already installed `readxl` for you, but if I hadn't you could install this package with:

install.packages("readxl")

To load the functions in `readxl` into our current session, we'll use `library`:

library(readxl)

At the console you can list everything a loaded package provides with:

ls("package:readxl")

## [1] "excel_sheets" "read_excel"

The `read_excel` function seems simple enough! Let's use it to load data from the first sheet:

jd <- read_excel("data/johnsonlab.xlsx")

Why didn't we have to specify a `sheet` argument above?

View(jd)

- The data is tidy: each variable has a column, each observation a row
- Variable names are formatted consistently
- Variable names start with letters and have no spaces

These data are also a nice mix of variable types. We have:

- Continuous data: `VacuumTime`
- Discrete data: `VacuumUnderstanding`
- Categorical data: `Gender`

class(jd$VacuumTime)

## [1] "numeric"

class(jd$VacuumUnderstanding)

## [1] "numeric"

Now let's check on a categorical variable:

class(jd$Gender)

## [1] "numeric"

- Use `character` to hold arbitrary text: for example, codon sequences.
- Use `factor` to hold true categorical variables (Gender) with defined levels (male, female).

gender <- factor(c("male", "female"))
gender

## [1] male   female
## Levels: female male

levels(gender)

## [1] "female" "male"
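To make the difference between `character` and `factor` a bit more concrete, here is a small sketch. The vectors `codons` and `sex` are made up for illustration and are not part of the johnsonlab data.

```r
# Toy vectors, invented for this example (not from johnsonlab.xlsx)
codons <- c("ATG", "GGC", "TAA")               # arbitrary text
sex    <- factor(c("male", "female", "female")) # categorical, fixed levels

levels(codons)  # NULL: a character vector has no levels
levels(sex)     # "female" "male" (sorted alphabetically by default)

# A factor remembers its levels even when a level has no observations left
only_female <- sex[sex == "female"]
levels(only_female)  # still "female" "male"
table(only_female)   # counts every level, including the empty one
```

That "memory" of the allowed categories is what makes a factor a better fit than plain text for a variable like Gender.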

You can index factors just like any other type of vector:

gender[ c(1, 1, 2, 2, 1) ]

## [1] male   male   female female male
## Levels: female male
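The same indexing trick can turn a vector of numeric codes into labels. The sketch below uses a hypothetical `codes` vector and assumes that 1 stands for male and 2 for female. It also shows `factor()` with explicit `levels` and `labels`, which is one alternative way to do the same recoding.

```r
gender <- factor(c("male", "female"))

# Hypothetical numeric codes; assumes 1 = male and 2 = female
codes <- c(1, 2, 2, 1)

# Option 1: use the codes as positions into the two-element factor
by_index <- gender[codes]

# Option 2: spell out the mapping from codes to labels
by_labels <- factor(codes, levels = c(1, 2), labels = c("male", "female"))

as.character(by_index)   # "male" "female" "female" "male"
as.character(by_labels)  # "male" "female" "female" "male"

# The values agree, but the level order differs between the two options
levels(by_index)   # "female" "male"
levels(by_labels)  # "male" "female"
```

Option 1 only works because the codes happen to be valid positions (1 and 2). Option 2 states the mapping explicitly, which may be easier to read back later.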

We can replace the current numeric column with a factor:

jd$Gender <- gender[ jd$Gender ]

Check to make sure that worked and verify that you understand why it did!

# Enter your code here!

You can call `summary` on an entire data frame:

summary(jd)

Or just one vector:

summary(jd$VacuumTime)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.519   3.286   4.384   6.092   6.696  95.030

summary(jd$Gender)

## female   male
##    118     58
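Notice that `summary` gave quartiles for the numeric column but counts for the factor. It is a generic function: it looks at the class of its argument and chooses a matching method. A minimal sketch with made-up vectors (`times` and `groups` are not from the johnsonlab data):

```r
# Toy data, invented for illustration
times  <- c(1.5, 3.2, 4.4, 95.0)
groups <- factor(c("a", "b", "b", "a"))

num_summary <- summary(times)   # min, quartiles, median, mean, max
fac_summary <- summary(groups)  # number of observations per level

names(num_summary)  # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
fac_summary         # a = 2, b = 2
```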

Just like `summary`, the plotting functions like `plot` and `boxplot` are also a little bit magical in R.

We can plot one numeric column:

plot(jd$VacuumTime)

Or make a scatter plot with two:

plot(jd$VacuumTime, jd$VelcroTime)

So that last scatter plot in formula syntax would be:

plot(VelcroTime ~ VacuumTime, data = jd)

boxplot(Age ~ Gender, data = jd)
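In a formula, the variable to the left of `~` goes on the y axis and the variable to the right goes on the x axis (or defines the groups). Here is a small sketch on a made-up data frame; `toy` and its columns `score`, `hours`, and `group` are hypothetical names, not part of the johnsonlab data.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  score = c(2.1, 3.4, 1.9, 4.2, 3.8, 2.7),
  hours = c(1, 3, 1, 5, 4, 2),
  group = factor(c("a", "b", "a", "b", "b", "a"))
)

# numeric ~ numeric: scatter plot with score on y and hours on x
plot(score ~ hours, data = toy)

# numeric ~ factor: one box of score values per level of group
boxplot(score ~ group, data = toy)

# boxplot() can also return the numbers behind the boxes without drawing
box_stats <- boxplot(score ~ group, data = toy, plot = FALSE)
box_stats$names       # "a" "b"
box_stats$stats[3, ]  # the median of each group
```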

Use `plot` and `boxplot` to explore several other interactions in this data set!

# Enter your code here!

- Finish this exercise (fill in all of the "# Enter your code here!" blocks). Check for errors by clicking on "Knit HTML" and looking over the document.
- When you're ready, use `bio297::submit("03-tidy-data-1.rmd")` to submit the assignment.
- Read Wickham 2014 (in "Resources", "Literature" on Sakai) for next class.