A note about table structure

Excel has probably trained you to format data something like this:

Day   Group A   Group B
1     5         5
2     6         7
3     7         9
4     8         11

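The correct format for this design is a three-column table, in which each column holds one variable (Day, Group, Response) and each row holds one observation (one group on one day):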

Day   Group   Response
1     A       5
1     B       5
2     A       6
2     B       7
3     A       7
3     B       9
4     A       8
4     B       11
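If your data is already in the wide layout, base R can rebuild it in the long layout. This is a minimal sketch, assuming the two group columns are named GroupA and GroupB (spaces in column names are awkward in R). It relies on as.vector() reading a matrix column by column. Packages such as tidyr offer pivot_longer() for the same job.

```r
# Wide layout: one column per group (column names assumed for this sketch)
wide <- data.frame( Day    = 1:4,
                    GroupA = c( 5, 6, 7, 8 ),
                    GroupB = c( 5, 7, 9, 11 ) )

# Transpose the group columns so each matrix column is one day;
# as.vector() then reads day 1 A, day 1 B, day 2 A, day 2 B, ...
responses <- as.vector( t( as.matrix( wide[ , c( "GroupA", "GroupB" ) ] ) ) )

# Long layout: one row per observation (one group on one day)
long <- data.frame( Day      = rep( wide$Day, each = 2 ),
                    Group    = rep( c( "A", "B" ), times = nrow( wide ) ),
                    Response = responses,
                    stringsAsFactors = FALSE )
long
```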

Start the exercise

bio297::start()

Loading tabular data

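To load the data in codons.txt into a variable, we'll use the read.table function:

# header = TRUE: the first line of the file holds the column names
# stringsAsFactors = FALSE: keep text columns as character vectors
#   (FALSE only became the default in R 4.0.0)
codons <- read.table( "data/codons.txt",
                      header = TRUE,
                      stringsAsFactors = FALSE )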
head(codons)
##   codon aminoAcid
## 1   GCU         A
## 2   GCC         A
## 3   GCA         A
## 4   GCG         A
## 5   CGU         R
## 6   CGC         R
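If you want to see what header and stringsAsFactors do without the course file, read.table can also parse a string through its text argument. The three rows below are copied from the head() output above; demo is just an illustrative name. str() reports the type of each column.

```r
# Parse a small table from a string instead of data/codons.txt
demo <- read.table( text = "codon aminoAcid
GCU A
CGU R
CGC R",
                    header = TRUE,              # first line supplies column names
                    stringsAsFactors = FALSE )  # keep text as character vectors
str( demo )
```

Without header = TRUE, the first line would be read as a data row and the columns would be named V1 and V2.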

Accessing data in a data.frame

The $ operator extracts a single column by name and returns it as a vector:

codons$codon
##  [1] "GCU" "GCC" "GCA" "GCG" "CGU" "CGC" "CGA" "CGG" "AGA" "AGG" "AAU"
## [12] "AAC" "GAU" "GAC" "UGU" "UGC" "CAA" "CAG" "GAA" "GAG" "GGU" "GGC"
## [23] "GGA" "GGG" "CAU" "CAC" "AUU" "AUC" "AUA" "AUG" "UUA" "UUG" "CUU"
## [34] "CUC" "CUA" "CUG" "AAA" "AAG" "UUU" "UUC" "CCU" "CCC" "CCA" "CCG"
## [45] "UCU" "UCC" "UCA" "UCG" "AGU" "AGC" "ACU" "ACC" "ACA" "ACG" "UGG"
## [56] "UAU" "UAC" "GUU" "GUC" "GUA" "GUG" "UAA" "UGA" "UAG"
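Because $ returns an ordinary vector, everything you know about vectors still applies to it, including length() and square-bracket indexing. A small sketch on a made-up three-row data frame (toy and codonColumn are illustrative names):

```r
# Made-up example data, not the course file
toy <- data.frame( codon     = c( "GCU", "CGU", "AUG" ),
                   aminoAcid = c( "A", "R", "M" ),
                   stringsAsFactors = FALSE )

codonColumn <- toy$codon   # a plain character vector
is.vector( codonColumn )   # TRUE
length( codonColumn )      # 3, one element per row
codonColumn[ 2 ]           # "CGU"
```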

Your turn: in the code block below, print out the contents of the aminoAcid column.

# Enter your code here!

Remember the length function? Use it to verify that I have given you a table with all 64 possible codons:

# Enter your code here!

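You can also index a data frame by position with [ row, column ]. This returns the value in row 1, column 2:

codons[ 1, 2 ]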
## [1] "A"
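Either position inside [ row, column ] can be a number, a vector of numbers, or a column name, and leaving a position empty selects everything along it. A sketch on a made-up four-row data frame:

```r
# Made-up example data, not the course file
toy <- data.frame( codon     = c( "GCU", "CGU", "AUG", "UGG" ),
                   aminoAcid = c( "A", "R", "M", "W" ),
                   stringsAsFactors = FALSE )

toy[ 3, 2 ]            # a single value: "M"
toy[ 3, "aminoAcid" ]  # the same value, with the column chosen by name
toy[ 2, ]              # row 2, all columns (a one-row data frame)
toy[ , 1 ]             # column 1, all rows (a vector)
toy[ c( 1, 4 ), ]      # rows 1 and 4, all columns
```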

Use indexing to print out the first three rows of codons (with both columns):

# Enter your code here!

Prove to yourself that you understand why every element of this comparison is TRUE:

codons$codon == codons[ , "codon"]
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [29] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [43] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [57] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
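Comparing two vectors with == gives one TRUE or FALSE per element. To get a single answer, wrap the comparison in all(), or compare whole objects with identical(). The sketch below, on a made-up data frame, also uses [[ ]], a third way to extract one column by name:

```r
# Made-up example data, not the course file
toy <- data.frame( codon     = c( "GCU", "CGU", "AUG" ),
                   aminoAcid = c( "A", "R", "M" ),
                   stringsAsFactors = FALSE )

all( toy$codon == toy[ , "codon" ] )       # TRUE: every element matches
identical( toy$codon, toy[ , "codon" ] )   # TRUE: same values and same type
identical( toy$codon, toy[[ "codon" ]] )   # TRUE: [[ ]] also returns the column
```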

Calculating a new column

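Assigning to a column that doesn't exist yet creates it. Here we add a type column filled with NA (missing values) as a placeholder:

codons$type <- NA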

Have a look at the codons data frame to see what happened:

# Enter your code here!
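A single value assigned to a new column is recycled down every row. A sketch on a made-up data frame, using is.na() to confirm that every entry of the new column is missing:

```r
# Made-up example data, not the course file
toy <- data.frame( codon     = c( "GCU", "CGU", "AUG" ),
                   aminoAcid = c( "A", "R", "M" ),
                   stringsAsFactors = FALSE )

toy$type <- NA             # one value, recycled to all three rows
ncol( toy )                # 3: the new column was appended
all( is.na( toy$type ) )   # TRUE
```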

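Amino acids are often grouped by the chemistry of their side chains. Here are the one-letter codes for each group:

nonpolar  <- c( "A", "C", "G", "I", "L", "M", "F", "P", "W", "V" )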
polar     <- c( "N", "Q", "S", "T", "Y"                          )
acidic    <- c( "D", "E"                                         )
basic     <- c( "R", "H", "K"                                    )
# Enter your code here!

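The %in% operator checks whether each element of the vector on its left appears anywhere in the vector on its right. For example, which rows contain nonpolar amino acids? Try this: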

codons$aminoAcid %in% nonpolar
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
## [23]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [34]  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [56] FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
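Why %in% rather than ==? The == operator compares two vectors position by position, recycling the shorter one, so it answers a different question. A small sketch with made-up letters:

```r
aa  <- c( "A", "R", "C", "D" )   # made-up amino acid codes
set <- c( "A", "C" )

aa %in% set   # membership: TRUE FALSE TRUE FALSE
aa == set     # pairwise, with set recycled to A C A C: TRUE FALSE FALSE FALSE
```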

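A logical vector like this one can serve as a row index: only the rows where it is TRUE are selected. Pairing it with a column name assigns a value to just those rows:

codons[ codons$aminoAcid %in% nonpolar, "type" ] <- "nonpolar"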

Check to see if it worked!
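One way to check is to count how many rows ended up in each category with table(); useNA = "ifany" adds a count for rows that are still NA. A sketch on a made-up data frame:

```r
# Made-up example data, not the course file
toy <- data.frame( aminoAcid = c( "A", "R", "M", "A", "K" ),
                   stringsAsFactors = FALSE )
toy$type <- NA

nonpolar <- c( "A", "C", "G", "I", "L", "M", "F", "P", "W", "V" )
toy[ toy$aminoAcid %in% nonpolar, "type" ] <- "nonpolar"

# Expect 3 rows labelled nonpolar (A, M, A) and 2 still NA (R, K)
table( toy$type, useNA = "ifany" )
```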

Now annotate the other three groups of amino acids in your table:

# Enter your code here!

Adding row names

Every data frame has row names. The row.names function displays the current ones:

row.names(codons)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## [29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
## [43] "43" "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56"
## [57] "57" "58" "59" "60" "61" "62" "63" "64"

How are our rows currently named?

You can also combine row.names with <- to assign a new vector of names. Here we name each row after its codon:

row.names(codons) <- codons$codon
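Row names must be unique, which is why the codon column works as row names while aminoAcid would not. A sketch on a made-up data frame; tryCatch() turns the error R raises into a value we can inspect:

```r
# Made-up example data, not the course file
toy <- data.frame( codon     = c( "GCU", "GCC", "AUG" ),
                   aminoAcid = c( "A", "A", "M" ),
                   stringsAsFactors = FALSE )

row.names( toy ) <- toy$codon     # fine: every codon is distinct
anyDuplicated( toy$aminoAcid )    # 2: the second element repeats "A"

# Using aminoAcid as row names stops with an error because "A" repeats
result <- tryCatch( suppressWarnings( { row.names( toy ) <- toy$aminoAcid
                                        "assigned" } ),
                    error = function( e ) "error" )
result   # "error"
```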

Now you can look up a row by its codon instead of its position:

codons["AUG", ]
##     codon aminoAcid     type
## AUG   AUG         M nonpolar
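Row names can be indexed with a whole vector of names, not just one. The rows come back in the order you ask for them, and a name may be repeated. A sketch with a made-up lookup table of three-letter amino acid abbreviations (abbrev is an illustrative name):

```r
# Made-up lookup table, not the course file
abbrev <- data.frame( code = c( "A", "R", "M" ),
                      name = c( "Ala", "Arg", "Met" ),
                      stringsAsFactors = FALSE )
row.names( abbrev ) <- abbrev$code

abbrev[ c( "M", "A", "A", "R" ), "name" ]   # "Met" "Ala" "Ala" "Arg"
```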

Translation

Here's a vector of codons that make up a short open reading frame (ORF):

ORF1 <- c("AUG", "GCA", "GGG", "AGC", "GUA", "UGC", "CUU", "UGA")

Use indexing syntax and your codons data frame to translate this ORF. It can be done in one line!

# Enter your code here!

If you want to get fancy, you might find a use for the paste function using the collapse argument…
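paste() joins its arguments element by element; the collapse argument also glues all the elements of a vector into a single string, placing the collapse value between them. A sketch with a made-up vector of one-letter codes:

```r
residues <- c( "M", "A", "G" )   # made-up example vector

paste( residues, collapse = "" )    # "MAG"
paste( residues, collapse = "-" )   # "M-A-G"
paste( residues )                   # unchanged: "M" "A" "G"
```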

After class

  1. Finish this exercise (fill in all of the "# Enter your code here!" blocks). Check for errors by clicking on "Knit HTML" and looking over the document.
  2. When you're ready, use bio297::submit("02-tables.rmd") to submit the assignment.
  3. Read Wickham 2014 (in "Resources", "Literature" on Sakai) for next class.