Interacting with R

What do you think will happen when you enter these commands? Try it out.

1 + 2
2 * 3
4 ^ 5
6.7 / 8.9

Getting help

To get the help page for the c() function:

## ?c
## help(c)

Saving data in variables

a <- 10
a
##  10
b <- a + 11
b
##  21
c <- a / b
c
##  0.4762

If you're working with very large numbers, you can use scientific notation:

2e10
##  2e+10
2 * 10^10
##  2e+10
2e10 == 2 * 10^10
##  TRUE

Everything is a vector

You can see how many elements a vector holds using the length function:

length(10)
##  1
length(c)
##  1
length(1:10)
##  10
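
To make the "everything is a vector" idea concrete, here is a small sketch; the variable name x is only for illustration. Even a single number or a single string is a vector of length one.

```r
# A single number is a numeric vector of length one.
x <- 10
is.vector(x)        # TRUE
length(x)           # 1

# A single string is also a vector of length one;
# nchar() counts its characters instead.
length("hello")     # 1
nchar("hello")      # 5
```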

Composing vectors

c(1,2,3,4)
##  1 2 3 4
d <- c(5,6,7,8)
d + 10
##  15 16 17 18
d + d
##  10 12 14 16
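
The d + 10 example works because R reuses ("recycles") the shorter vector until it matches the longer one. A minimal sketch of that behaviour, using a second, shorter vector that is not part of the original example:

```r
d <- c(5, 6, 7, 8)

# The single value 10 is recycled to match the length of d.
d + 10              # 15 16 17 18

# A length-two vector is recycled twice: 5+1, 6+2, 7+1, 8+2.
d + c(1, 2)         # 6 8 8 10
```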

Strings

To create strings, surround your text with either double " ... " or single ' ... ' quotes:

"a"
##  "a"
"a" == 'a'
##  TRUE
c( "a", "b", "c", "d" )
##  "a" "b" "c" "d"

Escape characters

s <- "My data are \"awesome\"!"
cat(s)
## My data are "awesome"!

Two other special string characters are tab \t and newline \n:

s <- "a\tb\tc"
cat(s)
## a    b   c
s <- "a\nb\nc"
cat(s)
## a
## b
## c
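
One way to see that an escape sequence is a single character, sketched with a short string chosen for illustration: print() shows the escape as you typed it, while cat() renders it.

```r
s <- "a\tb"

# The tab counts as one character, so the string has three in total.
nchar(s)    # 3

# print() displays the escape sequence as typed, including the quotes.
print(s)    # "a\tb"

# cat() writes the actual tab character between a and b.
cat(s)
```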

Boolean values

To create boolean values use TRUE or FALSE:

TRUE
##  TRUE
FALSE
##  FALSE
TRUE == FALSE
##  FALSE

Missing Data

c(1, 2, NA, 4)
##   1  2 NA  4
c( "a", NA, "c", NA )
##  "a" NA  "c" NA
c(TRUE, FALSE, NA, FALSE)
##   TRUE FALSE    NA FALSE
is.na( c(1, NA) )
##  FALSE  TRUE
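
A short sketch of how missing values behave in calculations, and why is.na() is needed instead of ==. The vector x is just an example; na.rm is an argument accepted by sum() and many other summary functions.

```r
x <- c(1, 2, NA, 4)

# Arithmetic involving NA yields NA.
sum(x)                  # NA

# na.rm = TRUE drops missing values before summing.
sum(x, na.rm = TRUE)    # 7

# Comparing with == does not detect missing values; the result is itself NA.
NA == NA                # NA
```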

A note about NULL

NULL is used to signify unassigned variables:

NULL
is.null(NULL)
##  TRUE

Extracting values

Let's say we have a vector of numbers:

myNumbers <- c( 10, 20, 30, 40, 50 )

We can extract elements from 1D vectors using the index syntax [] and integers:

myNumbers
##  10 20 30 40 50
myNumbers[1]
##  10
myNumbers[3]
##  30

We can also put an integer vector with more than one element inside the index brackets [...]:

myNumbers[ c(1, 3) ]
##  10 30

You can use the : operator to easily create a sequence of numbers:

2:4
##  2 3 4
myNumbers[2:4]
##  20 30 40
myNumbers
##  10 20 30 40 50
myNumbers[ c(FALSE, TRUE , TRUE , TRUE , TRUE ) ]
##  20 30 40 50
myNumbers[ c(TRUE , FALSE, FALSE, FALSE, FALSE) ]
##  10
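
The last two examples index with a logical vector rather than integers: positions marked TRUE are kept and positions marked FALSE are dropped. A minimal sketch, using a hypothetical vector named scores:

```r
scores <- c(10, 20, 30, 40, 50)   # hypothetical example vector

# Keep the positions marked TRUE, drop those marked FALSE.
keep <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
scores[keep]           # 10 30 50

# which() converts a logical vector to the integer positions that are TRUE.
which(keep)            # 1 3 5
scores[which(keep)]    # 10 30 50
```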

Comparison operators always return a logical vector, with one value per element:

myNumbers > 25
##  FALSE FALSE  TRUE  TRUE  TRUE
myNumbers < 25
##   TRUE  TRUE FALSE FALSE FALSE
myNumbers == 30
##  FALSE FALSE  TRUE FALSE FALSE
myNumbers != 30
##   TRUE  TRUE FALSE  TRUE  TRUE

The %in% operator asks whether each element of the first vector can be found in the second:

30 %in% myNumbers
##  TRUE
c(10, 100) %in% myNumbers
##   TRUE FALSE

The ! operator negates (flips) each value of a logical vector:

!TRUE
##  FALSE
!(myNumbers > 25)
##   TRUE  TRUE FALSE FALSE FALSE

So how can we combine logical comparisons with indexing?

myNumbers[myNumbers > 25]
##  30 40 50
myNumbers[myNumbers < 25]
##  10 20

You can get fancy…

myNumbers[ (myNumbers %% 2) == 0 ]
##  10 20 30 40 50
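
Every number in myNumbers is even, so the filter above returns the whole vector. Here is a sketch with a mixed vector (values is a made-up example) that also combines two conditions using & and |:

```r
values <- c(10, 15, 20, 25, 30)

# %% is the remainder operator, so values %% 2 == 0 is TRUE for even numbers.
values[values %% 2 == 0]                 # 10 20 30

# & requires both conditions to hold; | requires at least one.
values[values > 12 & values < 28]        # 15 20 25
values[values < 12 | values > 28]        # 10 30
```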

Assigning values

myNumbers
##  10 20 30 40 50
myNumbers[3] <- 100
myNumbers
##   10  20 100  40  50
myNumbers[2:3]  <- c(1,2)
myNumbers
##  10  1  2 40 50
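
Assignment also works with a logical index, which lets you replace every value that meets a condition. A small sketch with hypothetical data:

```r
readings <- c(3, -1, 7, -4, 5)   # hypothetical data

# Replace every negative value with zero.
readings[readings < 0] <- 0
readings                          # 3 0 7 0 5
```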

Matrix

A matrix is a two-dimensional collection of values of the same type; you can think of it as a set of column vectors, each the same length:

m <- matrix(1:8, nrow = 2, ncol = 4)
m
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

You can access values in a matrix using a single index, which refers to the nth position counting down the columns:

m[2]
##  2

Alternatively you can specify a [row, col]:

m[1,2]
##  3

Or just a row:

m[1,]
##  1 3 5 7

Or just a column:

m[,2]
##  3 4

If you forget this syntax, just pay attention to how R prints out matrices!
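
The sketch below ties the two indexing styles together, assuming the same 2 x 4 matrix as above. It also shows the byrow argument of matrix(), which changes the fill order.

```r
m <- matrix(1:8, nrow = 2, ncol = 4)

dim(m)          # 2 4  (rows, columns)

# Values are stored column by column, so position 3 is row 1, column 2.
m[3]            # 3
m[1, 2]         # 3

# byrow = TRUE fills across rows instead of down columns.
matrix(1:8, nrow = 2, byrow = TRUE)[1, 2]   # 2
```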

Array

An array is a matrix of more than two dimensions.

array(1:8, dim=c(2,2,2))
## , , 1
##
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
##
## , , 2
##
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
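
Indexing an array follows the same pattern as a matrix, with one index per dimension. A minimal sketch using the array above, stored here under the illustrative name a:

```r
a <- array(1:8, dim = c(2, 2, 2))

# Index as [row, column, slice].
a[1, 2, 2]      # 7

# Leaving an index blank selects everything along that dimension,
# so this returns the second 2 x 2 slice.
a[, , 2]
```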

Lists

l <- list( a = c(1, 2, 3, 4)
, b = c("a", "b", "c")
)
l
## $a
##  1 2 3 4
##
## $b
##  "a" "b" "c"

You can access individual vectors on lists using indexing with numbers or names:

l[1]
## $a
##  1 2 3 4
l["a"]
## $a
##  1 2 3 4

Did you notice what type of thing was returned there?

To simplify the result of indexing down to a vector (rather than a one-element list), use double brackets:

l[[1]]
##  1 2 3 4

The $ operator is shorthand for referencing a named element of a list:

l$a
##  1 2 3 4
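
To compare the three ways of indexing a list side by side, this sketch uses class() to show what type of thing each one returns, assuming the same list as above:

```r
l <- list(a = c(1, 2, 3, 4), b = c("a", "b", "c"))

class(l["a"])      # "list": single brackets keep the list wrapper
class(l[["a"]])    # "numeric": double brackets return the element itself
class(l$a)         # "numeric": $ behaves like [[ ]] with a name

# names() lists the element names you can use with $ or [[ ]].
names(l)           # "a" "b"
```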

A note about table structure

Excel has probably trained you to format data something like this:

Day  Group A  Group B
1    5        5
2    6        7
3    7        9
4    8        11

The correct design would be a three column table:

Day  Group  Response
1    A      5
1    B      5
2    A      6
2    B      7
3    A      7
3    B      9
4    A      8
4    B      11
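
If your data already arrived in the wide layout, one way to convert it is sketched below in base R. The object names wide and long are made up for this example, and data.frame() is used to build the table by hand rather than reading it from a file.

```r
# Hypothetical wide table matching the example above.
wide <- data.frame(Day = 1:4,
                   A   = c(5, 6, 7, 8),
                   B   = c(5, 7, 9, 11))

# Stack the group columns on top of each other, one observation per row.
long <- data.frame(Day      = rep(wide$Day, times = 2),
                   Group    = rep(c("A", "B"), each = nrow(wide)),
                   Response = c(wide$A, wide$B))

# Sort by day, then group, to match the three-column layout shown above.
long <- long[order(long$Day, long$Group), ]
long
```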

codons <- read.table( "data/codons.txt"
, header = TRUE
, stringsAsFactors = FALSE
)
head(codons)
##   codon aminoAcid
## 1   GCU         A
## 2   GCC         A
## 3   GCA         A
## 4   GCG         A
## 5   CGU         R
## 6   CGC         R

Accessing data in a data.frame

codons$codon
##  "GCU" "GCC" "GCA" "GCG" "CGU" "CGC" "CGA" "CGG" "AGA" "AGG" "AAU"
##  "AAC" "GAU" "GAC" "UGU" "UGC" "CAA" "CAG" "GAA" "GAG" "GGU" "GGC"
##  "GGA" "GGG" "CAU" "CAC" "AUU" "AUC" "AUA" "AUG" "UUA" "UUG" "CUU"
##  "CUC" "CUA" "CUG" "AAA" "AAG" "UUU" "UUC" "CCU" "CCC" "CCA" "CCG"
##  "UCU" "UCC" "UCA" "UCG" "AGU" "AGC" "ACU" "ACC" "ACA" "ACG" "UGG"
##  "UAU" "UAC" "GUU" "GUC" "GUA" "GUG" "UAA" "UGA" "UAG"
codons$aminoAcid
##   "A" "A" "A" "A" "R" "R" "R" "R" "R" "R" "N" "N" "D" "D" "C" "C" "Q"
##  "Q" "E" "E" "G" "G" "G" "G" "H" "H" "I" "I" "I" "M" "L" "L" "L" "L"
##  "L" "L" "K" "K" "F" "F" "P" "P" "P" "P" "S" "S" "S" "S" "S" "S" "T"
##  "T" "T" "T" "W" "Y" "Y" "V" "V" "V" "V" "X" "X" "X"
codons[ 1, 2 ]
##  "A"
codons[ 2, 1 ]
##  "GCC"

Calculating a new column

We can start by creating a new column called type that contains all NA values:

codons$type <- NA
head(codons)
##   codon aminoAcid type
## 1   GCU         A   NA
## 2   GCC         A   NA
## 3   GCA         A   NA
## 4   GCG         A   NA
## 5   CGU         R   NA
## 6   CGC         R   NA

nonpolar <- c( "A", "C", "G", "I", "L", "M", "F", "P", "W", "V" )
polar <- c( "N", "Q", "S", "T", "Y" )
acidic <- c( "D", "E" )
basic <- c( "R", "H", "K" )
length( c( nonpolar, polar, acidic, basic ) ) == 20
##  TRUE

Which rows contain nonpolar amino acids?

codons$aminoAcid %in% nonpolar
##    TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
##   TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##   TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

We can use these logical vectors to assign our type annotations:

codons[ codons$aminoAcid %in% nonpolar, "type" ] <- "nonpolar"
codons[ codons$aminoAcid %in% polar,    "type" ] <- "polar"
codons[ codons$aminoAcid %in% acidic,   "type" ] <- "acidic"
codons[ codons$aminoAcid %in% basic,    "type" ] <- "basic"

Check to see if it worked!
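
One possible way to check, sketched on a small stand-in table so it runs without data/codons.txt (the five rows here are an assumption chosen for illustration, not the real file). table() counts how many rows received each type, and is.na() finds any rows that were never assigned one.

```r
# Small stand-in for the codons table so the example runs without data/codons.txt.
codons <- data.frame(codon     = c("GCU", "CGU", "GAU", "AAU", "UAA"),
                     aminoAcid = c("A", "R", "D", "N", "X"),
                     stringsAsFactors = FALSE)

nonpolar <- c("A", "C", "G", "I", "L", "M", "F", "P", "W", "V")
polar    <- c("N", "Q", "S", "T", "Y")
acidic   <- c("D", "E")
basic    <- c("R", "H", "K")

codons$type <- NA
codons[codons$aminoAcid %in% nonpolar, "type"] <- "nonpolar"
codons[codons$aminoAcid %in% polar,    "type"] <- "polar"
codons[codons$aminoAcid %in% acidic,   "type"] <- "acidic"
codons[codons$aminoAcid %in% basic,    "type"] <- "basic"

# Count rows per type; useNA = "ifany" also counts rows that are still NA.
table(codons$type, useNA = "ifany")

# Rows that were never assigned a type (here, the stop codon marked "X").
codons[is.na(codons$type), ]
```

Rows whose amino acid is not in any of the four groups, such as the "X" entries, keep their NA type.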

Implement transcription and translation

The challenge – design an R script with a set of functions that will:

• Read the DNA sequence in the file "data/npl3-dna.txt"
• Transcribe it into RNA sequence
• Save the results to a new file, rna.txt

And the special challenge:

• Translate the DNA sequence to protein (using "data/codons.txt")

Parts list:

• readLines() (hint: con = "dna.txt")
• gsub() (hint: fixed = TRUE)
• write()
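
If you get stuck, here is one rough sketch of how the three parts could fit together for the transcription step. It is not the only solution. The function name transcribe, the short example sequence, and the temporary files are all assumptions made so the example runs on its own; the real exercise reads data/npl3-dna.txt and writes rna.txt.

```r
# Transcribe a DNA string to RNA by replacing every T with U.
# fixed = TRUE treats the pattern as plain text, not a regular expression.
transcribe <- function(dna) {
  gsub("T", "U", dna, fixed = TRUE)
}

# Stand-in input written to a temporary file (hypothetical sequence).
dnaFile <- tempfile(fileext = ".txt")
rnaFile <- tempfile(fileext = ".txt")
write("ATGTCTGAA", file = dnaFile)

# Read, transcribe, and save the result.
dna <- readLines(con = dnaFile)
rna <- transcribe(dna)
write(rna, file = rnaFile)

readLines(rnaFile)    # "AUGUCUGAA"
```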