jdFact <- read.table("data/jd-factorized.txt")
jdTidy <- read.table("data/jd-tidy.txt")

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It can be used to determine whether two sets of data are significantly different from each other, and is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.
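
To make the definition concrete, here is a minimal sketch that computes a two-sample (Welch) t statistic by hand and checks it against `t.test`. The data are simulated for illustration; `x`, `y`, and the other names are hypothetical and are not part of the course data set.

```r
# Simulated samples (hypothetical; not the course data)
set.seed(1)
x <- rnorm(30, mean = 5.0, sd = 1.5)
y <- rnorm(35, mean = 4.4, sd = 1.2)

# Per-sample variance of the mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means divided by its estimated standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t-distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# t.test runs the Welch test by default (var.equal = FALSE)
res <- t.test(x, y)
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
all.equal(res$p.value, p_manual)
```

The scaling term mentioned above is the standard error in the denominator: because it is estimated from the data rather than known, the statistic is compared against a t-distribution instead of a normal one.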

?TDist

For those who have never looked at Student's t-distribution, we can plot its density at a few points:

# Pass the x values explicitly so the axis shows -10..10 rather than the index 1..21
plot(-10:10, dt(-10:10, df = 1))
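
For comparison, the sketch below overlays t densities for a few degrees of freedom on the standard normal density. The grid `x` and the chosen degrees of freedom are arbitrary choices for illustration.

```r
# Finer grid than above so the curves look smooth
x <- seq(-4, 4, length.out = 201)

# Standard normal density as a dashed reference line
plot(x, dnorm(x), type = "l", lty = 2, ylab = "density")

# t densities: heavier tails for small df, close to normal for large df
lines(x, dt(x, df = 1))
lines(x, dt(x, df = 5), col = "blue")
lines(x, dt(x, df = 30), col = "red")

legend("topright",
       legend = c("normal", "t, df = 1", "t, df = 5", "t, df = 30"),
       lty = c(2, 1, 1, 1),
       col = c("black", "black", "blue", "red"))
```

With few degrees of freedom the t-distribution puts more mass in the tails; as the degrees of freedom grow, it approaches the normal curve.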

Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher.
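
As a rough illustration of "variation among and between groups", the following sketch computes a one-way ANOVA F statistic by hand and compares it with `aov`. The data frame `d` is simulated and all names in it are hypothetical.

```r
# Simulated data: three groups of 20 observations (hypothetical)
set.seed(2)
d <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 20)),
  value = c(rnorm(20, 5), rnorm(20, 5.5), rnorm(20, 6.2))
)

grand_mean  <- mean(d$value)
group_means <- tapply(d$value, d$group, mean)
group_sizes <- tapply(d$value, d$group, length)

# Between-group variation: how far each group mean sits from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group variation: how far each observation sits from its own group mean
ss_within <- sum((d$value - ave(d$value, d$group))^2)

df_between <- nlevels(d$group) - 1
df_within  <- nrow(d) - nlevels(d$group)

# F is the ratio of the two mean squares
f_manual <- (ss_between / df_between) / (ss_within / df_within)

f_aov <- summary(aov(value ~ group, data = d))[[1]][["F value"]][1]
all.equal(f_manual, f_aov)
```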

Fire up the ANOVA Playground app and we'll talk about how these tests work.

`A ~ B`

Average task time (`ObjectAve`) is a function of task complexity (`Complexity`).
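
A formula is an ordinary R object, so it can be stored and inspected before being passed to a modelling function. A small sketch (the variable `f` is just an illustrative name):

```r
# Creating a formula does not evaluate the variables it mentions,
# so this runs even without the data loaded
f <- ObjectAve ~ Complexity

class(f)     # "formula"
all.vars(f)  # names of the variables on both sides of ~
f[[2]]       # left-hand side (the response)
f[[3]]       # right-hand side (the predictor)
```

The variable names are only looked up later, typically in the data frame supplied through the `data` argument.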

We could use this formula syntax in a call to `t.test`:

t.test(ObjectAve ~ Complexity, data = jdTidy)

## 
##  Welch Two Sample t-test
## 
## data:  ObjectAve by Complexity
## t = -4.4255, df = 337.69, p-value = 1.3e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.0053871 -0.3866584
## sample estimates:
## mean in group complex  mean in group simple
##              4.397727              5.093750

boxplot(ObjectAve ~ Complexity, data = jdTidy)

test <- aov(ObjectAve ~ Complexity, data = jdTidy)
summary(test)

##              Df Sum Sq Mean Sq F value   Pr(>F)
## Complexity    1   42.6   42.63   19.59 1.29e-05 ***
## Residuals   350  761.9    2.18
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
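
Note that the squared t statistic from the t-test above, (-4.4255)^2 ≈ 19.59, matches the F value here. With two groups, a one-way ANOVA and a pooled-variance t-test are the same test. The sketch below checks this on simulated data; the data frame `d` and its columns are hypothetical, and `var.equal = TRUE` is set explicitly because `t.test` defaults to the Welch version.

```r
# Simulated two-group data (hypothetical; not the course data)
set.seed(3)
d <- data.frame(
  group = factor(rep(c("complex", "simple"), times = c(40, 45))),
  score = c(rnorm(40, 4.4, 1.5), rnorm(45, 5.1, 1.5))
)

# Pooled-variance (classic Student) t-test
t_pooled <- t.test(score ~ group, data = d, var.equal = TRUE)$statistic

# One-way ANOVA on the same data
f_value <- summary(aov(score ~ group, data = d))[[1]][["F value"]][1]

# For two groups, F equals t squared
all.equal(unname(t_pooled)^2, f_value)
```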

plot(test)

The `plot` function is magic! Given a fitted model, it draws a series of diagnostic plots instead of a scatter plot.
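
The magic is method dispatch: `plot` is a generic function that picks a method based on the class of its argument. A minimal sketch on simulated data (the names `d` and `fit` are hypothetical):

```r
# Simulated two-group data (hypothetical)
set.seed(4)
d <- data.frame(
  group = factor(rep(c("a", "b"), each = 25)),
  value = rnorm(50)
)
fit <- aov(value ~ group, data = d)

# An aov object also inherits from "lm", so plot() can use the
# linear-model method to draw diagnostic plots
class(fit)

# Show the diagnostic plots together in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous graphics settings
```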

summary(aov(ObjectAve ~ Education, data = jdTidy))

##              Df Sum Sq Mean Sq F value Pr(>F)
## Education     6   13.6   2.264   0.994  0.429
## Residuals   343  780.7   2.276
## 2 observations deleted due to missingness

boxplot(ObjectAve ~ Education, data = jdTidy)

summary(aov(ObjectAve ~ Complexity + Gender, data = jdTidy))

##              Df Sum Sq Mean Sq F value   Pr(>F)
## Complexity    1   42.6   42.63  19.953 1.07e-05 ***
## Gender        1   16.2   16.20   7.582   0.0062 **
## Residuals   349  745.7    2.14
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(ObjectAve ~ Complexity:Gender, data = jdTidy))

##                    Df Sum Sq Mean Sq F value   Pr(>F)
## Complexity:Gender   3   62.7  20.903   9.807 3.17e-06 ***
## Residuals         348  741.8   2.132
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

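With only `Complexity:Gender` in the formula, the interaction term absorbs both main effects, which is why it has 3 degrees of freedom. Here's the interaction alongside the two main effects: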

summary(aov(ObjectAve ~ Complexity + Gender + Complexity:Gender, data = jdTidy))

##                    Df Sum Sq Mean Sq F value   Pr(>F)
## Complexity          1   42.6   42.63   20.00 1.05e-05 ***
## Gender              1   16.2   16.20    7.60  0.00615 **
## Complexity:Gender   1    3.9    3.88    1.82  0.17821
## Residuals         348  741.8    2.13
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because this is a common formulation, you can use a `*` to do the same:

summary(aov(ObjectAve ~ Complexity * Gender, data = jdTidy))

##                    Df Sum Sq Mean Sq F value   Pr(>F)
## Complexity          1   42.6   42.63   20.00 1.05e-05 ***
## Gender              1   16.2   16.20    7.60  0.00615 **
## Complexity:Gender   1    3.9    3.88    1.82  0.17821
## Residuals         348  741.8    2.13
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
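
One way to see how R expands the `+`, `:`, and `*` operators is to ask for the term labels of a formula. In this sketch, `labels_of` is a hypothetical helper and `y`, `a`, `b` are placeholder names; no data are needed.

```r
# Hypothetical helper: list the terms a formula expands to
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ a + b)        # main effects only
labels_of(y ~ a:b)          # interaction only
labels_of(y ~ a + b + a:b)  # main effects plus interaction
labels_of(y ~ a * b)        # shorthand for the line above

# The two spellings produce the same set of terms
identical(labels_of(y ~ a * b), labels_of(y ~ a + b + a:b))
```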

Let's look at a picture to see what's going on there:

boxplot(ObjectAve ~ Complexity * Gender, data = jdTidy)

See the help page for `formula` for additional syntax that can be used in formula expressions.

# Enter your code here!

plot(VelcroTime ~ VacuumTime, data = jdFact)

fit <- lm(VelcroTime ~ VacuumTime, data = jdFact)
fit

## 
## Call:
## lm(formula = VelcroTime ~ VacuumTime, data = jdFact)
## 
## Coefficients:
## (Intercept)   VacuumTime
##     3.88202      0.03299

cf <- coefficients(fit)
cf

## (Intercept)  VacuumTime
##  3.88201636  0.03298668
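
The coefficients are just the intercept and slope of a straight line, so predictions can be computed by hand. A minimal sketch on simulated data (the data frame `d`, its columns `x` and `y`, and `new_x` are hypothetical):

```r
# Simulated data with a known linear relationship (hypothetical)
set.seed(5)
d <- data.frame(x = runif(50, 0, 20))
d$y <- 4 + 0.5 * d$x + rnorm(50)

fit <- lm(y ~ x, data = d)
cf <- coefficients(fit)

# Predict by hand: intercept + slope * x
new_x <- c(2, 10, 18)
by_hand <- cf["(Intercept)"] + cf["x"] * new_x

# predict() does the same calculation from the fitted model
by_predict <- predict(fit, newdata = data.frame(x = new_x))

all.equal(unname(by_hand), unname(by_predict))
```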

summary(fit)

## 
## Call:
## lm(formula = VelcroTime ~ VacuumTime, data = jdFact)
## 
## Residuals:
##    Min     1Q Median     3Q    Max
## -4.287 -1.931 -1.090  0.359 36.561
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.88202    0.42088   9.224   <2e-16 ***
## VacuumTime   0.03299    0.04239   0.778    0.437
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.409 on 174 degrees of freedom
## Multiple R-squared:  0.003469,  Adjusted R-squared:  -0.002258
## F-statistic: 0.6057 on 1 and 174 DF,  p-value: 0.4375

We can plot the fitted model on top of our scatter plot using `abline`:

plot(VelcroTime ~ VacuumTime, data = jdFact)
abline(reg = fit)

lm(VelcroTime ~ log(VacuumTime), data = jdFact)

## 
## Call:
## lm(formula = VelcroTime ~ log(VacuumTime), data = jdFact)
## 
## Coefficients:
##     (Intercept)  log(VacuumTime)
##           2.513            1.002
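
A model with a transformed predictor is no longer a straight line on the original scale, so `abline` is not the right tool for drawing it. One possible approach, sketched here on simulated data (`d`, `fit_log`, and `grid_x` are hypothetical names), is to predict over a grid of x values and draw the result with `lines`:

```r
# Simulated data with a logarithmic relationship (hypothetical)
set.seed(6)
d <- data.frame(x = runif(80, 1, 30))
d$y <- 2.5 + 1.0 * log(d$x) + rnorm(80, sd = 0.5)

fit_log <- lm(y ~ log(x), data = d)

plot(y ~ x, data = d)

# predict() applies the log() from the formula to the new x values
grid_x <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))
fitted_curve <- predict(fit_log, newdata = grid_x)
lines(grid_x$x, fitted_curve)
```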

- Read Goffeau et al. 1996
- Make sure you have a working solution to the translation problem at the end of the Working with Tables exercise (#2).