# Step 6: Background intensity

So far we’ve been ignoring background pixel intensities. These values measure the brightness of the region surrounding each spot (the “background”). Background signal is noise: it doesn’t correspond to cDNAs binding specifically to probes on the array.

Produce a visualization to help us assess how variable the background intensities are across the surface of the array. One good way to do this is to produce a heat map, where each box corresponds to a spot position on our array and the color corresponds to the intensity of the background at that spot.

This can be done with the image and heat.colors functions. Explore the way these functions work using some sample data. Think about the structure of the data you want to pass into image and how you can make a matrix conforming to this architecture using your existing data.frames.
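As a starting point for exploring these functions, here is a minimal sketch on simulated data. The data.frame `spots` and its columns `row`, `col` and `bkg` are hypothetical stand-ins; your own array tables will likely use different column names for spot position and background intensity.

```r
# Toy stand-in for one array table; the column names are assumptions.
set.seed(1)
spots <- expand.grid(row = 1:20, col = 1:30)
# Simulate a background that gets brighter toward one edge of the array
spots$bkg <- 100 + 5 * spots$col + rnorm(nrow(spots), sd = 10)

# image() expects a numeric matrix, so fill one in by spot position
bkg_matrix <- matrix(NA_real_, nrow = max(spots$row), ncol = max(spots$col))
bkg_matrix[cbind(spots$row, spots$col)] <- spots$bkg

# heat.colors(n) returns n colors running from red to pale yellow;
# image() maps the lowest values to the first color and the highest to the last
image(x = seq_len(nrow(bkg_matrix)), y = seq_len(ncol(bkg_matrix)),
      z = bkg_matrix, col = heat.colors(64),
      xlab = "array row", ylab = "array column",
      main = "Background intensity")
```

Filling the matrix through a two-column index (`cbind(row, col)`) means the rows of the data.frame do not need to be sorted by position, and any spot missing from the table simply stays NA.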

Are the background intensities uniform or are some much higher than others? Where are the problems the greatest?

How should we use background intensity information?

# Step 8: Global median normalization

Knowing what you know now about your data, design a way to add a normalized ratio column to your data structure. Implement it, run it, and check to make sure it worked!
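One possible approach, sketched under assumptions: divide every ratio on an array by that array’s median ratio, so the normalized median becomes 1. The function name `add_normalized_ratio` and the column names `ratio` and `norm_ratio` are hypothetical, and you may prefer to work with log ratios (subtracting the median) instead.

```r
# Hypothetical helper: adds a norm_ratio column whose median is 1.
add_normalized_ratio <- function(array_table, ratio_col = "ratio") {
  ratios <- array_table[[ratio_col]]
  # Ignore missing spots when computing the array-wide median
  global_median <- median(ratios, na.rm = TRUE)
  array_table$norm_ratio <- ratios / global_median
  array_table
}

# Toy data with one missing value
toy <- data.frame(ID_REF = 1:5, ratio = c(0.5, 1.5, 3.0, NA, 6.0))
toy <- add_normalized_ratio(toy)

# Check that it worked: the median should now be (approximately) 1
median(toy$norm_ratio, na.rm = TRUE)
```

Checking the median after normalization, as in the last line, is a quick way to confirm the new column behaves as intended.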

# Step 9: Collecting pre-processed data and integrating annotations

Up to this point we’ve been looking at our data one microarray table at a time. As we start exploring the data it would be more convenient to have a new table which contains just the set of final normalized ratio values for each array. Let’s list arrays across the columns and genes down the rows.

You may also have noticed that the array data files we downloaded from GEO don’t tell us which row is associated with which yeast gene; instead they give us an ID_REF column. The IDs in this column match up with the IDs in the platform table (see step #1). Since we’re building a new table anyway, it makes sense to pull in the gene annotations from the platform record at the same time.

Implement a function to do it! Your function will probably need to take multiple source array data tables, know which column to pull data from and take a platform table.
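Here is one way such a function might look. It is only a sketch: the name `collect_arrays`, the value column `norm_ratio`, and the platform columns `ID` and `ORF` are assumptions, so check them against your own tables. It expects a named list of array tables and uses the list names as column headers.

```r
# Hypothetical helper: one row per platform ID, one column per array.
collect_arrays <- function(array_tables, value_col, platform,
                           annotation_cols = "ORF") {
  # Start from the platform IDs plus whichever annotations were requested
  combined <- platform[, c("ID", annotation_cols), drop = FALSE]
  for (name in names(array_tables)) {
    tbl <- array_tables[[name]]
    # match() aligns each array's rows to the platform IDs;
    # IDs missing from an array become NA
    combined[[name]] <- tbl[[value_col]][match(combined$ID, tbl$ID_REF)]
  }
  combined
}

# Toy platform and array tables (made-up values)
platform <- data.frame(ID = c(1, 2, 3),
                       ORF = c("YAL001C", "YAL002W", "YAL003W"))
arrays <- list(
  t0 = data.frame(ID_REF = c(3, 1, 2), norm_ratio = c(1.1, 0.9, 2.5)),
  t1 = data.frame(ID_REF = c(1, 2),    norm_ratio = c(0.4, 1.0))
)
ratios <- collect_arrays(arrays, "norm_ratio", platform)
ratios
```

Aligning by ID with `match` rather than by row position keeps the result correct even when the array files list their spots in different orders or omit some of them.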

# Step 10: Exploring the data set

Once you have a data.frame with all of the ratio data from each microarray loaded and annotated, it’s time to start exploring the data set.

Here are some challenges to try:

• Produce a list of genes which showed expression level changes > 2x.

• Produce a hierarchical cluster. You’ll want to explore the heatmap function. Note, the heatmap function wants you to pass in data as a matrix (rather than a data.frame). You can convert your data to a matrix with the as.matrix function. Think carefully about what you want to do about missing data…

• Experiment with clustering only the rows (genes) or only the columns (time points). You can also perform clustering alone using the dist and hclust functions (explore the output of these functions).

• Use the cutree function to “trim” your gene dendrogram into a set of groupings by cutting the tree into k clusters. For comparison, try k-means clustering as well (see kmeans).

• Use the RowSideColors argument to the heatmap function to highlight genes that share a common GO annotation (the easiest way to pull in GO annotations is from the “slim” set at SGD: