Assignment 03: data visualisation and exploration

Download the .Rmd file here.

To submit this assignment, upload the full document, including the original questions, your code, and the output, as a knitted .pdf. Please make sure no text on your .pdf runs off the edge of the page.

1. Visualising plant biomass (3.5 marks)

To start this assignment, you will once again be looking at the yearly change in plant biomass in the beautiful Abisko National Park in northern Sweden. We have pre-processed this data and made it available as a CSV file via this link. You can find the original data and a short abstract on Dryad. The original study¹ is available under an open-access licence.

library(tidyverse)

# Read the pre-processed data, give the species columns common names,
# then reshape to long format (one row per year/site/treatment/species)
plant_biomass <- 
  read_csv('https://uoftcoders.github.io/rcourse/data/plant-biomass-preprocess.csv',
           show_col_types = FALSE) %>%
  rename(dwarf_birch = betula_nana, 
         wavy_hair_grass = deschampsia_flexuosa,
         crowberry = empetrum_nigrum,
         bilberry = vaccinium_myrtillus,
         bog_bilberry = vaccinium_uliginosum,
         lingonberry = vaccinium_vitis_idaea) %>%
  pivot_longer(cols = dwarf_birch:lingonberry, names_to = "species",
               values_to = "biomass")

print(plant_biomass)
# A tibble: 1,080 × 6
    year  site habitat treatment       species         biomass
   <dbl> <dbl> <chr>   <chr>           <chr>             <dbl>
 1  1998     1 Forest  grazedcontrol   dwarf_birch        0   
 2  1998     1 Forest  grazedcontrol   wavy_hair_grass    7.47
 3  1998     1 Forest  grazedcontrol   crowberry         22.6 
 4  1998     1 Forest  grazedcontrol   bilberry         118.  
 5  1998     1 Forest  grazedcontrol   bog_bilberry      11.9 
 6  1998     1 Forest  grazedcontrol   lingonberry        8.46
 7  1998     1 Forest  rodentexclosure dwarf_birch        4.74
 8  1998     1 Forest  rodentexclosure wavy_hair_grass    3.32
 9  1998     1 Forest  rodentexclosure crowberry         24.4 
10  1998     1 Forest  rodentexclosure bilberry          57.0 
# ℹ 1,070 more rows
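If the pivot_longer() step is unfamiliar, the minimal sketch below shows the same reshaping on a small hypothetical tibble. The column names mimic the data set, but the values are made up, and it assumes the tibble and tidyr packages (both part of the tidyverse) are installed. Each species column becomes a set of rows, with the column name stored in species and its value in biomass.

```r
library(tibble)
library(tidyr)

# Hypothetical wide-format data: one column per species (values invented)
wide <- tibble(
  year = c(1998, 1999),
  site = c(1, 1),
  dwarf_birch = c(0, 4.5),
  crowberry = c(22.6, 25.1)
)

# Stack the species columns into species/biomass pairs
long <- pivot_longer(wide, cols = dwarf_birch:crowberry,
                     names_to = "species", values_to = "biomass")

print(long)  # 4 rows: 2 years x 2 species
```

The non-pivoted columns (year, site) are repeated for every species, which is why the full data set grows to 1,080 rows.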
  1. Compare the mean biomass for grazedcontrol with that of rodentexclosure graphically in a line plot. What could explain the big dip in biomass in 2012? (0.75 marks) Hint: the published study may help with the second question.

  2. Compare the mean yearly change in biomass for each species in a line plot. (0.5 marks)

  3. We’ve found that biomass is higher at sites with rodent exclosures (especially in recent years), and that crowberry is the dominant species. Notice how the lines for rodentexclosure and crowberry are of similar shape. Coincidence? Let’s find out! Use a faceted line plot to explore whether all plant species are impacted equally by grazing. (0.75 marks)

  4. The habitat could also be affecting the biomass of different species. Explore this in a line plot of the mean biomass over time. (0.75 marks)

  5. Explore the relationship between species, habitat, and biomass in a box plot. (0.5 marks)

  6. It looks like both habitat and treatment have an effect on most of the species! Let’s dissect the data further by visualising the effect of both habitat and treatment on each species, faceting the plot accordingly. (0.75 marks)
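Most of the line plots above follow the same pattern: summarise biomass by the grouping variables of interest, map one of them to colour, and, if needed, give another its own facets. The sketch below shows that pattern on a hypothetical toy tibble shaped like plant_biomass (the values are invented, and the names toy, mean_by_treatment and mean_by_species are illustrative). It assumes dplyr and ggplot2 are installed and is one possible approach, not the required answer.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data in the same long format as plant_biomass
toy <- tibble(
  year      = rep(c(2010, 2011), each = 4),
  treatment = rep(c("grazedcontrol", "rodentexclosure"), times = 4),
  species   = rep(rep(c("crowberry", "bilberry"), each = 2), times = 2),
  biomass   = c(10, 20, 30, 40, 12, 22, 32, 42)
)

# Mean biomass per year and treatment (averaging over species)
mean_by_treatment <- toy %>%
  group_by(year, treatment) %>%
  summarise(mean_biomass = mean(biomass), .groups = "drop")

# One line per treatment
p_treatment <- ggplot(mean_by_treatment,
                      aes(x = year, y = mean_biomass, colour = treatment)) +
  geom_line()

# Keep species in the grouping, then give each species its own panel
mean_by_species <- toy %>%
  group_by(year, treatment, species) %>%
  summarise(mean_biomass = mean(biomass), .groups = "drop")

p_species <- ggplot(mean_by_species,
                    aes(x = year, y = mean_biomass, colour = treatment)) +
  geom_line() +
  facet_wrap(~ species)
```

To facet by two variables at once, facet_grid(a ~ b) lays the panels out with one row per level of a and one column per level of b.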

2. Customising plots (1 mark)

  1. Create a ggplot theme object that makes five modifications to plot appearance. Save it as a named object so you can reuse it in other plots (including outside of this course). You can use the vignettes at the bottom of ggplot’s theme documentation to find arguments you might like to modify. Some components you might like to alter: panel.background to change the colour of the plot body (remember, colour alters the outlines of plot elements and fill alters their body), panel.grid to alter the plot grid, strip.text for facet box text, and legend.title to change the size of the legend title text. (0.5 marks)

  2. Apply this theme to your plot from question 1.6 and add labels (a title, a colour legend title, an x-axis label, and a y-axis label). (0.5 marks)
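A theme object is simply the result of theme(), optionally added to a complete theme such as theme_bw(), saved under a name and then added to any plot with +. The minimal sketch below makes only two illustrative modifications (the question asks for five of your own) and applies them to a hypothetical toy data frame; the names my_theme and toy are placeholders, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# A reusable theme object: a complete theme plus two example tweaks
my_theme <- theme_bw() +
  theme(
    panel.grid.minor = element_blank(),       # remove minor grid lines
    strip.text = element_text(face = "bold")  # bold facet labels
  )

# Hypothetical toy data for demonstration only
toy <- data.frame(
  year = c(2010, 2011, 2010, 2011),
  mean_biomass = c(20, 22, 30, 32),
  treatment = c("grazedcontrol", "grazedcontrol",
                "rodentexclosure", "rodentexclosure")
)

# Apply the saved theme, then set the title, legend and axis labels
p <- ggplot(toy, aes(x = year, y = mean_biomass, colour = treatment)) +
  geom_line() +
  my_theme +
  labs(title = "Mean biomass over time",
       colour = "Treatment",
       x = "Year",
       y = "Mean biomass")
```

Because my_theme is an ordinary R object, you can keep its definition in a script and source() it in later projects.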

3. Visualising mammal size dimorphism (3 marks)

Download the “SSDinMammals.csv” file from Quercus, the course website, or Data Dryad. The original study² is quite interesting!

# Adjust this path to wherever you saved SSDinMammals.csv
mammal_sizes <- read_csv("~/eeb313website/lectures/data/SSDinMammals.csv",
                         show_col_types = FALSE)
print(mammal_sizes)
# A tibble: 691 × 18
   Order      Family Species Scientific_Name massM SDmassM massF SDmassF lengthM
   <chr>      <chr>  <chr>   <chr>           <dbl>   <dbl> <dbl>   <dbl>   <dbl>
 1 Afrosoric… Chrys… Hotten… Amblysomus hot…  80.6    1.24  66      8.64      NA
 2 Afrosoric… Chrys… Namib … Eremitalpa gra…  28      6.7   23.1    3.6       NA
 3 Afrosoric… Tenre… Lesser… Echinops telfa… 102.    19.3   99.9   17.8       NA
 4 Afrosoric… Tenre… Large-… Geogale aurita    7.3    1.02   7      1.34      NA
 5 Afrosoric… Tenre… Highla… Hemicentetes n… 111     15.6   98      6.56      NA
 6 Afrosoric… Tenre… Lowlan… Hemicentetes s… 110.    16.7  108.    16.0       NA
 7 Afrosoric… Tenre… Short-… Microgale brev…   9.4    1.52   9.7    1.16      NA
 8 Afrosoric… Tenre… Cowan'… Microgale cowa…  12.4    1.71  12.7    1.83      NA
 9 Afrosoric… Tenre… Dobson… Microgale dobs…  26.9    4.38  28.9    5.39      NA
10 Afrosoric… Tenre… Drouha… Microgale drou…  10.1    1.26  12.5    1.79      NA
# ℹ 681 more rows
# ℹ 9 more variables: SDlengthM <dbl>, lengthF <dbl>, SDlengthF <dbl>,
#   n_M <dbl>, n_F <dbl>, n_Mlength <dbl>, n_Flength <dbl>, Comments <chr>,
#   Source <chr>
  1. Make a scatterplot with female mass on the x-axis and male mass on the y-axis. Plot the line y = x (slope 1, intercept 0; you can use geom_abline()) behind the mass data points. Make subplots according to Order, allowing free scaling of the axes. Apply the theme you made in question 2. (1 mark)

  2. Calculate the proportional mass difference ([massM - massF] / massM) for all the rows in the data set. Separating the data by Order, create a boxplot of these proportional mass differences. Plot a horizontal line behind the data, with a y-intercept that would reflect no mass difference between sexes. Apply your theme from question 2. Finally, ensure the Order text is legible. (1 mark)

  3. Improve your plot from question 3.2. Include points behind the boxplot, sized by the number of observations behind each data point (i.e. by n_M or n_F). Make sure these points are reasonably visible by using a variant of the point geom (look to the data exploration lecture’s section on boxplot augmentation) and specifying opacity. Visually differentiate the horizontal reference line from the data (using colour, for example). Label your axes and legend. (1 mark)
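The pieces of question 3 can be sketched on a hypothetical four-row tibble shaped like mammal_sizes (the orders, masses and sample sizes are invented; dplyr and ggplot2 are assumed to be installed). The proportional difference (massM - massF) / massM is 0 when the sexes weigh the same, positive when males are heavier and negative when females are heavier, which is why the reference line sits at y = 0. ggplot2 draws layers in the order they are added, so the reference lines and jittered points come before the geoms they should sit behind.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows; real data would come from mammal_sizes
toy_mammals <- tibble(
  Order = c("Order_A", "Order_A", "Order_B", "Order_B"),
  massM = c(100, 50, 20, 10),
  massF = c(80, 50, 25, 10),
  n_M   = c(5, 12, 3, 8)
)

# Proportional mass difference: 0 = no difference between sexes
toy_mammals <- toy_mammals %>%
  mutate(prop_diff = (massM - massF) / massM)

# y = x line added first so it is drawn behind the points
p_scatter <- ggplot(toy_mammals, aes(x = massF, y = massM)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  facet_wrap(~ Order, scales = "free")

# Reference line, then semi-transparent size-scaled points, then the boxes
p_box <- ggplot(toy_mammals, aes(x = Order, y = prop_diff)) +
  geom_hline(yintercept = 0, colour = "red", linetype = "dashed") +
  geom_jitter(aes(size = n_M), alpha = 0.3, width = 0.2, height = 0) +
  geom_boxplot(fill = NA, outlier.shape = NA) +  # see-through boxes; outliers already shown as points
  labs(x = "Order", y = "Proportional mass difference", size = "n_M")
```

Setting height = 0 in geom_jitter() spreads the points sideways only, so their vertical positions still show the true proportional differences.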


  1. Olofsson J, te Beest M, Ericson L (2013) Complex biotic interactions drive long-term vegetation dynamics in a subarctic ecosystem. Philosophical Transactions of the Royal Society B 368(1624): 20120486. https://dx.doi.org/10.1098/rstb.2012.0486

  2. Tombak KJ, Hex SBSW, Rubenstein DI (2024) New estimates indicate that males are not larger than females in most mammal species. Nature Communications 15: 1872. https://doi.org/10.1038/s41467-024-45739-5