class: title-slide, center, bottom background-image: url(https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/static/tt_logo.png) background-size: 1000px background-position: 50% 10% ### Joe Roberts #### Harper Adams R Users Group · 2020-04-25 --- class: inverse, right, bottom <img style="border-radius: 50%;" src="https://github.com/dr-joe-roberts.png" width="200px"/> ## Where to find me .medium[ [
Dr-Joe-Roberts](https://github.com/Dr-Joe-Roberts) [
Dr_Joe_Roberts](https://twitter.com/Dr_Joe_Roberts) [
jroberts@harper-adams.ac.uk](mailto:jroberts@harper-adams.ac.uk) ] --- class: left, middle # What's the plan for today? š©āš» .medium[ - Re-visit 'tidy' data concept and Tidy Tuesday - Look at a real Tidy Tuesday data set - Data visualisation ] --- class: left, top background-image: url(https://d33wubrfki0l68.cloudfront.net/6f1ddb544fc5c69a2478e444ab8112fb0eea23f8/91adc/images/tidy-1.png) background-size: 1200px background-position: 50% 90% # Tidy data - recap š§¹ .medium[ - Data can be presented in many different formats - Tidy data is usually the easiest format to use with R - Core principle for the `tidyverse` ecosystem ] --- class: left, top background-image: url(https://d33wubrfki0l68.cloudfront.net/6f1ddb544fc5c69a2478e444ab8112fb0eea23f8/91adc/images/tidy-1.png) background-size: 1200px background-position: 50% 90% # Activity 1 (10 mins) ā .medium[ - I am about to share a .xlsx file via MS Teams chat - Turn this file into a tidy data set using the 3 rules below - **Hint**: data dictionaries should be in a separate tab ] --- class: left, top # Activity 1 - my thoughts š¤ ```r head(aphid_data, 18) ``` ``` ## sample_ID sample species rt peak_area concentration ## 1 CG19-14 Plant Brevicoryne 16.247 154123 9.4 ## 2 CG19-23 Plant Brevicoryne 16.240 77839 4.7 ## 3 CG19-24 Plant Brevicoryne 16.243 204238 12.4 ## 4 CG19-09 Plant Myzus 16.241 4783237 290.9 ## 5 CG19-13 Plant Myzus 16.245 2653330 161.4 ## 6 CG19-26 Plant Myzus 16.240 3224152 196.1 ## 7 CG19-28 AD+SA Brevicoryne 16.240 295414 17.9 ## 8 CG19-31 AD+SA Brevicoryne 16.240 298222 18.1 ## 9 CG19-34 AD+SA Brevicoryne 16.242 192785 11.7 ## 10 CG19-30 AD-SA Brevicoryne NA 0 0.0 ## 11 CG19-32 AD-SA Brevicoryne 16.238 33358 2.0 ## 12 CG19-33 AD-SA Brevicoryne 16.237 34018 2.0 ## 13 CG19-15 AD+SA Myzus 16.243 4034920 245.5 ## 14 CG19-18 AD+SA Myzus 16.242 2442438 148.6 ## 15 CG19-22 AD+SA Myzus 16.242 5650064 343.7 ## 16 CG19-17 AD-SA Myzus 16.242 51450 3.1 ## 17 CG19-20 AD-SA Myzus 16.242 846354 51.5 ## 18 CG19-21 AD-SA Myzus 16.244 110176 6.7 ``` --- class: left, middle # What is Tidy Tuesday? > A weekly data project aimed at the R ecosystem. As this project was borne out of the R4DS Online Learning Community and the **R for Data Science** textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with `ggplot2`, `tidyr`, `dplyr`, and other tools in the `tidyverse` ecosystem. <br> .medium[ - Data sets are uploaded to the Tidy Tuesday [GitHub](https://github.com/rfordatascience/tidytuesday) - Uploads contain data background, download instructions and a data dictionary - Data are not always tidy - some wrangling is usually required! - Let's look at a Tidy Tuesday data set... ] --- class: left, middle # Food consumption and CO2 emissions > This data on annual CO2 emissions per person for 130 nations worldwide was originally compiled by the Food and Agriculture Organization of the United Nations (FAO) in 2018. On 2020-02-18 it was uploaded as a .csv file to the Tidy Tuesday [GitHub repository](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-18/readme.md). --- class: left, middle # What does this data set contain? -- ```r glimpse(food_consumption) # Similar to print in base R - great to look at data ``` ``` ## Rows: 1,430 ## Columns: 4 ## $ country <chr> "Argentina", "Argentina", "Argā¦ ## $ food_category <chr> "Pork", "Poultry", "Beef", "Laā¦ ## $ consumption <dbl> 10.51, 38.66, 55.48, 1.56, 4.3ā¦ ## $ co2_emmission <dbl> 37.20, 41.53, 1712.00, 54.63, ā¦ ``` -- |Variable |Class |Description | |:-------------|:---------|:-----------| |country |character | Country Name | |food_category |character | Food Category | |consumption |double | Consumption (kg/person/year) | |co2_emmission |double | CO2 Emission (Kg CO2/person/year) | --- class: left, middle # Activity 2 (20 mins) ā .medium[ - Let's start with a question - **which food category has the highest CO2 emissions?** - We can answer this by visualising the data! ] -- .medium[ - I want you to recreate the following plot where I have done exactly this - I will post the link to the data in the MS Teams chat ] --- class: center, middle <img src="index_files/figure-html/food-type-emissions-1.svg" height="600px" /> --- count: false .pull-left-reveal[ ```r *food_consumption ``` ] .pull-right-reveal[ ``` ## # A tibble: 1,430 x 4 ## country food_category consumption co2_emmission ## <chr> <chr> <dbl> <dbl> ## 1 Argentiā¦ Pork 10.5 37.2 ## 2 Argentiā¦ Poultry 38.7 41.5 ## 3 Argentiā¦ Beef 55.5 1712 ## 4 Argentiā¦ Lamb & Goat 1.56 54.6 ## 5 Argentiā¦ Fish 4.36 6.96 ## 6 Argentiā¦ Eggs 11.4 10.5 ## 7 Argentiā¦ Milk - inc. cheeā¦ 195. 278. ## 8 Argentiā¦ Wheat and Wheat ā¦ 103. 19.7 ## 9 Argentiā¦ Rice 8.77 11.2 ## 10 Argentiā¦ Soybeans 0 0 ## # ā¦ with 1,420 more rows ``` ] --- count: false .pull-left-reveal[ ```r food_consumption %>% * group_by(food_category) ``` ] .pull-right-reveal[ ``` ## # A tibble: 1,430 x 4 ## # Groups: food_category [11] ## country food_category consumption co2_emmission ## <chr> <chr> <dbl> <dbl> ## 1 Argentiā¦ Pork 10.5 37.2 ## 2 Argentiā¦ Poultry 38.7 41.5 ## 3 Argentiā¦ Beef 55.5 1712 ## 4 Argentiā¦ Lamb & Goat 1.56 54.6 ## 5 Argentiā¦ Fish 4.36 6.96 ## 6 Argentiā¦ Eggs 11.4 10.5 ## 7 Argentiā¦ Milk - inc. cheeā¦ 195. 278. ## 8 Argentiā¦ Wheat and Wheat ā¦ 103. 19.7 ## 9 Argentiā¦ Rice 8.77 11.2 ## 10 Argentiā¦ Soybeans 0 0 ## # ā¦ with 1,420 more rows ``` ] --- count: false .pull-left-reveal[ ```r food_consumption %>% group_by(food_category) %>% * summarise(total_co2 = sum(co2_emmission)) ``` ] .pull-right-reveal[ ``` ## # A tibble: 11 x 2 ## food_category total_co2 ## <chr> <dbl> ## 1 Beef 48633. ## 2 Eggs 975. ## 3 Fish 3588. ## 4 Lamb & Goat 11837. ## 5 Milk - inc. cheese 23290 ## 6 Nuts inc. Peanut Butter 952. ## 7 Pork 7419. ## 8 Poultry 2963. ## 9 Rice 4887. ## 10 Soybeans 50.4 ## 11 Wheat and Wheat Products 1774. ``` ] --- count: false .pull-left-reveal[ ```r food_consumption %>% group_by(food_category) %>% summarise(total_co2 = sum(co2_emmission)) %>% * ggplot(aes(x = food_category, y = total_co2)) ``` ] .pull-right-reveal[ ![](index_files/figure-html/output_food-type-emissions-plot_4-1.svg)<!-- --> ] --- count: false .pull-left-reveal[ ```r food_consumption %>% group_by(food_category) %>% summarise(total_co2 = sum(co2_emmission)) %>% ggplot(aes(x = food_category, y = total_co2)) + * geom_col() ``` ] .pull-right-reveal[ ![](index_files/figure-html/output_food-type-emissions-plot_5-1.svg)<!-- --> ] --- count: false .pull-left-reveal[ ```r food_consumption %>% group_by(food_category) %>% summarise(total_co2 = sum(co2_emmission)) %>% ggplot(aes(x = food_category, y = total_co2)) + geom_col() + * coord_flip() ``` ] .pull-right-reveal[ ![](index_files/figure-html/output_food-type-emissions-plot_6-1.svg)<!-- --> ] --- count: false .pull-left-reveal[ ```r food_consumption %>% group_by(food_category) %>% summarise(total_co2 = sum(co2_emmission)) %>% ggplot(aes(x = food_category, y = total_co2)) + geom_col() + coord_flip() + * aes(reorder(food_category, total_co2)) ``` ] .pull-right-reveal[ ![](index_files/figure-html/output_food-type-emissions-plot_7-1.svg)<!-- --> ] --- count: false .pull-left-reveal[ ```r food_consumption %>% group_by(food_category) %>% summarise(total_co2 = sum(co2_emmission)) %>% ggplot(aes(x = food_category, y = total_co2)) + geom_col() + coord_flip() + aes(reorder(food_category, total_co2)) + * theme_bw(base_size = 12) ``` ] .pull-right-reveal[ ![](index_files/figure-html/output_food-type-emissions-plot_8-1.svg)<!-- --> ] --- count: false .pull-left-reveal[ ```r food_consumption %>% group_by(food_category) %>% summarise(total_co2 = sum(co2_emmission)) %>% ggplot(aes(x = food_category, y = total_co2)) + geom_col() + coord_flip() + aes(reorder(food_category, total_co2)) + theme_bw(base_size = 12) + * labs(x = "", * y = "CO2 Emission (Kg CO2/person/year)", * title = "Which food category has the highest CO2 emimssions?") ``` ] .pull-right-reveal[ ![](index_files/figure-html/output_food-type-emissions-plot_11-1.svg)<!-- --> ] --- class: left, middle # My annotated code ``` # Data prep food_consumption %>% # Select the data frame group_by(food_category) %>% # Group this data frame by food_category variable summarise(total_co2 = sum(co2_emmission)) %>% # Create a new summary data frame # Plot data ggplot(aes(x = food_category, y = total_co2)) + # Plot the new data frame geom_col() + # Select plot type coord_flip() + # Rotate plot 90 degrees aes(reorder(food_category, total_co2)) + # Re-order food categories from lowest CO2 to highest theme_bw(base_size = 12) + # Select theme labs(x = "", # Set x-axis label to blank y = "CO2 Emission (Kg CO2/person/year)", # Set y-axis label title = "Which food category has the highest CO2 emimssions?") # Set plot title ``` --- class: left, middle # Next meeting (2020-04-29) .medium[ - We'll take a deeper dive into food consumption and CO2 emissions data set - Aim to answer some specific questions - Explore the data more with more visualisations - Basic linear regression modeling ] --- class: left, middle # I propose a challenge... .medium[ - Produce your own data visualisation from the food-CO2 data - Make it as complicated or as simple as you like - Think of a question you want answer using the data - Show at the next **HARUG** meeting (2020-04-29) ] --- class: left, middle # Useful resources for the challenge! #### š R for Data Science - [R for Data Science](https://r4ds.had.co.nz/) - Overview of most data science topics, with great tips #### šØ Visualization in R - [Data Visualization](https://socviz.co) - Communication oriented data visualization in R - [R Graphics Cookbook](https://r-graphics.org/) - Practical introduction to visualization with ggplot2