class: center, middle, inverse, title-slide .title[ # Comparing Categories - Refinements ] .author[ ### Maithreyi Gopalan ] .date[ ### Week 5 ] --- layout: true <script> feather.replace() </script> <div class="slides-footer"> <span> <a class = "footer-icon-link" href = "https://github.com/maithgopalan/c2-dataviz-2025/tree/main/static/slides/w5.pdf"> <i class = "footer-icon" data-feather="download"></i> </a> <a class = "footer-icon-link" href = "https://dataviz-win2025.netlify.app/slides/w5.html"> <i class = "footer-icon" data-feather="link"></i> </a> <a class = "footer-icon-link" href = "https://github.com/maithgopalan/c2-dataviz-2025"> <i class = "footer-icon" data-feather="github"></i> </a> </span> </div> --- class: inverse-blue # Data viz in Wild Saratessa Sophia ### Will and Maiko on deck --- class: inverse-red middle # Reviewing Lab PS-2 --- # Agenda * Visualization for comparing categories - more considerations * Aspect ratios, scales, and labels * Annotations (most of the day) * Saving plots (pretty quick) * Compound figures * Themes refresher (if time permits) * Introduce Lab-PS3 quickly * Lab 5 * Q & A with Dr. Daniel Anderson --- # Learning Objectives * Additional Considerations when visualizing/comparing categories including over time * Understand how to make a wide variety of tweaks to ggplot. * Understand common modifications to plots to make them more clear and reduce cognitive load + And ways to implement them --- # Axes * Cartesian coordinates - what we generally use <img src="data:image/png;base64,#w5_files/figure-html/cartesian1-1.png" width="100%" /> --- # Different units <img src="data:image/png;base64,#w5_files/figure-html/temp_plot-1.png" width="100%" /> --- # Aspect ratio <img src="data:image/png;base64,#w5_files/figure-html/aspect-ratio-1.png" width="100%" height="100%" /> --- background-image: url("data:image/png;base64,#http://socviz.co/dataviz-pdfl_files/figure-html4/ch-01-perception-curves-1.png") background-size: contain --- # Same scales Use `coord_fixed()` <img src="data:image/png;base64,#w5_files/figure-html/same-scales-1.png" width="720" /> -- Note - I think this is a weird plot, but the point remains --- # Changing aspect ratio * Explore how your plot will look in its final size * No hard/fast rules (if on different scales) * Not even really rules of thumb * Keep visual perception in mind * Try your best to be truthful - show the trend/relation, but don't exaggerate/hide it --- # Handy function (from an apparently deleted tweet from [@tjmahr](https://twitter.com/tjmahr)) <blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">here's my favorite helper <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> function. preview ggsave() output<br><br>ggpreview <- function (..., device = "png") {<br> fname <- tempfile(fileext = paste0(".", device))<br> ggplot2::ggsave(filename = fname, device = device, ...)<br> system2("open", fname)<br> invisible(NULL)<br>}</p>— tj mahr 🍕🍍 (@tjmahr) </blockquote> --- # Gist (side note: gists are a good way to share things) * See the full code/example [here](https://gist.github.com/tjmahr/1dd36d78ecb3cff10baf01817a56e895) * Let's take 3 minutes to play around: + Create a plot (could even be the example in the gist) + Try different aspect ratios by changing the width/length
−
+
03
:
00
--- class: inverse-orange middle # Scale transformations --- # Raw scale ``` r library(gapminder) ggplot(gapminder, aes(year, gdpPercap)) + geom_line(aes(group = country), color = "gray70") ``` <img src="data:image/png;base64,#w5_files/figure-html/scale_transoform1-1.png" width="720" /> --- ### Log10 scale ``` r ggplot(gapminder, aes(year, gdpPercap)) + geom_line(aes(group = country), color = "gray70") + * scale_y_log10(labels = scales::dollar) ``` <img src="data:image/png;base64,#w5_files/figure-html/scale_transform2-1.png" width="720" /> --- <br/> <br/> <img src="data:image/png;base64,#w5_files/figure-html/scale_transform3-1.png" width="1080" /> --- <br/> <br/> <img src="data:image/png;base64,#w5_files/figure-html/scale_transform4-1.png" width="1080" /> --- # Scales ``` r d <- tibble(x = c(1, 3.16, 10, 31.6, 100), log_x = log10(x)) ggplot(d, aes(x, 1)) + geom_point(color = "#0072B2") ggplot(d, aes(x, 1)) + geom_point(color = "#0072B2") + scale_x_log10() ggplot(d, aes(log_x, 1)) + geom_point(color = "#0072B2") ``` --- # Scales <img src="data:image/png;base64,#w5_files/figure-html/labeling-non-linear2-1.png" width="720" /><img src="data:image/png;base64,#w5_files/figure-html/labeling-non-linear2-2.png" width="720" /><img src="data:image/png;base64,#w5_files/figure-html/labeling-non-linear2-3.png" width="720" /> --- # Don't transform twice <style type="text/css"> .code-bg-red .remark-code, .code-bg-red .remark-code * { background-color: #ffe0e0 !important; } </style> .code-bg-red[ ``` r ggplot(d, aes(log_x, 1)) + geom_point(color = "#0072B2") + scale_x_log10() + xlim(-0.2, 2.5) ``` <img src="data:image/png;base64,#w5_files/figure-html/bad-log-1.png" width="720" /> ] --- # Careful with labeling * Has the scale or the data been log transformed? * Specify the base ``` r library(ggtext) ggplot(d, aes(log_x, 1)) + geom_point(color = "#0072B2") + * labs(x = "log<sub>10</sub>(x)") + * theme(axis.title.x = element_markdown()) ``` <img src="data:image/png;base64,#w5_files/figure-html/log-data-1.png" width="720" /> Labels should denote the data, not the scale of the axis --- ``` r ggplot(d, aes(x, 1)) + geom_point(color = "#0072B2") + * scale_x_log10() ``` <img src="data:image/png;base64,#w5_files/figure-html/log-scale-1.png" width="720" /> Labeling the above with `\(log_{10}(x)\)` would be ambiguous and confusing --- # Interpretation Log scales show relative numbers, not raw -- Difference between 100 and 101 (1% change) will be much smaller than difference between 1 and 2 (100% change) -- More resources to learn more * [Blog post by Lisa Charlotte Muth](https://blog.datawrapper.de/weeklychart-logscale/) * The "Logorithmic or Linear scales" section [here](https://ig.ft.com/coronavirus-chart/?areas=usa&areas=gbr&areasRegional=usny&areasRegional=usnh&areasRegional=uspr&areasRegional=usdc&areasRegional=usfl&areasRegional=usmi&cumulative=0&logScale=1&per100K=1&startDate=2021-06-01&values=deaths) --- class: inverse-blue center middle # Labels and captions --- # Disclaimer * APA style requires the labels be made in specific ways * Much of the following discussion still applies * Our book (Wilke) uses a similar style throughout --- # Title ### What is the point of your figure? -- ### What are you trying to communicate -- * Figures should have only one title -- * Use integrated title/subtitles for sharing with a broad audience + Blog posts + Social media + Reports to stakeholders -- * Make sure your figure has a title + Should not start with "This figure displays/shows..." --- # Caption Consider stating the data source Other details relevant to the figure but not important enough for a subtitle --- # Axis labels * The title for the axis * Critical for communication * **Never** use variable names (very common and very poor practice) * State the measure and the unit (if quantitative) + e.g., "Brain Mass (grams)", "Support for Measure (millions of people)", "Dollars spent" + Categorical variable likely will not need to the measurement unit --- # Omission * Consider omitting obvious or redundant labels + Use `labs(x = NULL)` or `labs(x = "")` + If already using `scale_x/y_*()` just supply the `name` argument <img src="data:image/png;base64,#w5_files/figure-html/no-x-1.png" width="720" /> --- # Omission * Do not omit axis titles that are not obvious <img src="data:image/png;base64,#w5_files/figure-html/no-xy-1.png" width="720" /> --- # Don't overdo it <img src="data:image/png;base64,#w5_files/figure-html/overdone-labels-1.png" width="720" /> --- # Practice Let's use the `ggplot2::diamonds` dataset. * Plot the relationship between `carat` and `price` * Play with scale transformations * Give it some good labels * Make any other modifications you'd like that you think makes it prettier and/or easier to interpret.
−
+
05
:
00
--- class: inverse-red center middle # Annotations The big topic for the day --- # Among the most effective * If possible, try to remove legends, and just include annotations -- * Warning - this is often fairly difficult in ggplot (requires a lot of fiddling) -- * Consider saving and making final annotations outside of R + Bad for reproducibility, but good for interpretability -- * There are some packages that can help --- # Building up a plot ``` r remotes::install_github("clauswilke/dviz.supp") head(tech_stocks) ``` ``` ## # A tibble: 6 × 6 ## company ticker date price index_price price_indexed ## <chr> <chr> <date> <dbl> <dbl> <dbl> ## 1 Alphabet GOOG 2017-06-02 976. 285. 342. ## 2 Alphabet GOOG 2017-06-01 967. 285. 339. ## 3 Alphabet GOOG 2017-05-31 965. 285. 338. ## 4 Alphabet GOOG 2017-05-30 976. 285. 342. ## 5 Alphabet GOOG 2017-05-26 971. 285. 341. ## 6 Alphabet GOOG 2017-05-25 970. 285. 340. ``` --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() ``` <img src="data:image/png;base64,#w5_files/figure-html/tech1-1.png" width="720" /> --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() + * scale_color_OkabeIto() ``` <img src="data:image/png;base64,#w5_files/figure-html/tech2-1.png" width="720" /> --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() + scale_color_OkabeIto( * name = "Company", * breaks = c("GOOG", "AAPL", "FB", "MSFT"), * labels = c("Alphabet", "Apple", "Facebook", "Microsoft") ) ``` --- # Bad <img src="data:image/png;base64,#w5_files/figure-html/tech3-eval-1.png" width="720" /> --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() + scale_color_OkabeIto( * name = "Company", * breaks = c("FB", "GOOG", "MSFT", "AAPL"), * labels = c("Facebook", "Alphabet", "Microsoft", "Apple") ) ``` --- # Good <img src="data:image/png;base64,#w5_files/figure-html/tech4-eval-1.png" width="720" /> --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() + scale_color_OkabeIto( name = "Company", breaks = c("FB", "GOOG", "MSFT", "AAPL"), labels = c("Facebook", "Alphabet", "Microsoft", "Apple") ) + scale_x_date( name = "year", limits = c(ymd("2012-06-01"), ymd("2018-12-31")), expand = c(0,0) ) + geom_text( data = filter(tech_stocks, date == "2017-06-02"), aes(y = price_indexed, label = company), * nudge_x = 20 ) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/tech5-eval-1.png" width="720" /> --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() + scale_color_OkabeIto( name = "Company", breaks = c("FB", "GOOG", "MSFT", "AAPL"), labels = c("Facebook", "Alphabet", "Microsoft", "Apple") ) + scale_x_date( name = "year", limits = c(ymd("2012-06-01"), ymd("2018-12-31")), expand = c(0,0) ) + geom_text( data = filter(tech_stocks, date == "2017-06-02"), aes(y = price_indexed, label = company), nudge_x = 20, * hjust = 0, * size = 12 ) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/tech6-eval-1.png" width="720" /> --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() + scale_color_OkabeIto( name = "Company", breaks = c("FB", "GOOG", "MSFT", "AAPL"), labels = c("Facebook", "Alphabet", "Microsoft", "Apple") ) + scale_x_date( name = "year", limits = c(ymd("2012-06-01"), ymd("2018-12-31")), expand = c(0,0) ) + geom_text( data = filter(tech_stocks, date == "2017-06-02"), aes(y = price_indexed, label = company), nudge_x = 20, hjust = 0, size = 12 ) + * guides(color = "none") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/tech7-eval-1.png" width="720" /> --- ``` r ggplot(tech_stocks, aes(date, price_indexed, color = ticker)) + geom_line() + scale_color_OkabeIto( name = "Company", breaks = c("FB", "GOOG", "MSFT", "AAPL"), labels = c("Facebook", "Alphabet", "Microsoft", "Apple") ) + scale_x_date( name = "year", limits = c(ymd("2012-06-01"), ymd("2018-12-31")), expand = c(0,0) ) + geom_text( data = filter(tech_stocks, date == "2017-06-02"), aes(y = price_indexed, label = company), nudge_x = 20, hjust = 0, size = 12 ) + guides(color = "none") + labs( * title = "Tech growth over time", * caption = "Data from Wilke (2019): Fundamentals of Data Visualization", * y = "Stock Price (Indexed)" ) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/tech8-eval-1.png" width="720" /> --- # A few more notes * You might want to try `geom_label()` instead of `geom_text()`, or perhaps layering them with the first providing the white space for the second (as we saw in Lab 2) * Could consider not making the font color vary with the lines (the labels are close enough) * Depending on you you use the legend, it can work almost as well. --- # Example From an actual publication, where I used the legend instead of direct annotations <img src="data:image/png;base64,#img/legend-example.png" alt="Line plot with legend annotation" width="550"/> --- # Labeling bars ``` r avs <- tech_stocks %>% group_by(company) %>% summarize(stock_av = mean(price_indexed)) %>% ungroup() %>% mutate(share = stock_av / sum(stock_av)) avs ``` ``` ## # A tibble: 4 × 3 ## company stock_av share ## <chr> <dbl> <dbl> ## 1 Alphabet 141. 0.229 ## 2 Apple 77.1 0.125 ## 3 Facebook 275. 0.447 ## 4 Microsoft 122. 0.199 ``` --- # Bar plot ``` r ggplot(avs, aes(fct_reorder(company, share), share)) + geom_col(fill = "#0072B2") ``` <img src="data:image/png;base64,#w5_files/figure-html/avs-plot1-1.png" width="720" /> --- # Horizontal ``` r ggplot(avs, aes(share, fct_reorder(company, share))) + geom_col(fill = "#0072B2", alpha = 0.9) ``` <img src="data:image/png;base64,#w5_files/figure-html/avs-plot2-1.png" width="720" /> --- ``` r ggplot(avs, aes(share, fct_reorder(company, share))) + geom_col(fill = "#0072B2", alpha = 0.9) + theme( * panel.grid.major.y = element_blank(), * panel.grid.minor.x = element_blank(), * panel.grid.major.x = element_line(color = "gray80") ) ``` <img src="data:image/png;base64,#w5_files/figure-html/avs-plot2b-1.png" width="720" /> --- # Quick aside Let's actually make a bar plot theme ``` r bp_theme <- function(...) { theme_minimal(...) + theme( panel.grid.major.y = element_blank(), panel.grid.minor.x = element_blank(), panel.grid.major.x = element_line(color = "gray80"), plot.title.position = "plot" ) } ``` --- ``` r ggplot(avs, aes(share, fct_reorder(company, share))) + geom_col(fill = "#0072B2",alpha = 0.9) + geom_text( * aes(share, company, label = round(share, 2)), * nudge_x = 0.02, * size = 8 ) + bp_theme(base_size = 25) ``` <img src="data:image/png;base64,#w5_files/figure-html/avs-plot3-1.png" width="720" /> --- ``` r ggplot(avs, aes(share, fct_reorder(company, share))) + geom_col(fill = "#0072B2", alpha = 0.9) + geom_text( * aes(share, company, label = paste0(round(share*100), "%")), nudge_x = 0.02, size = 8 ) + scale_x_continuous( name = "Market Share", * labels = scales::percent ) + labs( x = NULL, title = "Tech company market control", caption = "Data from Clause Wilke Book: Fundamentals of Data Visualizations" ) + bp_theme(base_size = 25) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/avs-plot4-eval-1.png" width="720" /> --- # Another alternative ``` r ggplot(avs, aes(share, fct_reorder(company, share))) + geom_col(fill = "#0072B2", alpha = 0.9) + geom_text( aes(share, company, label = paste0(round(share*100), "%")), * nudge_x = -0.02, size = 8, * color = "white" ) + scale_x_continuous( "Market Share", labels = scales::percent, expand = c(0, 0, 0.05, 0) ) + labs( x = NULL, title = "Tech company market control", caption = "Data from Clause Wilke Book: Fundamentals of Data Visualizations" ) + bp_theme(base_size = 25) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/avs-plot6-eval-1.png" width="720" /> --- # Last example This is a bit artificial in this case, but... -- It is very common to have small bars. You may want most labels inside, but some outside --- First, create variables specifying what you want. Here I'm using 0.2 as my cutoff for whether the label is inside the bar or outside ``` r avs <- avs %>% mutate( nudge_amount = ifelse(share < 0.2, 0.02, -0.02), label_color = ifelse(share < 0.2, "black", "white") ) avs ``` ``` ## # A tibble: 4 × 5 ## company stock_av share nudge_amount label_color ## <chr> <dbl> <dbl> <dbl> <chr> ## 1 Alphabet 141. 0.229 -0.02 white ## 2 Apple 77.1 0.125 0.02 black ## 3 Facebook 275. 0.447 -0.02 white ## 4 Microsoft 122. 0.199 0.02 black ``` --- `nudge_*` doesn't work inside aes 🤷♂️ ``` r ggplot(avs, aes(share, fct_reorder(company, share))) + geom_col(fill = "#0072B2", alpha = 0.9) + geom_text( aes( share, company, label = paste0(round(share*100), "%"), * color = label_color ), * nudge_x = avs$nudge_amount, size = 8, ) + scale_x_continuous( "Market Share", labels = scales::percent, expand = c(0, 0, 0.05, 0) ) + * scale_color_identity() + labs( x = NULL, title = "Tech company market control", caption = "Data from Clause Wilke Book: Fundamentals of Data Visualizations" ) + bp_theme(base_size = 25) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-8-1.png" width="720" /> --- # Distributions ``` r ggplot(iris, aes(Sepal.Length, fill = Species)) + geom_density(alpha = 0.3, color = "white") ``` <img src="data:image/png;base64,#w5_files/figure-html/densities1-1.png" width="720" /> --- ``` r ggplot(iris, aes(Sepal.Length, fill = Species)) + geom_density(alpha = 0.3, color = "white") + scale_fill_OkabeIto() ``` <img src="data:image/png;base64,#w5_files/figure-html/densities2-1.png" width="720" /> --- # Labeling ### One method ``` r label_locs <- tibble(Sepal.Length = c(5.45, 6, 7), density = c(1, 0.8, 0.6), Species = c("setosa", "versicolor", "virginica")) ggplot(iris, aes(Sepal.Length, fill = Species)) + geom_density(alpha = 0.3, color = "white") + scale_fill_OkabeIto() + geom_text(aes(label = Species, y = density, color = Species), data = label_locs) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/densities3-eval-1.png" width="720" /> --- ``` r ggplot(iris, aes(Sepal.Length, fill = Species)) + geom_density(alpha = 0.3, color = "white") + scale_fill_OkabeIto() + * scale_color_OkabeIto() + geom_text(aes(label = Species, y = density, color = Species), data = label_locs) + guides(color = "none", fill = "none") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/densities4-eval-1.png" width="720" /> --- ``` r label_locs <- tibble(Sepal.Length = c(5.4, 6, 6.9), density = c(1, 0.75, 0.6), Species = c("setosa", "versicolor", "virginica")) ggplot(iris, aes(Sepal.Length, fill = Species)) + geom_density(alpha = 0.3, color = "white") + scale_fill_OkabeIto() + scale_color_OkabeIto() + geom_text(aes(label = Species, y = density), * color = "gray40", data = label_locs) + * guides(fill = "none") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/densities5-eval-1.png" width="720" /> --- # Other options * Rather than using a new data frame, you could use multiple calls to `annotate`. * One is not necessarily better than the other, but I prefer the data frame method * Keep in mind you can .bolder[always] use multiple data sources within a single plot + Each layer can have its own data source + Common in geographic data in particular --- # Annotate example ``` r ggplot(iris, aes(Sepal.Length, fill = Species)) + geom_density(alpha = 0.3) + scale_fill_OkabeIto() + scale_color_OkabeIto() + * annotate("text", label = "setosa", x = 5.45, y = 1, color = "gray40") + * annotate("text", label = "versicolor", x = 6, y = 0.8, color = "gray40") + * annotate("text", label = "virginica", x = 7, y = 0.6, color = "gray40") + guides(fill = "none") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/annotate-eval-1.png" width="720" /> --- # Practice Use the diamonds dataset again. * Compute the mean carat size for each color * Create a bar chart * Include labels for each bar rounding the actual value to two decimals * Make any other modifications you'd like
−
+
05
:
00
--- class: inverse-red middle # [{geomtextpath}](https://allancameron.github.io/geomtextpath/) Not super new package, but looks really cool; still I haven't used much --- # {geomtextpath} ``` r #install.packages("geomtextpath") library(geomtextpath) ggplot(iris, aes(Sepal.Length)) + geom_textdensity(aes(color = Species, label = Species)) ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-10-1.png" width="720" /> --- Slight modifications ``` r ggplot(iris, aes(Sepal.Length)) + geom_textdensity( aes(color = Species, label = Species), hjust = 0.2, vjust = 0.3, size = 8 # bigger than you'll probs need ) + scale_color_OkabeIto() + theme(legend.position = "none") ``` Note I couldn't get `aes(fill = Species)` to work. --- <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-12-1.png" width="720" /> --- # A workaround ``` r ggplot(iris, aes(Sepal.Length)) + geom_density(aes(fill = Species), alpha = 0.4) + geom_textdensity( aes(color = Species, label = Species), hjust = "ymax", vjust = -0.3, size = 8 ) + scale_color_OkabeIto() + scale_fill_OkabeIto() + theme(legend.position = "none") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-14-1.png" width="720" /> --- # Smooths ``` r ggplot(mpg, aes(displ, hwy, color = class)) + geom_point() + * geom_labelsmooth( * aes(label = class), * method = "lm", * hjust = "xmax" * ) + scale_color_OkabeIto() + guides(color = "none") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-16-1.png" width="720" /> --- # Drop-in replacements  --- # Even works w/Maps! ``` r library(sf) ggplot() + geom_textsf( data = waterways, aes(label = name), text_smoothing = 65, linecolour = "#8888B3", vjust = -0.8, fill = "#E6F0B3", alpha = 0.8, size = 4 ) + theme(panel.grid = element_line()) + lims(x = c(-4.7, -3), y = c(55.62, 56.25)) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-18-1.png" width="720" /> --- class: inverse-blue center middle # ggrepel --- # Plot text directly ``` r cars <- rownames_to_column(mtcars) ggplot(cars, aes(hp, mpg)) + geom_text(aes(label = rowname)) ``` <img src="data:image/png;base64,#w5_files/figure-html/mtcars-text-1.png" width="720" /> --- # Repel text ``` r *library(ggrepel) ggplot(cars, aes(hp, mpg)) + * geom_text_repel(aes(label = rowname)) ``` <img src="data:image/png;base64,#w5_files/figure-html/repel1-1.png" width="720" /> --- # Slightly better ``` r ggplot(cars, aes(hp, mpg)) + * geom_point(color = "gray70") + geom_text_repel(aes(label = rowname), * min.segment.length = 0) ``` <img src="data:image/png;base64,#w5_files/figure-html/repel2-1.png" width="720" /> --- # Common use cases * Label some sample data that makes some theoretical sense (we've seen this before) * Label outliers * Label points from a specific group (e.g., similar to highlighting - can be used in conjunction) --- # Some new data Please follow along ``` r #remotes::install_github("kjhealy/socviz") library(socviz) ``` ``` r by_country <- organdata %>% group_by(consent_law, country) %>% summarize(donors_mean= mean(donors, na.rm = TRUE), donors_sd = sd(donors, na.rm = TRUE), gdp_mean = mean(gdp, na.rm = TRUE), health_mean = mean(health, na.rm = TRUE), roads_mean = mean(roads, na.rm = TRUE), cerebvas_mean = mean(cerebvas, na.rm = TRUE)) ``` --- ``` r by_country ``` ``` ## # A tibble: 17 × 8 ## # Groups: consent_law [2] ## consent_law country donors_mean donors_sd gdp_mean health_mean roads_mean cerebvas_mean ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Informed Australia 10.6 1.14 22179. 1958. 105. 558. ## 2 Informed Canada 14.0 0.751 23711. 2272. 109. 422. ## 3 Informed Denmark 13.1 1.47 23722. 2054. 102. 641. ## 4 Informed Germany 13.0 0.611 22163. 2349. 113. 707. ## 5 Informed Ireland 19.8 2.48 20824. 1480. 118. 705. ## 6 Informed Netherlands 13.7 1.55 23013. 1993. 76.1 585. ## # ℹ 11 more rows ``` --- # Scatterplot ``` r ggplot(by_country, aes(gdp_mean, health_mean)) + geom_point() ``` <img src="data:image/png;base64,#w5_files/figure-html/scatter-country-1.png" width="720" /> --- # Outliers ``` r ggplot(by_country, aes(gdp_mean, health_mean)) + geom_point() + * geom_text_repel(data = filter(by_country, * gdp_mean > 25000 | * gdp_mean < 20000), * aes(label = country)) ``` --- <img src="data:image/png;base64,#w5_files/figure-html/outliers1-eval-1.png" width="720" /> --- # Combine with highlighting ``` r *library(gghighlight) ggplot(by_country, aes(gdp_mean, health_mean)) + geom_point() + * gghighlight(gdp_mean > 25000 | gdp_mean < 20000) + * geom_text_repel(aes(label = country)) ``` * Notice you only have to specify the points to highlight and `geom_text_repel` will then only label those points --- <img src="data:image/png;base64,#w5_files/figure-html/outliers2-eval-1.png" width="720" /> --- # Combine with highlighting Switch to make outliers grayed out and labeled ``` r ggplot(by_country, aes(gdp_mean, health_mean)) + geom_point() + gghighlight(gdp_mean > 20000 & gdp_mean < 25000 ) + geom_text_repel(data = filter(by_country, gdp_mean > 25000 | gdp_mean < 20000), aes(label = country), color = "#BEBEBEB3") ``` Note I found the exact gray color by looking at the source code. Specifically, it is the output from `ggplot2::alpha("grey", 0.7)` --- <img src="data:image/png;base64,#w5_files/figure-html/outliers3-eval-1.png" width="720" /> --- # By group ``` r ggplot(by_country, aes(gdp_mean, health_mean)) + geom_point() + geom_text_repel(data = filter(by_country, consent_law == "Presumed"), aes(label = country)) ``` <img src="data:image/png;base64,#w5_files/figure-html/label-by-group-1.png" width="720" /> --- # By group ``` r ggplot(by_country, aes(gdp_mean, health_mean)) + * geom_point(color = "#DC5265") + * gghighlight(consent_law == "Presumed") + geom_text_repel(aes(label = country), min.segment.length = 0, * box.padding = 0.75) + labs(title = "GDP and Health", subtitle = "Countries with a presumed organ donation consent are highlighted", caption = "Data from the General Social Science Survey, Distributed through the socviz R package", x = "Mean GDP", y = "Mean Health") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/label-by-group-eval-1.png" width="720" /> --- # Practice Use the mpg dataset * Group by manufacturer * Compute the mean highway `hwy` and mean `displ` * Plot the relation between these means. Plot points and label the manufacturer of each point. --- class: inverse-blue middle # ggforce Please follow along --- # Annotating groups of points Consider using any of the following from **ggforce** to annotate specific points * `geom_mark_rect()` * `geom_mark_circle()` * `geom_mark_ellipse()` * `geom_mark_hull()` --- # Examples ``` r library(palmerpenguins) library(ggforce) penguins %>% drop_na() %>% # Can't take missing data ggplot(aes(bill_length_mm, bill_depth_mm)) + * geom_mark_ellipse(aes(group = species, label = species)) + geom_point(aes(color = species)) + * coord_cartesian(xlim = c(28, 62), ylim = c(13, 23)) + guides(color = "none") ``` --- class: middle <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-21-1.png" width="720" /> --- # Limit to a single group ``` r penguins %>% drop_na() %>% ggplot(aes(bill_length_mm, bill_depth_mm)) + geom_point(aes(color = species)) + geom_mark_ellipse(aes(group = species, label = species), * data = filter(drop_na(penguins), * species == "Gentoo")) + coord_cartesian(xlim = c(28, 62), ylim = c(13, 23)) ``` --- class: middle <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-23-1.png" width="720" /> --- # Switch to hull Note - requires the **concaveman** package be installed ``` r penguins %>% drop_na() %>% ggplot(aes(bill_length_mm, bill_depth_mm)) + geom_point(aes(color = species)) + * geom_mark_hull(aes(group = species, label = species), * data = filter(drop_na(penguins), * species == "Gentoo")) + coord_cartesian(xlim = c(28, 62), ylim = c(13, 23)) ``` --- class: middle <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-25-1.png" width="720" /> --- # Change expand ``` r penguins %>% drop_na() %>% ggplot(aes(bill_length_mm, bill_depth_mm)) + geom_point(aes(color = species)) + geom_mark_hull(aes(group = species, label = species), * expand = unit(1, "mm"), data = filter(drop_na(penguins), species == "Gentoo")) + coord_cartesian(xlim = c(28, 62), ylim = c(13, 23)) ``` --- class: middle <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-27-1.png" width="720" /> --- # More in-depth annotations First create a description ``` r penguins <- penguins %>% mutate(desc = ifelse(species != "Gentoo", "", "During deep dives, gentoo penguins reduce their heart rate from 80 to 100 beats per minute (bpm) down to 20 bpm. Gentoo penguins use nesting materials ranging from pebbles and molted feathers in Antarctica to vegetation on subantarctic islands. Gentoos are the third largest penguin, following the emperor and king.")) ``` --- # Now add as a description ``` r penguins %>% drop_na() %>% ggplot(aes(bill_length_mm, bill_depth_mm)) + geom_point(aes(color = species)) + geom_mark_ellipse(aes(group = species, label = species, * description = desc), data = filter(drop_na(penguins), species == "Gentoo"), * label.fill = "#b3cfff") + coord_cartesian(xlim = c(28, 62), ylim = c(13, 23)) ``` --- class: middle <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-30-1.png" width="720" /> --- # Similar We can also just add a textbox through **{ggtext}** ``` r txtbox <- tibble( bill_length_mm = 23, bill_depth_mm = 16, lab = '"They may all waddle around in their tuxedolike feathers, but the penguins of the Antarctic Peninsula are not equal in their ability to adapt to a warming climate. While the populations of the Adélie and chinstrap penguin species are currently declining, the gentoo species is increasing. But this has not always been the case, according to a recent study published in the journal Scientific Reports." - Scientific American' ) ``` --- ``` r penguins %>% drop_na() %>% ggplot(aes(bill_length_mm, bill_depth_mm)) + geom_point(aes(color = species)) + ggtext::geom_textbox(aes(label = lab), data = txtbox) + coord_cartesian(xlim = c(17, 62), ylim = c(13, 22)) ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-32-1.png" width="720" /> --- # Last bit The **ggforce** package is well worth exploring more. See [here](https://z3tt.github.io/OutlierConf2021/) for a nice walkthrough that has good data viz and uses some of the **ggforce** functions (as well as illustrating a few other cool packages) --- class: inverse-blue middle # Saving plots And potentially making additional edits --- # Raster/Vector We'll talk about this again when discussing maps, but it relates to saving as well. **Raster** (also called bitmap) stores images as a grid of points (pixels) **Vector** store instructions for how the figure should be drawn. Image is redrawn as it is printed/displayed on screen .footnote[see [Wilke, Chapter 27](https://clauswilke.com/dataviz/image-file-formats.html) for more information.] --- # Differences .footnote[Image from Wilke]  **Vector graphics**: pdf, eps, svg **Raster graphics**: png, jpeg, tiff, gif --- # Downsides to vector graphics * Possible differences in appearance between displays (programs, computers, etc.) * Very large/complex figures can balloon to giant file sizes and be slow to render --- # Lossy/Lossless compression * Lossless - guarantees image is, pixel for pixel, identical to original + png and tiff use lossless compression * Lossy - accepts some minor image artifacts to reduce size + jpeg -- ### Practical advice Export to PDF -- If that won't work (web), use png -- If the file size ends up being too large, go with jpeg --- # Practice * Create a plot * Save it as a PDF with `ggsave()` Note, the first argument to `ggsave()` is the `path`, so you could do something like `ggsave(here::here("myplot.pdf"))` You can also specify the width/height. By default, it will save the last plot you produced, but you can also specify it with `plot = ` argument, where you pass an object that has the plot --- # Modifications I rarely do this, but if I do, I tend to use [Inkscape](https://inkscape.org/), which is free. ``` r ggplot(mpg, aes(displ, hwy)) + geom_point(color = "gray80") + geom_point(color = "#FD7A43", data = filter(mpg, cyl == 4)) ggsave("~/Desktop/example-plot.pdf", width = 6.5, height = 6.5) ``` --- class: inverse-red middle # Compound figures Please follow along --- # Options My favorite: [{patchwork}](https://patchwork.data-imaginist.com/index.html) -- Others: * [{cowplot}](https://wilkelab.org/cowplot/index.html) * [{ggpubr}](https://github.com/kassambara/ggpubr/) * [{gridExtra}](https://cran.r-project.org/web/packages/gridExtra/index.html) --- # Example * First, create two plots ``` r p1 <- ggplot(mpg, aes(displ, hwy)) + geom_point(aes(color = factor(cyl))) + geom_smooth() + scale_color_OkabeIto() p2 <- ggplot(mpg, aes(factor(cyl), hwy)) + geom_boxplot(aes(fill = factor(cyl))) + scale_fill_OkabeIto() ``` --- # Side by side ``` r library(patchwork) p1 + p2 ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-35-1.png" width="720" /> --- # Collect legends ``` r p1 + p2 + plot_layout(guides = "collect") ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-36-1.png" width="720" /> --- # Stack vertically ``` r p1 / p2 + plot_layout(guides = "collect") ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-37-1.png" width="720" /> --- # Add a third plot ``` r p3 <- mpg %>% group_by(manufacturer) %>% summarize(mean_mpg = mean(hwy, na.rm = TRUE)) %>% mutate(manufacturer = fct_reorder(manufacturer, mean_mpg)) %>% ggplot(aes(mean_mpg, manufacturer)) + geom_col(fill = "#25B6EE") + theme(axis.text.y = element_text(size = 12)) ``` --- # Put box plot on bottom ``` r (p1 + p3) / p2 + plot_layout(guides = "collect") ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-39-1.png" width="720" /> --- # Overall title ``` r (p1 + p3) / p2 + plot_layout(guides = "collect") + plot_annotation("Some cool plots") ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-40-1.png" width="720" /> --- # Tags ``` r (p1 + p3) / p2 + plot_layout(guides = "collect") + plot_annotation("Some cool plots", tag_levels = "a") ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-41-1.png" width="720" /> --- # Insets ``` r p3_small_txt <- p3 + theme(axis.text.y = element_text(size = 8), text = element_text(size = 8)) p1 + inset_element(p3_small_txt, 0.6, 0.6, 1, 1) ``` <img src="data:image/png;base64,#w5_files/figure-html/unnamed-chunk-42-1.png" width="720" /> --- class: inverse-red center middle # Themes Refresher --- <img src="data:image/png;base64,#w5_files/figure-html/diff-themes-1.png" width="720" /> --- # ggthemes * Good place to start. All sorts of themes. * Includes color scales, etc., that align with themes * You can even conform with other software + fit into an economics conference with `theme_stata` See the themes [here](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/) --- # BBC The BBC uses ggplot for most of its graphics. They've developed a package with a theme and some functions to help make it match their style more. See the repo [here](https://github.com/bbc/bbplot) Their [Journalism Cookbook](https://bbc.github.io/rcookbook/) is really nice too --- background-image: url(data:image/png;base64,#https://github.com/bbc/bbplot/raw/master/chart_examples/bbplot_example_plots.png) background-size: contain --- # Similarly, the Urban Institute Visual Guide See the repo [here](https://github.com/UrbanInstitute/urbnthemes) --- # So, I created one! * Based on UO's visual guide [here](https://communications.uoregon.edu/uo-brand/visual-identity/colors) [demo] --- # ggthemeassist * Another great place to start with making major modifications/creating your own custom theme * Can't do everything, but can do a lot * See [here](https://github.com/calligross/ggthemeassist) [demo] --- # `theme()` for everything else * You can basically change your plot to look however you want through `theme` * Generally a bit more complicated * I've used ggplot for *years* and only really now gaining fluency with it --- # Last Quick example The *google_trends* dataset comes from a [fivethirtyeight story](https://fivethirtyeight.com/features/the-media-really-started-paying-attention-to-puerto-rico-when-trump-did/) about how the media covered hurricanes and Trump. ``` r library(fivethirtyeight) g <- google_trends %>% pivot_longer(starts_with("hurricane"), names_to = "hurricane", values_to = "interest", names_pattern = "_(.+)_") landfall <- tibble( date = lubridate::mdy( c("August 25, 2017", "September 10, 2017", "September 20, 2017") ), hurricane = c("Harvey Landfall", "Irma Landfall", "Maria Landfall") ) ``` --- Let's start by visualizing the change in trends for each hurricane over time in one plot with three scales. We can map color to a discrete scale here. We can add vertical lines to show when each of them made landfall! ``` r p <- ggplot(g, aes(date, interest)) + geom_ribbon(aes(fill = hurricane, ymin = 0, ymax = interest), alpha = 0.6) + geom_vline(aes(xintercept = date), landfall, color = "gray80", lty = "dashed") + geom_text(aes(x = date, y = 80, label = hurricane), landfall, color = "gray80", nudge_x = 0.5, hjust = 0) + labs(x = "", y = "Google Trends", title = "Hurricane Google trends over time", caption = "Source: https://github.com/fivethirtyeight/data/tree/master/puerto-rico-media") + scale_fill_brewer("Hurricane", palette = "Set2") ``` --- <img src="data:image/png;base64,#w5_files/figure-html/baseplot-eval-1.png" width="720" /> --- We can use ggthemeassist to make a whole bunch of changes! ``` r p + theme( panel.grid.major = element_line(colour = "gray30"), panel.grid.minor = element_line(colour = "gray30"), axis.text = element_text(colour = "gray80"), axis.text.x = element_text(colour = "gray80"), axis.text.y = element_text(colour = "gray80"), axis.title = element_text(colour = "gray80"), legend.text = element_text(colour = "gray80"), legend.title = element_text(colour = "gray80"), panel.background = element_rect(fill = "gray10"), plot.background = element_rect(fill = "gray10"), legend.background = element_rect(fill = NA, color = NA), legend.position = c(0.2, -0.1), legend.direction = "horizontal", plot.margin = margin(10, 10, b = 20, 10), plot.caption = element_text(colour = "gray80", vjust = 1), plot.title = element_text(colour = "gray80") ) ``` --- class: inverse-red center middle # Lab PS-3 --- class: inverse-blue center middle # Lab 5 --- class: inverse-red center middle # Q & A with Dr. Daniel Anderson --- class: inverse-green center middle # Next time ### Wrap up visualizing changes over time/distributions and Intro to Websites, Flex dashboards, CSS customizations? Note: Change in schedule and Lab-PS3- Last Lab problem set in this class- is also posted