Lab 1



Overview

This lab covers three primary topics: (a) working with git and GitHub collaboratively, (b) creating basic visualizations, and (c) working with textual data. You should plan to work together with your group.

The basics of the lab are to:

To receive full credit you must create and merge branches. The contributions across team members should also appear roughly equal.

Data

* For the first part of this lab, we will explore some data from the Department of Education that we saw in the class slides (https://github.com/datalorax/edld652).

* We’ll then work with Week 1 of the #tidytuesday data for 2019, specifically the #rstats dataset, which contains nearly 500,000 tweets using that hashtag, posted over a little more than a decade. The data is in the data folder of the course repo.

Installing edld652 data package

remotes::install_github("datalorax/edld652", force = TRUE)

Check to see if all is working

After you’ve installed the package, run the following to make sure it’s working:

library(edld652)
list_datasets()

Accessing a dataset

For example, the adjusted cohort graduation rate (ACGR) data for local education agencies, 2011 to 2019:

acgd <- get_data("EDFacts_acgr_lea_2011_2019")
# acgd

Accessing documentation

acgdd <- get_documentation("EDFacts_acgr_lea_2011_2019")
# acgdd

Now let’s move on to the next dataset

We’ll now work with Week 1 of the #tidytuesday data for 2019, specifically the #rstats dataset, which contains nearly 500,000 tweets using that hashtag, posted over a little more than a decade. The data is in the data folder of the course repo. Start by grabbing that data.
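A minimal sketch of reading the data, assuming the file is saved as `rstats_tweets.rds` inside a `data` folder relative to your working directory. The file name and path are assumptions; check the course repo and adjust them as needed. Only the `text` and `display_text_width` column names come from this lab.

```r
# Assumed location of the tweets file; change this to match your repo.
tweets_path <- file.path("data", "rstats_tweets.rds")

if (file.exists(tweets_path)) {
  # readRDS() restores a single R object saved with saveRDS()
  tweets <- readRDS(tweets_path)
  # Peek at the two columns used later in this lab
  str(tweets[, c("text", "display_text_width")])
} else {
  message("No file found at ", tweets_path, "; adjust the path for your repo.")
}
```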

In this part of the lab, you will complete the following:

1. Initial exploration

Create histograms and density plots of the display_text_width. Try at least four different binning methods and select what you think best represents the data for each. Provide a brief justification for your decision.
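As a rough illustration of how binning choices are expressed in ggplot2, the sketch below uses simulated values in place of the real tweets. The toy data frame and object names are hypothetical; only the `display_text_width` column name comes from the lab. It shows the mechanics, not which choice is best for the real data.

```r
library(ggplot2)

# Simulated stand-in for the tweets data (not the real distribution).
set.seed(652)
toy <- data.frame(display_text_width = rpois(1000, lambda = 110))

base <- ggplot(toy, aes(display_text_width))

# Histograms: set the number of bins, or the width of each bin.
h_few   <- base + geom_histogram(bins = 10)
h_many  <- base + geom_histogram(bins = 60)
h_width <- base + geom_histogram(binwidth = 5)

# Density plots: `bw` sets the kernel bandwidth directly,
# while `adjust` rescales the default bandwidth.
d_narrow <- base + geom_density(bw = 1)
d_wide   <- base + geom_density(adjust = 2)

# Print any one of the plots to view it
h_width
```

Comparing the plots side by side makes it easier to see when a setting hides structure (too few bins, large bandwidth) or mostly shows noise (too many bins, small bandwidth).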

2. Look for “plot”

Search the text column for the word “plot”. Report the proportion of posts containing this word.
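One possible approach, sketched in base R on a few made-up posts (the toy data is hypothetical; only the `text` column name comes from the lab). Whether to count substrings such as “ggplot” or only the whole word is a judgment call, so both variants are shown.

```r
# Made-up posts standing in for the `text` column.
toy <- data.frame(
  text = c(
    "My first ggplot2 plot!",
    "Plotting with base R",
    "New #rstats blog post",
    "dplyr joins explained"
  ),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE for each post; the mean of a logical
# vector is the proportion of TRUE values.
has_plot <- grepl("plot", toy$text, ignore.case = TRUE)
prop_any <- mean(has_plot)    # also counts "ggplot2" and "Plotting"

# \\b marks a word boundary, so this matches "plot" only as a whole word.
has_word <- grepl("\\bplot\\b", toy$text, ignore.case = TRUE)
prop_word <- mean(has_word)

prop_any   # 0.5
prop_word  # 0.25
```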

3. Plot rough draft

Create the following figure of the 15 most common words represented in the posts.

Some guidance
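A minimal sketch of one way to count words, assuming the dplyr, tidytext, and ggplot2 packages are installed. The toy posts and object names are hypothetical, and this is not necessarily the approach used to produce the target figure.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Made-up posts standing in for the `text` column.
toy <- tibble(text = c(
  "A plot of the data",
  "The data and the plot",
  "More data for the model"
))

top_words <- toy %>%
  unnest_tokens(word, text) %>%           # one lowercased word per row
  anti_join(stop_words, by = "word") %>%  # drop common English stop words
  count(word, sort = TRUE) %>%            # frequency of each remaining word
  slice_head(n = 15)                      # keep at most the top 15

# Rough draft: horizontal bars, ordered by frequency.
draft <- ggplot(top_words, aes(n, reorder(word, n))) +
  geom_col()
draft
```

With real tweets, URL fragments such as `https` and `t.co` may rank highly after tokenizing, so you may want to filter those out before counting.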

4. Stylized Plot

Style the plot so it (mostly) matches the below. It does not need to be exact, but it should be close.
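The sketch below shows the kinds of layers typically used for styling a bar chart in ggplot2. The counts, color, and label text are all placeholders chosen for illustration, not values taken from the target figure; swap in whatever matches it.

```r
library(ggplot2)

# Hypothetical word counts, only to make the example runnable.
word_counts <- data.frame(
  word = c("data", "package", "code"),
  n    = c(30, 20, 10)
)

styled <- ggplot(word_counts, aes(n, reorder(word, n))) +
  geom_col(fill = "steelblue") +           # placeholder fill color
  labs(
    x = "Count",
    y = "Word",
    title = "Placeholder title",           # replace with the figure's title
    caption = "Placeholder caption"
  ) +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank())  # drop horizontal grid lines

styled
```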

Finishing up

This lab is expected to take more time than we will devote to it in class. Use class time to clarify any points of confusion. If you run into issues after class, please get in touch with me so we can arrange a time to meet and I can help you.

Once you have finished, please go to Canvas and submit a link to your shared repo. Credit will be awarded based on the commit history.