Intro to R for Social Data Science
Visualization 1
Merlin Schaeffer
Department of Sociology
2021-09-17

# Add packages to library
library(tidyverse) # Add the tidyverse package to my current library.
library(haven) # Read and handle SPSS, Stata & SAS data (no need to install)
library(essurvey) # Add ESS API package to library.
library(ggplot2) # Allows us to create nice figures.
# Import the ESS round 9 data via the API
ESS <- import_rounds(rounds = 9, ess_email = "YOUR-EMAIL", format = "spss")

ESS <- transmute(ESS, # Recode several variables & keep only the recoded ones (i.e., transmute vs mutate).
                 idno = zap_labels(idno),
                 # Make the following variables factors:
                 cntry = as_factor(cntry), 
                 gndr = as_factor(gndr),
                 facntr = as_factor(facntr),
                 mocntr = as_factor(mocntr),
                 # Make the following variables numeric:
                 imbgeco = max(imbgeco, na.rm = TRUE) - zap_labels(imbgeco), # Also turn scale around.
                 imueclt = max(imueclt, na.rm = TRUE) - zap_labels(imueclt), # Also turn scale around.
                 imwbcnt = max(imwbcnt, na.rm = TRUE) - zap_labels(imwbcnt), # Also turn scale around.
                 agea = zap_labels(agea),
                 pspwght = zap_labels(pspwght),
                 eduyrs = case_when(
                   eduyrs > 21 ~ 21, # Recode to max 21 years of edu.
                   eduyrs < 9 ~ 9, # Recode to min 9 years of edu.
                   TRUE ~ zap_labels(eduyrs) # Make it numeric.
                 ),
)

# Case selection.
ESS <- dplyr::filter(ESS,
                     # Only respondents whose parents were born in country of interview.
                     facntr == "Yes" & mocntr == "Yes" &
                       # Only respondents from Denmark and its neighboring countries:
                       cntry %in% c("Denmark", "Germany", "Sweden", "Norway")
)
# Casewise deletion of missing values
(ESS <- drop_na(ESS))
# # A tibble: 5,354 × 11
#     idno cntry   gndr   facntr mocntr imbgeco imueclt imwbcnt  agea pspwght eduyrs
#    <dbl> <fct>   <fct>  <fct>  <fct>    <dbl>   <dbl>   <dbl> <dbl>   <dbl>  <dbl>
#  1    10 Germany Female Yes    Yes          0       5       5    65   0.854     11
#  2    64 Germany Female Yes    Yes          1       2       2    74   0.760     20
#  3    65 Germany Male   Yes    Yes          6       6       7    64   1.08      12
#  4    91 Germany Female Yes    Yes          2       2       2    54   1.27      14
#  5   150 Germany Female Yes    Yes          0       0       4    71   0.942     12
#  6   212 Germany Male   Yes    Yes          2       2       2    41   1.42      14
#  7   255 Germany Male   Yes    Yes          3       3       3    62   1.23      16
#  8   270 Germany Male   Yes    Yes          3       6       5    65   0.978     14
#  9   304 Germany Female Yes    Yes          1       1       5    47   0.984     13
# 10   311 Germany Male   Yes    Yes          0       0       5    67   0.535     18
# # … with 5,344 more rows

Why visualize? A simulated example

We are better at detecting visual patterns in figures than numeric patterns in tables.
You will reach wider audiences with figures than with tables.
You will understand your own data faster while exploring it.

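# Multilevel mixed effects model (sim_data is simulated data that is not shown here).
library(lme4) # Provides lmer().
library(stargazer) # Formats model results as a table.
lmer(data = sim_data,
  formula = xeno ~ education +
  (1 + education | Country)) %>% 
  stargazer(type = "text", style = "asr")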
# 
# ---------------------------------
#                         xeno     
# ---------------------------------
# education             -0.799***  
# Constant                0.314    
# N                        500     
# Log Likelihood        -154.000   
# AIC                    320.000   
# BIC                    345.000   
# ---------------------------------
# *p < .05; **p < .01; ***p < .001
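
The same point can be illustrated with data that ship with R. This is a minimal sketch that assumes the built-in anscombe data (Anscombe's quartet) instead of the sim_data used above, which is not shown on the slides. Four data sets give almost identical regression coefficients, yet look very different once plotted. The object names fits and long are made up for illustration.

```r
library(ggplot2)

# Fit y_i ~ x_i separately for each of the four data sets in `anscombe`.
fits <- sapply(1:4, function(i) {
  coef(lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe))
})
round(fits, 2) # Intercepts and slopes are almost the same in all four columns.

# Stack the four data sets into long format for plotting.
long <- data.frame(
  set = factor(rep(1:4, each = nrow(anscombe))),
  x   = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y   = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

# Nearly the same regression line in every panel, very different point patterns.
ggplot(data = long, mapping = aes(y = y, x = x)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ set)
```

A table of the four fits would suggest four equivalent data sets; the figure shows at a glance that they are not.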

Why ggplot2? Because of its grammar of graphics

Independently specify the building blocks of a figure and combine them to create just about any kind of figure you want; it's like Lego ;-).

The coordinate system

ggplot() # Create an empty coordinate system.

ggplot(data = ESS) # Create an empty coordinate system for the ESS data.

Layers

ggplot(data = ESS) + # Add ...
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs)) # a "layer" of points (i.e., a scatter plot).

The layered grammar of graphics

# A general template
ggplot(data = <DATA>) +         # Create a coordinate system for <DATA>, and add "+"
  <GEOM_FUNCTION>(              # a layer of (geometric) information, which
     mapping = aes(<MAPPINGS>), # maps our data to aesthetics, and
     stat = <STAT>,             # may depend on statistical transformations.
     position = <POSITION>      # Positioning may be adjusted.
  ) +
  <COORDINATE_FUNCTION> +       # Change the default coordinate system.
  <FACET_FUNCTION>              # Draw sub-plots by categorical variables.

Source: Wickham & Grolemund "R for Data Science"

ggplot2 contains many geom functions, which put layers of different types of geometric objects (e.g., points, bars, lines) over a coordinate system.

Every geom function takes a mapping argument. It is paired with aes(), which stands for "aesthetic". Aesthetics are the visual properties of your plot.
The most important aesthetics of any graph are the y-axis and the x-axis. Most geoms therefore require x and y within aes(), which specify which variable to map to the x-axis and which one to map to the y-axis.
But of course, aesthetics also include color, shape, size, and so on.
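
To make the general template concrete, here is a minimal sketch that fills in every slot. It assumes a small made-up data frame (toy) instead of the ESS data, and the particular stat, position, coordinate, and facet choices are only illustrative.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration only.
toy <- data.frame(
  edu   = c(9, 9, 12, 12, 16, 16),
  xeno  = c(7, 6, 5, 5, 3, 2),
  cntry = c("A", "B", "A", "B", "A", "B")
)

p <- ggplot(data = toy) +                # <DATA>
  geom_point(                            # <GEOM_FUNCTION>
    mapping = aes(y = xeno, x = edu),    # <MAPPINGS>
    stat = "identity",                   # <STAT>: plot the values as they are.
    position = "jitter"                  # <POSITION>: add a little random noise.
  ) +
  coord_cartesian(ylim = c(0, 10)) +     # <COORDINATE_FUNCTION>
  facet_wrap(~ cntry)                    # <FACET_FUNCTION>
p
```

Most of the time you only need the data, a geom, and the mappings; the remaining slots have sensible defaults.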

Aesthetics: The visual properties of your plot

If you want an aesthetic to depend on the values of a variable, you need to specify it within aes().

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, color = cntry)) # Color by country.

ggplot2 will automatically assign a unique level of the aesthetic (e.g., a distinct color, shape, or size) to each value of the variable.
It will also generate a legend.

Aesthetics

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, color = cntry, 
                           size = pspwght)) # Size by post-stratification weight.

You can manually control the aesthetics, that is, which colors and which sizes are used. But that is fine-tuning. Right now, we want to explore our data.

Aesthetics

Because R is object-oriented, aesthetics behave differently depending on whether you give them a categorical or a continuous variable.

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, 
                           color = pspwght, size = cntry)) # Swapped the color and size aesthetics.

Color is now a continuous gradient rather than a set of distinct colors.
For size, a categorical variable makes little sense.
Categorical variables are factor and character vectors.
Continuous variables are numeric vectors.

Aesthetics

If you want to set an aesthetic irrespective of the values of any variable, you need to place it outside aes(), as a direct argument of the geom function.

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, 
                           size = pspwght, color = cntry), 
             shape = 21) # Use hollow circles.

alpha sets the transparency, which ranges from 0 (fully transparent) to 1 (fully opaque).
You need to give a manually set aesthetic a value that makes sense for it.
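
The difference between mapping and setting an aesthetic is easy to get wrong. Here is a minimal sketch, assuming a made-up data frame toy rather than the ESS data. Inside aes(), the string "blue" is treated as a variable with a single value, so ggplot2 picks its own default color and adds a legend. Outside aes(), it is taken literally.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration only.
toy <- data.frame(edu = c(9, 12, 16, 21), xeno = c(7, 5, 4, 2))

# Mapped: "blue" is treated as data (a constant categorical variable).
p_mapped <- ggplot(data = toy) +
  geom_point(mapping = aes(y = xeno, x = edu, color = "blue"))

# Set: every point is drawn in blue, with 50% transparency.
p_set <- ggplot(data = toy) +
  geom_point(mapping = aes(y = xeno, x = edu), color = "blue", alpha = 0.5)

# layer_data() returns the values ggplot2 actually uses for drawing.
unique(layer_data(p_mapped)$colour) # The default discrete color, not blue.
unique(layer_data(p_set)$colour)    # "blue"
```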

Geometric objects: As what do you visualize your data?

How are these two plots similar?

ggplot(data = ESS) + 
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs))

ggplot(data = ESS) +
  geom_smooth(mapping = aes(y = imwbcnt, x = eduyrs))

They show the same data, but expressed as different geometric objects.
ggplot2 contains more than 30 geoms. Extension packages contain even more.

Geometric objects: As what do you visualize your data?

ggplot(data = ESS) + 
  geom_boxplot(mapping = aes(y = imwbcnt, 
                             x = factor(eduyrs)))

Source: Wikipedia

Geoms & weights: Apply or visualize, it depends on the geom ...

ggplot(data = ESS) + 
  geom_point(aes(y = imwbcnt, x = eduyrs, 
                 size = pspwght)) # Visualize

ggplot(data = ESS) +
  geom_smooth(aes(y = imwbcnt, x = eduyrs, 
                  weight = pspwght)) # Apply
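
What "apply" means can be checked against an ordinary weighted regression. This is a minimal sketch with a hypothetical data frame toy and a linear fit (method = "lm") instead of the default smoother: the line drawn by geom_smooth should coincide with the predictions of lm() when it is given the same weights.

```r
library(ggplot2)

# Hypothetical toy data with a made-up weight variable w.
toy <- data.frame(
  edu  = c(9, 12, 15, 18, 21),
  xeno = c(8, 6, 7, 3, 2),
  w    = c(3, 1, 1, 1, 0.5)
)

# geom_smooth applies the weight aesthetic when fitting the line.
p_w <- ggplot(data = toy) +
  geom_smooth(mapping = aes(y = xeno, x = edu, weight = w),
              method = "lm", se = FALSE)
smooth <- layer_data(p_w) # x and y values of the drawn line.

# The same model fitted by hand with lm() and weights.
fit_w <- lm(xeno ~ edu, data = toy, weights = w)
pred  <- as.numeric(predict(fit_w, newdata = data.frame(edu = smooth$x)))

all.equal(smooth$y, pred) # Compares the drawn line with the weighted lm() fit.
```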

Geoms & aesthetics: Some aesthetics are geom-specific

ggplot(data = ESS) + 
  geom_point(aes(y = imwbcnt, x=eduyrs, color=cntry,
                 size = pspwght, 
                 shape = cntry))

ggplot(data = ESS) +
  geom_smooth(aes(y=imwbcnt, x=eduyrs, color=cntry, 
                  weight = pspwght, 
                  linetype = cntry))

We can use the color aesthetic in both plots.
We cannot use shape for lines or linetype for points.
Note that ggplot2 automatically groups data for geoms whenever you map an aesthetic to a categorical variable!
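
The automatic grouping can be inspected directly. Here is a minimal sketch with a hypothetical data frame toy: once color is mapped to a categorical variable, geom_smooth fits one line per category instead of one line overall.

```r
library(ggplot2)

# Hypothetical toy data: two made-up countries with opposite trends.
toy <- data.frame(
  edu   = rep(c(9, 12, 15, 18, 21), times = 2),
  xeno  = c(8, 7, 5, 4, 2, 2, 3, 5, 6, 8),
  cntry = rep(c("A", "B"), each = 5)
)

# No categorical aesthetic: one group, one line.
p_one <- ggplot(data = toy) +
  geom_smooth(mapping = aes(y = xeno, x = edu), method = "lm")

# color mapped to a categorical variable: one group (and line) per country.
p_two <- ggplot(data = toy) +
  geom_smooth(mapping = aes(y = xeno, x = edu, color = cntry), method = "lm")

length(unique(layer_data(p_one)$group)) # Number of groups without color.
length(unique(layer_data(p_two)$group)) # Number of groups with color = cntry.
```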

Multiple geoms: Are layered on top of each other

To have several geoms in one plot, simply add them on top of each other with +.

ggplot(data = ESS) + # Coordinate system, add ...
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, size = pspwght)) + # layer of points, add ...
  geom_smooth(mapping = aes(y = imwbcnt, x = eduyrs, weight = pspwght)) # layer of smoothed average & 95%-CI.

Multiple geoms

The order of geoms matters: ggplot2 draws each layer on top of the previous one.

ggplot(data = ESS) + # Coordinate system, add ...
  geom_smooth(mapping = aes(y = imwbcnt, x = eduyrs, weight = pspwght)) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, size = pspwght))

Global aesthetics

To avoid repetitive code, we can specify global aesthetics, which (by default) hold for all geoms.

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = eduyrs)) + # Coord. system with global aesthetics, add ...
  geom_point() + # a layer of points, add ...
  geom_smooth() # a layer with a line of locally-smoothed averages and CI.

By the way, this is a nice example of why graphics are great for data exploration:

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) + # Coord. system with global aesthetics, add ...
  geom_boxplot() + # a layer of boxplots, add ...
  # Because x is a factor, ggplot2 groups the data by each value of x; "aes(group = 1)" makes geom_smooth treat all observations as one group.
  geom_smooth(mapping = aes(group = 1), se = FALSE) + # No CI (i.e., confidence interval), add ...
  geom_smooth(mapping = aes(group = 1), method = "lm", se = FALSE, color = "red") # an OLS line.

Local aesthetics: For single geoms

Beware, local aesthetics override the global (default) aesthetics!

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = eduyrs)) +
  geom_point(mapping = aes(color = cntry, size = pspwght), alpha = 0.2) + # aes() for geom_point exclusively.
  geom_smooth(mapping = aes(y = agea, weight = pspwght)) # y = agea overrides the global y = imwbcnt for this layer only.
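
The override can be verified on a small scale. This is a minimal sketch with a hypothetical data frame toy, using geom_line instead of geom_smooth to keep the values easy to read: the first layer inherits the global y, while the second layer replaces it.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration only.
toy <- data.frame(edu = c(9, 12, 16, 21), xeno = c(7, 5, 4, 2), age = c(70, 55, 40, 30))

p <- ggplot(data = toy, mapping = aes(y = xeno, x = edu)) + # Global: y = xeno.
  geom_point() +                                            # Inherits y = xeno.
  geom_line(mapping = aes(y = age))                         # Local y = age overrides it.

layer_data(p, 1)$y # The values of xeno.
layer_data(p, 2)$y # The values of age.
```

Both layers share one y-axis, so an accidental override like this stretches the axis to fit two very different variables.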

Putting it all together:

ggplot(data = ESS, # Coordinate system, add ...
       mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +  # define global aesthetics, add ...
  geom_boxplot() + # a layer of boxplots, add ...
  geom_smooth(mapping = aes(color = cntry, group = cntry)) # a smoothed line for each country.

Sometimes another layer of (important) information does not improve the plot.

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +
  geom_boxplot() +
  geom_smooth(mapping = aes(group = 1)) +
  facet_wrap( ~ cntry, nrow = 1) # Make sub-plots by cntry.

Consider whether faceting helps to see the comparisons you are interested in!

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +
  geom_boxplot() +
  geom_smooth(mapping = aes(group = 1)) +
  facet_grid(gndr ~ cntry) # Make sub-plots by gender (row) ~ country (column).

Save your plot

ggsave() allows you to save your plot as pdf, jpeg, png, tiff, svg, bmp, ps, eps. It will guess the type from the ending of the name you give the plot (e.g., "MyPlot.pdf").

# Make our plot and assign it to object my_plot.
my_plot <- ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +
  geom_boxplot() +
  geom_smooth(mapping = aes(group = 1)) +
  facet_grid(gndr ~ cntry)
# Save the plot into the working directory as pdf. It shall be 8 inches wide and 3 inches high.
ggsave(filename = "myplot1.pdf", plot = my_plot, width = 8, height = 3)

# Save the plot with different dimensions.
ggsave(filename = "myplot2.pdf", plot = my_plot, width = 16, height = 9)

# Save the plot as jpeg, again with different dimensions and a very low resolution.
ggsave(filename = "myplot1.jpeg", plot = my_plot, width = 4.5, height = 9, dpi = 50)

PDFs do not need the dpi argument, because they are vector-based graphics.
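
For raster formats such as jpeg or png, the pixel size of the file is the size in inches multiplied by dpi. Here is a minimal sketch, assuming a made-up toy plot and a temporary file instead of the working directory, so that nothing is overwritten.

```r
library(ggplot2)

# Hypothetical toy plot, made up for illustration only.
toy_plot <- ggplot(data = data.frame(x = 1:3, y = 1:3),
                   mapping = aes(y = y, x = x)) +
  geom_point()

# Vector format: the size is given in inches, no dpi needed.
pdf_file <- file.path(tempdir(), "toy_plot.pdf")
ggsave(filename = pdf_file, plot = toy_plot, width = 8, height = 3)

# Raster format: pixels = inches * dpi.
width_in  <- 4.5
height_in <- 9
dpi       <- 50
c(width_px = width_in * dpi, height_px = height_in * dpi) # 225 x 450 pixels.
```

At dpi = 50, the 4.5 by 9 inch jpeg from above is only 225 by 450 pixels, which is why it looks coarse.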

Today's general lesson

# A general template
ggplot(data = <DATA>) +         # Create a coordinate system for <DATA>, and add "+"
  <GEOM_FUNCTION>(              # a layer of (geometric) information, which
     mapping = aes(<MAPPINGS>), # maps our data to aesthetics, and
     stat = <STAT>,             # may depend on statistical transformations.
     position = <POSITION>      # Positioning may be adjusted.
  ) +
  <COORDINATE_FUNCTION> +       # Change the default coordinate system.
  <FACET_FUNCTION>              # Draw sub-plots by categorical variables.

Source: Wickham & Grolemund "R for Data Science"

Today's (important) functions

transmute(): similar to mutate(), but keeps only the newly generated variables.
32 / 32

Intro to R for Social Data ScienceVisualization 1Merlin Schaeffer
 Department of Sociology2021-09-171 / 32

2 / 32

3 / 32

# Add packages to library
library(tidyverse) # Add the tidyverse package to my current library.
library(haven) # Read and handle SPSS, Stata & SAS data (no need to install)
library(essurvey) # Add ESS API package to library.
library(ggplot2) # Allows us to create nice figures.
# Import the ESS round 9 data via the API
ESS <- import_rounds(rounds = 9, ess_email = "YOUR-EMAIL", format = "spss")

4 / 32

# Add packages to library
library(tidyverse) # Add the tidyverse package to my current library.
library(haven) # Read and handle SPSS, Stata & SAS data (no need to install)
library(essurvey) # Add ESS API package to library.
library(ggplot2) # Allows us to create nice figures.
# Import the ESS round 9 data via the API
ESS <- import_rounds(rounds = 9, ess_email = "YOUR-EMAIL", format = "spss")

ESS <- transmute(ESS, # Recode several variables & keep only the recoded ones (i.e., transmute vs mutate).
                 idno = zap_labels(idno),
                 # Make the following variables factors:
                 cntry = as_factor(cntry), 
                 gndr = as_factor(gndr),
                 facntr = as_factor(facntr),
                 mocntr = as_factor(mocntr),
                 # Make the following variables numeric:
                 imbgeco = max(imbgeco, na.rm = TRUE) - zap_labels(imbgeco), # Also turn scale around.
                 imueclt = max(imueclt, na.rm = TRUE) - zap_labels(imueclt), # Also turn scale around.
                 imwbcnt = max(imwbcnt, na.rm = TRUE) - zap_labels(imwbcnt), # Also turn scale around.
                 agea = zap_labels(agea),
                 pspwght = zap_labels(pspwght),
                 eduyrs = case_when(
                   eduyrs > 21 ~ 21, # Recode to max 21 years of edu.
                   eduyrs < 9 ~ 9, # Recode to min 9 years of edu.
                   TRUE ~ zap_labels(eduyrs) # Make it numeric.
                 ),
)

4 / 32

# Case selection.
ESS <- dplyr::filter(ESS,
                     # Only respondents whose parents were born in country of interview.
                     facntr == "Yes" & mocntr == "Yes" &
                       # Only respondents from direct neighbors of Denmark:
                       (cntry == "Denmark" | cntry == "Germany" | cntry == "Sweden" | cntry == "Norway")
)
# Casewise deletion of missing values
(ESS <- drop_na(ESS))
# # A tibble: 5,354 × 11
#     idno cntry   gndr   facntr mocntr imbgeco imueclt imwbcnt  agea pspwght eduyrs
#    <dbl> <fct>   <fct>  <fct>  <fct>    <dbl>   <dbl>   <dbl> <dbl>   <dbl>  <dbl>
#  1    10 Germany Female Yes    Yes          0       5       5    65   0.854     11
#  2    64 Germany Female Yes    Yes          1       2       2    74   0.760     20
#  3    65 Germany Male   Yes    Yes          6       6       7    64   1.08      12
#  4    91 Germany Female Yes    Yes          2       2       2    54   1.27      14
#  5   150 Germany Female Yes    Yes          0       0       4    71   0.942     12
#  6   212 Germany Male   Yes    Yes          2       2       2    41   1.42      14
#  7   255 Germany Male   Yes    Yes          3       3       3    62   1.23      16
#  8   270 Germany Male   Yes    Yes          3       6       5    65   0.978     14
#  9   304 Germany Female Yes    Yes          1       1       5    47   0.984     13
# 10   311 Germany Male   Yes    Yes          0       0       5    67   0.535     18
# # … with 5,344 more rows

5 / 32

Why visualize? A simulated exampleWe are better in detecting visual patterns in figures compared to numeric patterns in tables.
You will reach wider audiences with figures than with tables.
You will understand your own data faster while exploring it.
# Multilevel mixed effects model.
lmer(data = sim_data,
  formula = xeno ~ education +
  (1 + education | Country)) %>% 
  stargazer(type = "text", style = "asr")
# 
# ---------------------------------
#                         xeno     
# ---------------------------------
# education             -0.799***  
# Constant                0.314    
# N                        500     
# Log Likelihood        -154.000   
# AIC                    320.000   
# BIC                    345.000   
# ---------------------------------
# *p < .05; **p < .01; ***p < .001

6 / 32

Why visualize? A simulated example

We are better in detecting visual patterns in figures compared to numeric patterns in tables.
You will reach wider audiences with figures than with tables.
You will understand your own data faster while exploring it.

# Multilevel mixed effects model.
lmer(data = sim_data,
  formula = xeno ~ education +
  (1 + education | Country)) %>% 
  stargazer(type = "text", style = "asr")
# 
# ---------------------------------
#                         xeno     
# ---------------------------------
# education             -0.799***  
# Constant                0.314    
# N                        500     
# Log Likelihood        -154.000   
# AIC                    320.000   
# BIC                    345.000   
# ---------------------------------
# *p < .05; **p < .01; ***p < .001

6 / 32

Why ggplot2? Because of its grammar of graphics

Independently specify the building blocks of a figure and combine them to create just about any kind of figure you want; its like Lego ;-).

7 / 32

The coordinate system

ggplot() # Create an empty coordinate system.

8 / 32

The coordinate system

ggplot(data = ESS) # Create an empty coordinate system for the ESS data.

9 / 32

Layers

ggplot(data = ESS) + # Add ...
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs)) # a "layer" of points (i.e., a scatter plot).

10 / 32

Layers

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs))

11 / 32

The layered grammar of graphics

# A general template
ggplot(data = <DATA>) +         # Create a coordinate system for <DATA>, and add "+"
  <GEOM_FUNCTION>(              # a layer of (geometric) information, which
     mapping = aes(<MAPPINGS>), # maps our data to aestetics, and
     stat = <STAT>,             # may depend on statistical transformations.
     position = <POSITION>      # Positioning may be adjusted.
  ) +
  <COORDINATE_FUNCTION> +       # Change the default coordinate system.
  <FACET_FUNCTION>              # Draw sub-plots by categorical variables.

Source: Wickham & Grolemund "R for Data Science"

ggplot2 contains many geom functions, which put layers of different types of geometric objects (e.g., points, bars, lines) over a coordinate system.

12 / 32

The layered grammar of graphics

# A general template
ggplot(data = <DATA>) +         # Create a coordinate system for <DATA>, and add "+"
  <GEOM_FUNCTION>(              # a layer of (geometric) information, which
     mapping = aes(<MAPPINGS>), # maps our data to aestetics, and
     stat = <STAT>,             # may depend on statistical transformations.
     position = <POSITION>      # Positioning may be adjusted.
  ) +
  <COORDINATE_FUNCTION> +       # Change the default coordinate system.
  <FACET_FUNCTION>              # Draw sub-plots by categorical variables.

Source: Wickham & Grolemund "R for Data Science"

ggplot2 contains many geom functions, which put layers of different types of geometric objects (e.g., points, bars, lines) over a coordinate system.

All geom functions depend on the mapping argument. It is paired with aes(), which stands for "aestetic". Aestetics are the visual properties of your plot.
The most important aestetics of any graph are the y-axis and the x-axis. Therefore, aes() depends on x and y, because these specify which variable to map to the y-axis and which one to map to the x-axis.
But of course, aestetics also means, among others, color, shape, size, and so on.

12 / 32

Aestetics The visual properties of your plot

If you want to have an aestetic depend on the values of a variable, you need to specify it within aes().

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, color = cntry)) # Color by country.

13 / 32

ggplot2 will automatically assign a unique aesthetic (e.g., color/shape/size/etc.) to each value of the variable.
It will also generate a legend.

Aestetics

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, color = cntry, 
                           size = pspwght)) # Size by post-stratification weight.

14 / 32

You can manually control the aestetics, that is, which color and which sizes. But that is fine tuning. We want to explore our data right now.

Aestetics

Because R is object-oriented, aestetics behave differently depending on whether you give it a categorical or a continuous variable.

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, 
                           color = pspwght, size = cntry)) # Exchanged color and date aes.

15 / 32

Now color is gradual, rather than different colors.
For size, a categorical variable makes little sense.
Categorical are factor and character vectors.
continuous are numerical vectors.

Aestetics

If you want to define an aestetic irrespective of the values of any variable, you need to place outside the mapping argument.

ggplot(data = ESS) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, 
                           size = pspwght, color = cntry), 
             shape = 21) # Use hollow circles.

16 / 32

alpha adds transparency, which varies between 0 (see through) and 1 nontransparent.
You need to give that aestetic a value that makes sense to it.

Geometric objects As what do you visualize your data?

How are these two plots similar?

ggplot(data = ESS) + 
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs))

ggplot(data = ESS) +
  geom_smooth(mapping = aes(y = imwbcnt, x = eduyrs))

17 / 32

They show the same data, but expressed as different geometric objects.
ggplot2 contains +30 geoms. Extension packages contain even more.

Geometric objects As what do you visualize your data?

ggplot(data = ESS) + 
  geom_boxplot(mapping = aes(y = imwbcnt, 
                             x = factor(eduyrs)))

Source: Wikipedia

18 / 32

Geoms & weights Apply or visualize, it depends on the geom ...

ggplot(data = ESS) + 
  geom_point(aes(y = imwbcnt, x = eduyrs, 
                 size = pspwght)) # Visualize

ggplot(data = ESS) +
  geom_smooth(aes(y = imwbcnt, x = eduyrs, 
                  weight = pspwght)) # Apply

19 / 32

Geoms & aestetics Some aestetics are geom specific

ggplot(data = ESS) + 
  geom_point(aes(y = imwbcnt, x=eduyrs, color=cntry,
                 size = pspwght, 
                 shape = cntry))

ggplot(data = ESS) +
  geom_smooth(aes(y=imwbcnt, x=eduyrs, color=cntry, 
                  weight = pspwght, 
                  linetype = cntry))

20 / 32

We can use the color aestetic in both plots.
We cannot use shape for lines and line types for points.
Note that ggplot2 automatically groups data for geoms whenever you map an aesthetic to a categorical variable!

Multiple geoms Are layered on top of each other

To have several geoms in one plot, simply add + them on top of each other.

ggplot(data = ESS) + # Coordinate system, add ...
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, size = pspwght)) + # layer of points, add ...
  geom_smooth(mapping = aes(y = imwbcnt, x = eduyrs, weight = pspwght)) # layer of smoothed average & 95%-CI.

21 / 32

Multiple geoms

The order of geoms matters, ggplot2 adds layer on top of layer.

ggplot(data = ESS) + # Coordinate system, add ...
  geom_smooth(mapping = aes(y = imwbcnt, x = eduyrs, weight = pspwght)) +
  geom_point(mapping = aes(y = imwbcnt, x = eduyrs, size = pspwght))

22 / 32

Global aestetics

To avoid repetitive code, we can specify global aestetics, which (by default) hold for all geoms.

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = eduyrs)) + # Coord. system with global aestetics, add ...
  geom_point() + # a layer of points, add ...
  geom_smooth() # a layer with a line of locally-smoothed averages and CI.

23 / 32

By the way, this is a nice example why graphics are great for data exploration:

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) + # Coord. system with global aestetics, add ...
  geom_boxplot() + # a layer of boxplots, add ...
  # For some reason, geom_smooth needs the "aes(group = 1)" argument.
  geom_smooth(mapping = aes(group = 1), se = FALSE) + # No CI (i.e., confidence interval), add ...
  geom_smooth(mapping = aes(group = 1), method = "lm", se = FALSE, color = "red") # an OLS line.

24 / 32

Local aestetics For the single geoms

Beware, local aesthetics override the global (default) aestetics!

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = eduyrs)) +
  geom_point(mapping = aes(color = cntry, size = pspwght), alpha = 0.2) + # aes() for geom_point exclusively.
  geom_smooth(mapping = aes(y = agea, weight = pspwght))

25 / 32

Putting it all together:

ggplot(data = ESS, # Coordinate system, add ...
       mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +  # define global aestetics, add ...
  geom_boxplot() + # a layer of boxplots, add
  geom_smooth(mapping = aes(color = cntry, group = cntry)) # Add smooth for each country.

26 / 32

When another layer of (important) information does not improve the plot.

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +
  geom_boxplot() +
  geom_smooth(mapping = aes(group = 1)) +
  facet_wrap( ~ cntry, nrow = 1) # Make sub-plots by cntry.

27 / 32

Consider whether faceting helps to see the comparisons you are interested in!

ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +
  geom_boxplot() +
  geom_smooth(mapping = aes(group = 1)) +
  facet_grid(gndr ~ cntry) # Make sub-plots by gender (row) ~ country (column).

28 / 32

Save your plot

ggsave() allows you to save your plot as pdf, jpeg, png, tiff, svg, bmp, ps, eps. It will guess the type from the ending of the name you give the plot (e.g., "MyPlot.pdf").

# Make our plot and assign it to object my_plot.
my_plot <- ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +
  geom_boxplot() +
  geom_smooth(mapping = aes(group = 1)) +
  facet_grid(gndr ~ cntry)
# Save the plot into the working directory as pdf. It shall be 9 inches wide and 4.5 inches high.
ggsave(filename = "myplot1.pdf", plot = my_plot, width = 8, height = 3)

29 / 32

Save your plot

ggsave() allows you to save your plot as pdf, jpeg, png, tiff, svg, bmp, ps, eps. It will guess the type from the ending of the name you give the plot (e.g., "MyPlot.pdf").

# Make our plot and assign it to object my_plot.
my_plot <- ggplot(data = ESS, mapping = aes(y = imwbcnt, x = factor(eduyrs), weight = pspwght)) +
  geom_boxplot() +
  geom_smooth(mapping = aes(group = 1)) +
  facet_grid(gndr ~ cntry)
# Save the plot into the working directory as pdf. It shall be 9 inches wide and 4.5 inches high.
ggsave(filename = "myplot1.pdf", plot = my_plot, width = 8, height = 3)

# Save the plot but with different margins.
ggsave(filename = "myplot2.pdf", plot = my_plot, width = 16, height = 9)

# Save the plot as jpeg, again with different dimensions and a very low resolution.
ggsave(filename = "myplot1.jpeg", plot = my_plot, width = 4.5, height = 9, dpi = 50)

29 / 32

PDFs do not need the dpi argument, because they are vector-based graphics.
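
For raster formats such as png or jpeg, the pixel size is width and height (in inches) times dpi. A plot of 8 x 3 inches therefore has 400 x 150 pixels at dpi = 50, and 2400 x 900 pixels at dpi = 300. A minimal sketch of the difference, assuming the built-in mpg data and a temporary folder instead of the working directory (the file names are made up for this example):

```r
library(ggplot2)

# Illustration only: a small plot from the built-in mpg data.
demo_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Write into a temporary folder so that nothing in the working directory is overwritten.
out_dir  <- tempdir()
pdf_file <- file.path(out_dir, "demo_plot.pdf")
png_low  <- file.path(out_dir, "demo_plot_low.png")
png_high <- file.path(out_dir, "demo_plot_high.png")

# Vector graphic: no dpi needed.
ggsave(filename = pdf_file, plot = demo_plot, width = 8, height = 3)

# Raster graphics: same size in inches, different resolution.
ggsave(filename = png_low,  plot = demo_plot, width = 8, height = 3, dpi = 50)
ggsave(filename = png_high, plot = demo_plot, width = 8, height = 3, dpi = 300)

# Compare the file sizes (in bytes) of the three files.
file.size(c(pdf_file, png_low, png_high))
```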

30 / 32

Today's general lesson

# A general template
ggplot(data = <DATA>) +         # Create a coordinate system for <DATA>, and add "+"
  <GEOM_FUNCTION>(              # a layer of (geometric) information, which
     mapping = aes(<MAPPINGS>), # maps our data to aesthetics, and
     stat = <STAT>,             # may depend on statistical transformations.
     position = <POSITION>      # Positioning may be adjusted.
  ) +
  <COORDINATE_FUNCTION> +       # Change the default coordinate system.
  <FACET_FUNCTION>              # Draw sub-plots by categorical variables.

Source: Wickham & Grolemund "R for Data Science"
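
One way to fill in the template, as a sketch only. It assumes the built-in mpg data from ggplot2 instead of the ESS data, and the particular geom, stat, position, coordinate, and facet functions are arbitrary choices for illustration.

```r
library(ggplot2)

# Illustration only: mpg stands in for <DATA>.
template_plot <- ggplot(data = mpg) +       # <DATA>
  geom_bar(                                 # <GEOM_FUNCTION>
    mapping = aes(x = class, fill = drv),   # <MAPPINGS>
    stat = "count",                         # <STAT>: count the cases per class (the default of geom_bar)
    position = "dodge"                      # <POSITION>: bars side by side instead of stacked
  ) +
  coord_flip() +                            # <COORDINATE_FUNCTION>: swap x and y axis
  facet_wrap(~ year)                        # <FACET_FUNCTION>: one sub-plot per year

template_plot
```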

31 / 32

Today's (important) functions

transmute(): similar to mutate(), but keeps only the newly generated variables.
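
The difference between the two functions can be seen on a small made-up data set. This is a sketch under assumptions: the tibble toy and the variable eduyrs_capped are invented for this example and are not part of the ESS data.

```r
library(dplyr)

# Hypothetical toy data, invented for this example.
toy <- tibble(id = 1:3, eduyrs = c(8, 12, 25))

# mutate() adds the new variable and keeps the old ones.
mutated <- mutate(toy, eduyrs_capped = pmin(pmax(eduyrs, 9), 21))

# transmute() keeps only the newly generated variable.
transmuted <- transmute(toy, eduyrs_capped = pmin(pmax(eduyrs, 9), 21))

names(mutated)    # "id" "eduyrs" "eduyrs_capped"
names(transmuted) # "eduyrs_capped"
```

Recent dplyr versions mark transmute() as superseded and suggest mutate(.keep = "none") instead; transmute() still works.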
32 / 32

