Graphing multiple variables, plot shows grey?

Time:11-28

I'm trying to plot the following using an open dataset (shown as an image in the original post):

For clarity purposes I'm trying to plot the following information:

Geography: Canada

Type.of.work: Full-time

y= Both Sexes

Group: Education.level

Any pointers would be helpful, thanks!

CodePudding user response:

This produces more than a grey square; however, I doubt it is actually what you are after:

library(tidyverse)

wages = read_csv('https://data.ontario.ca/dataset/1f14addd-e4fc-4a07-9982-ad98db07ef86/resource/7b325fa1-e9d6-4329-a501-08cdc22a79df/download/v0913_05.csv')

wages |>
  subset(Geography == "Canada" & `Type of work` == "Full-time") |>
  ggplot(aes(x = YEAR, y = `Both Sexes`, group = `Education level`, colour = `Education level`)) +
  geom_line(alpha = 0.6, size = .5) +
  labs(title = "Wages for Full-Time employees by Education Level in Canada")

plot 1

There are multiple rows for each "Education level" in each year, so some sort of summary and/or further grouping is needed, for example:

wages %>% 
  filter(Geography == 'Canada' & `Type of work` == 'Full-time') %>% 
  group_by(YEAR, `Education level`, `Age group`) %>% 
  summarise(`Both Sexes` = sum(`Both Sexes`, na.rm = TRUE)) %>% 
  ggplot(aes(x = YEAR, y = `Both Sexes`, group = `Education level`, colour = `Education level`)) +
  geom_line() +
  facet_wrap(~`Age group`)

plot 2
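If you want to confirm that repeated rows are what cause the jagged lines, a quick check along these lines may help. This is only a sketch on made-up toy data (the values and the `toy` and `dupes` names are invented for illustration, not taken from the Ontario file); it counts how many rows share the same year and education level:

```r
library(dplyr)

# Toy data, made up for illustration: two age groups per year
# for the same education level
toy <- tibble(
  YEAR = c(1997, 1997, 1998, 1998),
  `Education level` = rep("High school", 4),
  `Age group` = c("15-24", "25-54", "15-24", "25-54"),
  `Both Sexes` = c(10, 20, 11, 21)
)

# Count rows per year and education level; n > 1 means geom_line()
# receives several y values for one x within the same group
dupes <- toy %>%
  count(YEAR, `Education level`) %>%
  filter(n > 1)

print(dupes)
```

Any combination that shows up in `dupes` needs to be summarised (or split by another variable, such as `Age group`) before plotting.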

CodePudding user response:

I think you are looking for this:

There are some things to consider:

  1. In the original data set some values have surrounding blank spaces, for example in Type.of.work or Education.level; use str_trim to remove them. (This may be the reason for the first blank plot!)
  2. The reason your curves zig-zag is that each YEAR has several values of Both.Sexes, so there is more than one value for, say, 1997. I suggest taking the mean and then plotting.

library(tidyverse)

# df is assumed to be the data already loaded with dotted column names
# (e.g. via read.csv(), which replaces spaces in names with dots)
df %>% 
  select(YEAR, Both.Sexes, Geography, Type.of.work, Education.level) %>% 
  as_tibble() %>% 
  filter(Geography == "Canada" & str_trim(Type.of.work) == "Full-time") %>% 
  mutate(Education.level = str_trim(Education.level)) %>%  
  group_by(YEAR, Education.level) %>% 
  summarise(meanBothSexes = mean(Both.Sexes, na.rm = TRUE)) %>% 
  ggplot(aes(x = YEAR, y = meanBothSexes, group = Education.level, color = Education.level)) +
  geom_line(alpha = 0.6, size = .5) +
  labs(title = "Wages for Full-Time employees by Education Level in Canada")
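To see why untrimmed values could leave you with an empty (grey) plot, here is a minimal sketch. The toy data and its padding are invented for illustration, and the `toy`, `untrimmed` and `trimmed` names are hypothetical; whether the real file is padded this way is something to check on your own copy:

```r
library(dplyr)
library(stringr)

# Toy data, made up for illustration: two "Full-time" values
# carry stray spaces
toy <- tibble(
  Type.of.work = c(" Full-time", "Full-time ", "Part-time"),
  Both.Sexes = c(25, 30, 15)
)

# An exact comparison does not match the padded strings
untrimmed <- toy %>% filter(Type.of.work == "Full-time")

# str_trim() removes leading and trailing whitespace before comparing
trimmed <- toy %>% filter(str_trim(Type.of.work) == "Full-time")

nrow(untrimmed)
nrow(trimmed)
```

With no rows left after the filter, ggplot has nothing to draw and shows only the grey panel background, which would match the symptom in the question.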

