Trying to plot the following using an open dataset from this:
For clarity, I'm trying to plot the following information:
Geography: Canada
Type.of.work: Full-time
y= Both Sexes
Group: Education.level
Any pointers would be helpful, thanks!
CodePudding user response:
This produces more than a grey square; however, I doubt it is actually what you are after:
library(tidyverse)
wages = read_csv('https://data.ontario.ca/dataset/1f14addd-e4fc-4a07-9982-ad98db07ef86/resource/7b325fa1-e9d6-4329-a501-08cdc22a79df/download/v0913_05.csv')
# Keep only full-time workers in Canada, then draw one line per education level.
# Each ggplot layer must be joined to the previous one with `+`.
wages |>
  subset(Geography == "Canada" & `Type of work` == "Full-time") |>
  ggplot(aes(x = YEAR, y = `Both Sexes`, group = `Education level`, colour = `Education level`)) +
  geom_line(alpha = 0.6, size = .5) +
  labs(title = "Wages for Full-Time employees by Education Level in Canada")
There are multiple rows per "Education level" for each year, so some sort of summary and/or further grouping is needed, for example:
# Sum wages within each year, education level and age group,
# then draw one panel per age group.
wages %>%
  filter(Geography == 'Canada' & `Type of work` == 'Full-time') %>%
  group_by(YEAR, `Education level`, `Age group`) %>%
  summarise(`Both Sexes` = sum(`Both Sexes`, na.rm = TRUE)) %>%
  ggplot(aes(x = YEAR, y = `Both Sexes`, group = `Education level`, colour = `Education level`)) +
  geom_line() +
  facet_wrap(~`Age group`)
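To see why a summary is needed, it can help to count how many rows share the same year and education level. This is only a minimal sketch on a made-up toy tibble: the values are invented for illustration and do not come from the Ontario dataset, and the names `toy` and `dupes` are hypothetical.

```r
library(dplyr)

# Toy data with invented values: two age groups per year and education level
toy <- tibble(
  YEAR = c(1997, 1997, 1998, 1998),
  `Education level` = c("High school", "High school", "High school", "High school"),
  `Age group` = c("15-24", "25-54", "15-24", "25-54"),
  `Both Sexes` = c(10, 20, 11, 21)
)

# Count rows per year and education level; n > 1 means geom_line()
# gets several y values for a single x and will zig-zag between them
dupes <- toy %>%
  count(YEAR, `Education level`) %>%
  filter(n > 1)

print(dupes)
```

Running the same `count()` on the filtered `wages` data should show which combinations still hold more than one row and therefore need a summary or an extra grouping variable.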
CodePudding user response:
I think you are looking for this:
There are some things to consider:
- In the original data set there are blank spaces within, for example, Type.of.work or Education.level; use str_trim to remove them. (This may be the reason for the first blank plot!)
- The reason why your curves zig-zag is that, for example, each YEAR has several data points in Both.Sexes, so you have more than one value in, for example, 1997. I suggest taking the mean and then plotting.
library(tidyverse)
# df is assumed to be the same CSV loaded with read.csv(), which turns spaces in column names into dots
df %>%
  select(YEAR, Both.Sexes, Geography, Type.of.work, Education.level) %>%
  as_tibble() %>%
  # trim stray whitespace before comparing and grouping
  filter(Geography == "Canada" & str_trim(Type.of.work) == "Full-time") %>%
  mutate(Education.level = str_trim(Education.level)) %>%
  # one mean value per year and education level
  group_by(YEAR, Education.level) %>%
  summarise(meanBothSexes = mean(Both.Sexes, na.rm = TRUE)) %>%
  ggplot(aes(x = YEAR, y = meanBothSexes, group = Education.level, color = Education.level)) +
  geom_line(alpha = 0.6, size = .5) +
  labs(title = "Wages for Full-Time employees by Education Level in Canada")
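As a small illustration of the whitespace point, here is a sketch showing how padded values can make an exact comparison fail until they are trimmed. The vector `type_of_work` and its values are invented for this example and are not taken from the dataset.

```r
library(stringr)

# Invented values with stray leading/trailing spaces
type_of_work <- c(" Full-time", "Full-time ", "Part-time")

# Exact comparison misses the padded entries
raw_match <- type_of_work == "Full-time"

# Trimming first lets both padded entries match
trimmed_match <- str_trim(type_of_work) == "Full-time"

print(raw_match)
print(trimmed_match)
```

If a filter on the real data returns zero rows, comparing the two results in this way is a quick check for whether whitespace is the cause.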