My data is formatted as follows; the date column is stored as character, not as a Date:
X date
1 19460530
0 19460601
1 19460602
1 19460603
. ...
. ...
. ...
What I would like is the proportion of 1s in X for each month. For example, if July 1946 has 20 1s and 30 0s and August 1946 has 40 1s and 40 0s, I would like the following output (20/50 = 0.4 and 40/80 = 0.5):
194607 0.4
194608 0.5
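A minimal base-R sketch of the monthly calculation, using made-up rows that reproduce the July/August counts from the example (the day "15" in each date is an arbitrary placeholder, not real data). It assumes the dates are "yyyymmdd" character strings, so the first six characters form the "yyyymm" month key, and it relies on the fact that the mean of a 0/1 column is the proportion of 1s.

```r
# Hypothetical rows matching the example counts:
# July 1946: 20 ones and 30 zeros; August 1946: 40 ones and 40 zeros.
# The day-of-month (15) is an arbitrary placeholder.
df <- data.frame(
  X    = c(rep(1, 20), rep(0, 30), rep(1, 40), rep(0, 40)),
  date = c(rep("19460715", 50), rep("19460815", 80)),
  stringsAsFactors = FALSE
)

# The first six characters of "yyyymmdd" are the "yyyymm" month key
df$month <- substr(df$date, 1, 6)

# The mean of a 0/1 variable is sum(X) / number of rows, i.e. the share of 1s
monthly <- aggregate(X ~ month, data = df, FUN = mean)
print(monthly)
```

This should give one row per month with 0.4 for 194607 and 0.5 for 194608, matching the desired output.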
I would then like to plot this output as a line graph with ggplot2 (month on the x-axis, proportion of 1s on the y-axis). geom_line needs a continuous x variable, but if I used a numeric month code like 194607 or 194608, there would be a huge gap between December and January (e.g. from 194612 to 194701). How can I make a line graph from monthly data?
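To illustrate the gap and one common way around it: numeric yyyymm codes are unevenly spaced, while converting each code to the first day of its month gives a Date, which is measured in days and is spaced by real calendar time. The codes below are example values chosen to span a year boundary, not data from the question.

```r
# Example month keys in the "yyyymm" form, spanning December -> January
codes <- c("194611", "194612", "194701", "194702")

# As plain numbers, consecutive months are not evenly spaced:
# 194612 -> 194701 is a step of 89, the others are steps of 1
num_steps <- diff(as.numeric(codes))

# Appending "01" turns each key into the first day of its month,
# which base R can parse as a Date
month_dates <- as.Date(paste0(codes, "01"), format = "%Y%m%d")

# Differences between Dates are in days, so every step is about one month
day_steps <- as.numeric(diff(month_dates))

print(num_steps)
print(day_steps)
```

Here num_steps is 1, 89, 1, while day_steps is 30, 31, 31. Plotting against month_dates with geom_line() (optionally adding scale_x_date(date_labels = "%Y-%m") to label the axis by month) spaces the points evenly in time.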
Answer:
ggplot2 treats Date objects on the x-axis as a continuous time scale, so you will not get the December-to-January jump/gap you are worried about.
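library(dplyr)    # group_by(), mutate(), %>%
library(tibble)   # tribble()
library(ggplot2)  # ggplot(), geom_line()

# sample data; dates stored as character, as in the question
tribble(
  ~X, ~date,
  1, "19460530",
  0, "19460601",
  1, "19460602",
  1, "19460603"
) -> df

# parse the character dates into Date objects
df$date <- lubridate::ymd(df$date)

df %>%
  group_by(date) %>%                     # one group per calendar date
  mutate(proportion = X / sum(X)) -> df  # X divided by the sum of X in its group

ggplot(df, aes(x = date, y = proportion)) +
  geom_line()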
Follow-up:
hachiko, thanks for your prompt answer. I have two questions, though.
You only grouped by date with group_by(date), so how does this aggregate by month? Don't you have to specify the month somewhere?
Is "mutate(proportion = X / sum(X))" right? Summing X counts the number of 1s, so shouldn't sum(X) be in the numerator instead?
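A sketch of how both follow-up points might be handled, assuming dplyr, lubridate and ggplot2 are installed. lubridate's floor_date(..., unit = "month") maps every date to the first day of its month, so grouping on that column aggregates by month rather than by day, and mean(X) gives the share of 1s, i.e. sum(X) in the numerator and the number of rows in the denominator. This is one possible approach, not the answerer's code; the four sample rows are the ones from the question.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Sample rows from the question, with the date column as character
df <- data.frame(
  X    = c(1, 0, 1, 1),
  date = c("19460530", "19460601", "19460602", "19460603"),
  stringsAsFactors = FALSE
)

monthly <- df %>%
  mutate(
    date  = ymd(date),                        # character -> Date
    month = floor_date(date, unit = "month")  # first day of each month
  ) %>%
  group_by(month) %>%
  summarise(
    n_obs      = n(),     # number of observations in the month
    proportion = mean(X)  # share of 1s, i.e. sum(X) / n()
  )

# month is a Date, so December -> January is spaced like any other
# pair of consecutive months on the x-axis
p <- ggplot(monthly, aes(x = month, y = proportion)) +
  geom_line() +
  scale_x_date(date_labels = "%Y-%m")
```

With these four rows, May 1946 has one observation (proportion 1) and June 1946 has three (proportion 2/3), so the plot has only two points; on the full data set there would be one point per month.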