I'm trying to replicate this figure with my own data using ggplot2.
In my case, I don't care about the colors and legend at the moment. I just need to show the number of new repositories over a one-year period and the CDF (cumulative distribution function) of the creation date, using a dual y-axis with the 1.0 value positioned in the upper right corner, as in the example.
So far I have this code:
repo.count <- read.csv("http://pastebin.com/raw.php?i=PfUgyrt0", sep = ",")
repo.count$createdAt <- as.Date(repo.count$createdAt)
ggplot(data = repo.count, aes(x = createdAt)) +
  geom_histogram(colour = 1, fill = "white", position = "identity") +
  xlab('Date') +
  ylab('# New Repositories') +
  ggtitle('') +
  scale_x_date(labels = scales::date_format("%Y-%b"),
               breaks = by_month(date.groups$createdAt, 2)) +
  scale_y_continuous(sec.axis = sec_axis(~ . / max(repo.count$total),
                                         name = "Cumulative distribution")) +
  theme(axis.text.x = element_text(angle = 90)) +
  stat_ecdf(size = 1)
But it's far from what I'd like to achieve:
Answer:
I wonder if you are perhaps not looking for a histogram as such, but just a bar plot of new repos per day and a line indicating cumulative repos? If so, use geom_col instead of geom_histogram, and specify the y variable as repositories:
ggplot(data = repo.count, aes(x = createdAt, y = repositories)) +
  geom_col(fill = "green4", color = NA) +
  # divide the cumulative total by 100 so it fits on the scale of the bars
  geom_step(aes(y = total / 100)) +
  xlab('Date') +
  ylab('# New Repositories') +
  ggtitle('') +
  scale_x_date(labels = scales::date_format("%Y-%b"),
               date_breaks = 'month') +
  # multiply by 100 so the secondary axis reads in the original units
  scale_y_continuous(sec.axis = sec_axis(~ . * 100,
                                         name = "Cumulative repos")) +
  theme_classic(base_size = 16) +
  theme(axis.text.x = element_text(angle = 90))
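If you want the secondary axis to run from 0 to 1 like a CDF, as the question asks, one option is to derive the scaling factor from the data instead of hard-coding 100. The following is only a sketch: the data frame is a made-up stand-in that reuses the column names createdAt, repositories and total, and the names bar.max and cdf are hypothetical helpers introduced here for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for repo.count: same column names as in the question,
# but the values are synthetic and only serve to make the example runnable.
set.seed(1)
repo.count <- data.frame(
  createdAt    = seq(as.Date("2020-01-01"), by = "day", length.out = 365),
  repositories = rpois(365, lambda = 20)
)
repo.count$total <- cumsum(repo.count$repositories)

# Cumulative share of repositories created so far, ranging up to 1
repo.count$cdf <- repo.count$total / max(repo.count$total)

# Height of the tallest bar; used to stretch the 0..1 CDF over the bar scale
bar.max <- max(repo.count$repositories)

p <- ggplot(repo.count, aes(x = createdAt, y = repositories)) +
  geom_col(fill = "green4", color = NA) +
  # draw the CDF on the primary scale: 1.0 lines up with the tallest bar
  geom_step(aes(y = cdf * bar.max)) +
  scale_x_date(labels = scales::date_format("%Y-%b"),
               date_breaks = "2 months") +
  scale_y_continuous(
    name = "# New Repositories",
    # undo the stretch so the secondary axis is labelled 0..1
    sec.axis = sec_axis(~ . / bar.max, name = "Cumulative distribution")
  ) +
  xlab("Date") +
  theme_classic(base_size = 16) +
  theme(axis.text.x = element_text(angle = 90))

p
```

The key point is that the transformation passed to sec_axis must be the exact inverse of the one applied inside geom_step, otherwise the line and the right-hand axis labels will not agree. With the default axis expansion, 1.0 ends up close to, but not exactly at, the top of the panel.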