In R, plot the average number of unique events per year

Time:10-02

Background

I've got an R dataframe d:

library(dplyr)  # needed for %>% and arrange()

d <- data.frame(ID = c("a","a","a","b","b","c","c","d"),
                event = c("R2","R2","O99","B4","B4","A24","A24","J15"),
                service_date = as.Date(c("2011-01-01","2011-07-12","2013-12-23","2011-04-27","2012-01-01","2012-07-12","2012-12-23","2013-04-27")),
                stringsAsFactors=FALSE) %>% arrange(service_date, ID)

It looks like this (I've arranged it by service_date and ID so it's easier to look at):

d

The Problem

I'm trying to visualize, using geom_bar, how many distinct (i.e. non-repeating or non-duplicated) events there are in this dataframe in each year, but I'm having a little trouble.

Desired Result

To give a better idea of what I want, take a look at the first year in d, which is 2011. You see 3 rows, but only 2 distinct events, R2 and B4, because R2 is repeated once in row 3. In the bar chart, then, I'd like the first bar to represent 2011 and show 2.

If you do the counts for the other 2 years, you get the same result each time. So what I want is a bar chart with 3 bars representing 2011, 2012, and 2013, each with a height of 2, representing the count of unique events in the dataframe across all IDs.

EDIT: I see that I wasn't clear in my description of the desired result. Apologies to readers! Here's a crude visual depiction done in Excel:

[Image: desired result, a bar chart with one bar each for 2011, 2012, and 2013, all of height 2]

What I've Tried

The code I'm trying to fiddle with is something like this:

ggplot(data = d) + geom_bar(aes(x = service_date, y = n_distinct(event)), stat = "identity")

But that's getting me an odd result:

[Image: the resulting plot]

I'm not married to ggplot for this, any plotting function will do.

Answer:

We can extract the year from the Date-class 'service_date' column, group by it, create a new column 'n' holding the number of distinct elements of 'event' in each year, and then draw the bar plot:

library(dplyr)
library(ggplot2)
library(lubridate)
d %>% 
   group_by(year = year(service_date))  %>% 
   summarise(n = n_distinct(event)) %>% 
   ggplot(aes(x = year, y = n)) + 
     geom_col()

-output

[Image: bar chart produced by the code above]
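
Since the question says any plotting function will do, here is a minimal base R sketch of the same idea, with no extra packages. The names yr and counts are just illustrative, and the axis labels are my own choice; the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
d <- data.frame(ID = c("a","a","a","b","b","c","c","d"),
                event = c("R2","R2","O99","B4","B4","A24","A24","J15"),
                service_date = as.Date(c("2011-01-01","2011-07-12","2013-12-23",
                                         "2011-04-27","2012-01-01","2012-07-12",
                                         "2012-12-23","2013-04-27")),
                stringsAsFactors = FALSE)

# Year of each row, as a character label
yr <- format(d$service_date, "%Y")

# Number of distinct events within each year
counts <- tapply(d$event, yr, function(x) length(unique(x)))

# One bar per year, labelled by the names of 'counts'
barplot(counts, xlab = "Year", ylab = "Distinct events")
```

tapply() splits 'event' by year and applies the counting function to each piece, which plays the same role as group_by() plus summarise() in the dplyr version.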

The OP's code calls n_distinct on the entire 'event' column rather than within each year, so it returns a single value (5 here) that is used as the bar height for every date.
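
To make the difference concrete, here is a small sketch comparing the ungrouped and the grouped call on the question's data. The names n_all and per_year are illustrative only, and the year is extracted with format() here instead of lubridate just to keep the snippet to one package.

```r
library(dplyr)

# Rebuild the example data from the question
d <- data.frame(ID = c("a","a","a","b","b","c","c","d"),
                event = c("R2","R2","O99","B4","B4","A24","A24","J15"),
                service_date = as.Date(c("2011-01-01","2011-07-12","2013-12-23",
                                         "2011-04-27","2012-01-01","2012-07-12",
                                         "2012-12-23","2013-04-27")),
                stringsAsFactors = FALSE)

# Ungrouped: a single number for the whole column
n_all <- n_distinct(d$event)

# Grouped by year: one number per year
per_year <- d %>%
  group_by(year = as.integer(format(service_date, "%Y"))) %>%
  summarise(n = n_distinct(event))

print(n_all)
print(per_year)
```

The ungrouped call counts R2, O99, B4, A24 and J15 across all years, while the grouped summary yields one row per year, which is what the bar chart needs.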
