Within my data I have a subset of data that look like this:
Incident | Year | Person1 |Person2|
:---- |:---: |:------: | -----:|
1| 2014 | A | B |
2| 2014 | A | |
3| 2016 | B | C |
4| 2018 | A | C |
5| 2020 | C | |
My goal is to count frequencies for each person and graph the data in a barchart (x axis=year, y=frequency, persons grouped). I could do count(data$Person1) and count(data$Person2) and sum them "by hand" to count frequencies, but I will need the data to be combined for the bar chart. I don't think I would use concatenate because I don't want AB, AC, etc.
I tried
group_data<-group_by(.data=data,Incident,Person1,Person2)
sum_counts<-dplyr::summarise(group_data,total.count=n())
View(sum_counts)
But the output is essentially the table above.
Is there an elegant way to group columns Person1 and Person2 without muddling the number of incidents (each row = a distinct incident)? Or in order to make my desired barchart, do I need to totally restructure the data?
Thanks in advance for wisdom.
CodePudding user response:
You do have to restructure your data, but it's easy -- just a quick pivot_longer()
.
library(tidyverse)
data %>%
pivot_longer(Person1:Person2, values_to = "Person", values_drop_na = TRUE) %>%
ggplot()
geom_bar(aes(Year, fill = Person))
CodePudding user response:
I think you are trying to do something like this (edited based on OP's clarification that column names start with pod
, rather than Person
:
dat %>%
pivot_longer(cols=starts_with("pod"),values_to = "pod") %>%
filter(!is.na(pod)) %>%
count(pod,Year)
Output:
# A tibble: 7 x 3
pod Year n
<chr> <dbl> <int>
1 A 2014 2
2 A 2018 1
3 B 2014 1
4 B 2016 1
5 C 2016 1
6 C 2018 1
7 C 2020 1
Input:
dat = tibble(
Incident=c(1,2,3,4,5),
Year =c(2014,2014,2016,2018,2020),
pod1 = c("A","A","B","A","C"),
pod2 = c("B",NA,"C","C",NA)
)