Home > other >  How to unite multiple columns (character data) without concatenating?
How to unite multiple columns (character data) without concatenating?

Time:02-25

Within my data I have a subset of data that look like this:

Incident | Year | Person1  |Person2|
  :----  |:---: |:------:  | -----:|
        1| 2014 | A        | B     |
        2| 2014 | A        |       |
        3| 2016 | B        | C     |
        4| 2018 | A        | C     |
        5| 2020 | C        |       |

My goal is to count frequencies for each person and graph the data in a barchart (x axis=year, y=frequency, persons grouped). I could do count(data$Person1) and count(data$Person2) and sum them "by hand" to count frequencies, but I will need the data to be combined for the bar chart. I don't think I would use concatenate because I don't want AB, AC, etc.

I tried

group_data<-group_by(.data=data,Incident,Person1,Person2)
sum_counts<-dplyr::summarise(group_data,total.count=n())
View(sum_counts)

But the output is essentially the table above.

Is there an elegant way to group columns Person1 and Person2 without muddling the number of incidents (each row = a distinct incident)? Or in order to make my desired barchart, do I need to totally restructure the data?

Thanks in advance for wisdom.

CodePudding user response:

You do have to restructure your data, but it's easy -- just a quick pivot_longer().

library(tidyverse)

data %>%
  pivot_longer(Person1:Person2, values_to = "Person", values_drop_na = TRUE) %>%
  ggplot()  
  geom_bar(aes(Year, fill = Person))

CodePudding user response:

I think you are trying to do something like this (edited based on OP's clarification that column names start with pod, rather than Person:

dat %>% 
  pivot_longer(cols=starts_with("pod"),values_to = "pod") %>% 
  filter(!is.na(pod)) %>% 
  count(pod,Year)

Output:

# A tibble: 7 x 3
  pod    Year     n
  <chr> <dbl> <int>
1 A      2014     2
2 A      2018     1
3 B      2014     1
4 B      2016     1
5 C      2016     1
6 C      2018     1
7 C      2020     1

Input:

dat = tibble(
    Incident=c(1,2,3,4,5),
    Year =c(2014,2014,2016,2018,2020),
    pod1 = c("A","A","B","A","C"),
    pod2 = c("B",NA,"C","C",NA)
)
  • Related