Home > front end >  R counting across columns and group_by the first column
R counting across columns and group_by the first column

Time:05-06

I have a data file which looks like the following:

Country Medicine1 Medicine2 Medicine3 Medicine4
CountryA Aspirin  Nyquil    Geritol
CountryA Aspirin  Geritol
CountryA Nyquil   Aspirin   Tylenol   Viagra
CountryB Tylenol  Nyquil    Viagra
CountryB Aspirin  Tylenol
...
CountryZ Fungisil Loprox    Tylenol
CountryZ Aspirin  Geritol   Loprox    Tylenol

What I would like to do is to get an output that would count for each country, the sum of the individual drug names across the columns. Such as:

Country Aspirin Fungisil Geritol Loprox Nyquil Tylenol Viagra
CountryA     3        0       2      0      2       1      1
CountryB     1        0       0      0      1       2      1
...
CountryZ     1        1       1      2      0       2      0

CodePudding user response:

Here is one potential solution using tidyverse functions:

library(tidyverse)

df <- read.table(text = "Country Medicine1 Medicine2 Medicine3 Medicine4
CountryA Aspirin Nyquil Geritol
CountryA Aspirin Geritol
CountryA Nyquil Aspirin Tylenol Viagra
CountryB Tylenol Nyquil Viagra
CountryB Aspirin Tylenol
CountryZ Fungisil Loprox Tylenol
CountryZ Aspirin Geritol Loprox Tylenol", header = TRUE, fill = NA)

df %>%
  pivot_longer(-Country) %>%
  filter(value != "") %>%
  arrange(value) %>%
  pivot_wider(id_cols = Country, names_from = value,
              values_from = name, values_fn = length,
              values_fill = 0)
#> # A tibble: 3 × 8
#>   Country  Aspirin Fungisil Geritol Loprox Nyquil Tylenol Viagra
#>   <chr>      <int>    <int>   <int>  <int>  <int>   <int>  <int>
#> 1 CountryA       3        0       2      0      2       1      1
#> 2 CountryB       1        0       0      0      1       2      1
#> 3 CountryZ       1        1       1      2      0       2      0

Created on 2022-05-06 by the reprex package (v2.0.1)

  •  Tags:  
  • r
  • Related