I have a data file which looks like the following:
Country Medicine1 Medicine2 Medicine3 Medicine4
CountryA Aspirin Nyquil Geritol
CountryA Aspirin Geritol
CountryA Nyquil Aspirin Tylenol Viagra
CountryB Tylenol Nyquil Viagra
CountryB Aspirin Tylenol
...
CountryZ Fungisil Loprox Tylenol
CountryZ Aspirin Geritol Loprox Tylenol
What I would like to do is to get an output that would count for each country, the sum of the individual drug names across the columns. Such as:
Country Aspirin Fungisil Geritol Loprox Nyquil Tylenol Viagra
CountryA 3 0 2 0 2 1 1
CountryB 1 0 0 0 1 2 1
...
CountryZ 1 1 1 2 0 2 0
CodePudding user response:
Here is one potential solution using tidyverse functions:
library(tidyverse)
df <- read.table(text = "Country Medicine1 Medicine2 Medicine3 Medicine4
CountryA Aspirin Nyquil Geritol
CountryA Aspirin Geritol
CountryA Nyquil Aspirin Tylenol Viagra
CountryB Tylenol Nyquil Viagra
CountryB Aspirin Tylenol
CountryZ Fungisil Loprox Tylenol
CountryZ Aspirin Geritol Loprox Tylenol", header = TRUE, fill = NA)
df %>%
pivot_longer(-Country) %>%
filter(value != "") %>%
arrange(value) %>%
pivot_wider(id_cols = Country, names_from = value,
values_from = name, values_fn = length,
values_fill = 0)
#> # A tibble: 3 × 8
#> Country Aspirin Fungisil Geritol Loprox Nyquil Tylenol Viagra
#> <chr> <int> <int> <int> <int> <int> <int> <int>
#> 1 CountryA 3 0 2 0 2 1 1
#> 2 CountryB 1 0 0 0 1 2 1
#> 3 CountryZ 1 1 1 2 0 2 0
Created on 2022-05-06 by the reprex package (v2.0.1)