I have this type of database in an Excel file
UGA Date Product
A 05/2018 AM
Z. 06/2019 BX
X. 01/2020 AM
M. 02/2017 AM
B. 03/2019 AM/BX
M. 04/2018 BX
The variable Product can take three types of value: AM, BX or AM/BX
And I would like to get this type of table by using R code
UGA Date Nb_AM Nb_BX
A 01/2017 4 14
A 02/2017 5 17
A 03/2017 2 24
B 01/2017 3 35
B 02/2017 10 42
B 03/2017 24 2
I want to aggregate the data by UGA and by date and create two new columns Nb_AM and Nb_BX and I don't want to display the column "product" anymore
With Nb_AM corresponding to the number of apparition of "AM" and "AM/BX" for a date and an UGA
And Nb_BX corresponding to the number of "BX" and "AM/BX" for a date and an UGA
I think I should use the aggregate command but I have no idea how to use it Could you please help me?
Thank you very much!
CodePudding user response:
I have modified your sample data a bit to include 2 values in the same month. Let me know if this works. If you already have the data read in with readxl or alternatively read_csv, skip everything separate
and earlier.
library(tidyverse)
read_delim(" UGA Date Product
A 05/2018 AM
A. 05/2018 AM/BX
Z. 06/2019 BX
X. 01/2020 AM
M. 02/2017 AM
B. 03/2005 AM/BX
M. 04/20189 BX", delim = "\\n") %>%
mutate(across(everything(), str_trim)) %>%
separate(1, into = c("UGA", "Date", "Product"), sep = " ") %>%
mutate(UGA = str_remove(UGA, "\\.")) %>%
group_by(UGA, Date, Product) %>%
summarise(count = n()) %>%
pivot_wider(names_from = Product, values_from = count, values_fill = 0) %>%
mutate(Nb_AM = AM `AM/BX`,
Nb_BX = BX `AM/BX`) %>%
ungroup() %>%
select(UGA, Date, starts_with("Nb"))
CodePudding user response:
Also i Have this error
Error in select(., UGA, Date, starts_with("Nb")) : unused arguments (UGA, Date, starts_with("Nb"))