Home > OS >  R : Aggregate data by two variables and count the number of times the value of a third variable appe
R : Aggregate data by two variables and count the number of times the value of a third variable appe

Time:07-22

I have this type of database in an Excel file

   UGA     Date     Product
  A      05/2018   AM
  Z.     06/2019   BX
  X.     01/2020   AM
  M.     02/2017   AM
  B.     03/2019   AM/BX
  M.     04/2018   BX

The variable Product can take three types of value: AM, BX or AM/BX

And I would like to get this type of table by using R code

  UGA   Date      Nb_AM   Nb_BX
  A     01/2017   4       14
  A     02/2017   5       17
  A     03/2017   2       24
  B     01/2017   3       35
  B     02/2017   10      42
  B     03/2017   24      2

I want to aggregate the data by UGA and by date and create two new columns Nb_AM and Nb_BX and I don't want to display the column "product" anymore

With Nb_AM corresponding to the number of apparition of "AM" and "AM/BX" for a date and an UGA

And Nb_BX corresponding to the number of "BX" and "AM/BX" for a date and an UGA

I think I should use the aggregate command but I have no idea how to use it Could you please help me?

Thank you very much!

CodePudding user response:

I have modified your sample data a bit to include 2 values in the same month. Let me know if this works. If you already have the data read in with readxl or alternatively read_csv, skip everything separate and earlier.

library(tidyverse)

read_delim("   UGA     Date     Product
  A      05/2018   AM
  A.     05/2018   AM/BX
  Z.     06/2019   BX
  X.     01/2020   AM
  M.     02/2017   AM
  B.     03/2005   AM/BX
  M.     04/20189  BX", delim = "\\n") %>%
  mutate(across(everything(), str_trim)) %>%
  separate(1, into = c("UGA", "Date", "Product"), sep = "  ")  %>%
  mutate(UGA = str_remove(UGA, "\\.")) %>%
  group_by(UGA, Date, Product) %>%
  summarise(count = n()) %>%
  pivot_wider(names_from = Product, values_from = count, values_fill = 0) %>%
  mutate(Nb_AM = AM   `AM/BX`,
         Nb_BX = BX   `AM/BX`) %>%
  ungroup() %>%
  select(UGA, Date, starts_with("Nb"))

CodePudding user response:

Also i Have this error

Error in select(., UGA, Date, starts_with("Nb")) : unused arguments (UGA, Date, starts_with("Nb"))

  •  Tags:  
  • r
  • Related