Home > Software engineering >  horizontal and vertical join count in r dataframe
horizontal and vertical join count in r dataframe

Time:11-30

having a dataframe with sales per customer and months.

df <-
data.frame(
  stringsAsFactors = FALSE,
              date = c("jan","jan","jan","jan",
                       "jan","jan","jan","feb","feb","feb","feb","feb",
                       "feb","feb"),
          customer = c("john","john","john","Mary",
                       "Mary","Mary","Mary","Robert","Robert","Mary",
                       "john","john","Robert","Robert"),
           product = c("a","b","d","a","b","c",
                       "d","a","b","c","a","c","c","d")

   date customer product
1   jan     john       a
2   jan     john       b
3   jan     john       d
4   jan     Mary       a
5   jan     Mary       b
6   jan     Mary       c
7   jan     Mary       d
8   feb   Robert       a
9   feb   Robert       b
10  feb     Mary       c
11  feb     john       a
12  feb     john       c
13  feb   Robert       c
14  feb   Robert       d

I need to summarize how many times the same customer is present across months and products.

Expected result:

 date          a       b       c       d    same cust
  jan          2       2       1       2         0
  feb          2       1       2       0         1
  same cust    1       0       1       0    

CodePudding user response:

I don't know about the marginals, but for the main table

library(reshape2)
dcast(
  df,
  date~product,
  function(x){length(unique(x))},
  value.var="customer"
)

  date a b c d
1  feb 2 1 3 1
2  jan 2 2 1 2

CodePudding user response:

You can try

library(tidyverse)
df %>% 
 pivot_wider(names_from = product, values_from = customer, values_fn = n_distinct) %>% 
 bind_rows(
  df %>% 
    count(product, customer) %>%
    group_by(product) %>% 
    summarise(n=sum(n-1), 
              date = "all") %>% 
    pivot_wider(names_from = product,values_from=n ))
# A tibble: 3 x 5
  date      a     b     d     c
  <chr> <dbl> <dbl> <dbl> <dbl>
1 jan       2     2     2     1
2 feb       2     1     1     3
3 all       1     0     0     1
  •  Tags:  
  • r
  • Related