Data manipulation in R: Column divided by the number of rows that contains the corresponding value

Time:10-18

A = c(10009, 10009, 10009, 10009, 10011, 10011, ...)
B = c(23908, 230908, 230908, 230908, 23514, 23514, ...)

I have a data frame with the two columns above. How do I create a third column, C, equal to B divided by the number of rows that contain the corresponding value in column A?

I tried the code below, but it fails with the error "problem with mutate(), column C".

DF = DF %>%
   group_by(A) %>%
   mutate(C = B/n(A))
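To reproduce the results, here is a minimal data frame built from only the six rows visible in the question. The remaining rows are elided ("..."), so this is an assumed subset rather than the full data. table() shows the per-value row counts that C should be divided by.

```r
# Hypothetical reproducible example: only the six rows shown in the question,
# since the rest of the data is elided.
DF <- data.frame(
  A = c(10009, 10009, 10009, 10009, 10011, 10011),
  B = c(23908, 230908, 230908, 230908, 23514, 23514)
)

# Number of rows for each distinct value of A:
# 10009 appears 4 times, 10011 appears 2 times.
table(DF$A)
```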

Answer:

n() doesn't accept any arguments; inside a grouped mutate() it returns the number of rows in the current group. Try:

library(dplyr)

DF <- DF %>% group_by(A) %>% mutate(C = B / n()) %>% ungroup()
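To see what n() contributes, this sketch summarises the group sizes directly. It assumes the six-row subset visible in the question as reproducible data; `counts` and `rows` are illustrative names.

```r
library(dplyr)

# Assumed data: the six rows visible in the question
DF <- data.frame(
  A = c(10009, 10009, 10009, 10009, 10011, 10011),
  B = c(23908, 230908, 230908, 230908, 23514, 23514)
)

# n() is evaluated once per group, so it yields each group's row count;
# in mutate() that same count is recycled to every row of the group.
counts <- DF %>%
  group_by(A) %>%
  summarise(rows = n())

counts
```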

Answer:

You meant length(): within each group, length(A) equals the number of rows in that group.

DF %>%
   group_by(A) %>%
   mutate(C = B / length(A))

Result on your example dataset:

      A      B     C
  <dbl>  <dbl> <dbl>
1 10009  23908  5977
2 10009 230908 57727
3 10009 230908 57727
4 10009 230908 57727
5 10011  23514 11757
6 10011  23514 11757
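A base-R alternative (not from the original answers) uses ave(), which applies a function within each level of a grouping vector and returns a result aligned with the original rows. A sketch assuming the six-row subset from the question:

```r
# Assumed data: the six rows visible in the question
DF <- data.frame(
  A = c(10009, 10009, 10009, 10009, 10011, 10011),
  B = c(23908, 230908, 230908, 230908, 23514, 23514)
)

# ave() splits DF$B by DF$A, applies FUN = length to each piece,
# and writes the group size back onto every row of that group.
group_size <- ave(DF$B, DF$A, FUN = length)
DF$C <- DF$B / group_size

DF
```

This produces the same C values as the dplyr result table above, without loading any packages.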

Answer:

Update: a longer (perhaps not optimal) alternative is to call add_count() first and then mutate(). This version lets you follow each step:

library(dplyr)
DF %>%
  group_by(A) %>%
  add_count() %>%     # adds column n = number of rows in each group
  mutate(C = B / n) %>%
  ungroup() %>%
  select(-n)          # drop the helper count column

Output:

      A      B     C
  <dbl>  <dbl> <dbl>
1 10009  23908  5977
2 10009 230908 57727
3 10009 230908 57727
4 10009 230908 57727
5 10011  23514 11757
6 10011  23514 11757
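add_count() can also take the grouping column directly, which removes the need for group_by() and ungroup(). A sketch assuming the six-row subset from the question; `n_rows` is an illustrative column name set through the name argument so it cannot clash with an existing column called n.

```r
library(dplyr)

# Assumed data: the six rows visible in the question
DF <- data.frame(
  A = c(10009, 10009, 10009, 10009, 10011, 10011),
  B = c(23908, 230908, 230908, 230908, 23514, 23514)
)

# add_count(A) counts rows per value of A and appends the count to every row,
# leaving the result ungrouped and in the original row order.
res <- DF %>%
  add_count(A, name = "n_rows") %>%
  mutate(C = B / n_rows) %>%
  select(-n_rows)   # drop the helper column

res
```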

My first answer, posted a few seconds after Ronak Shah's, was the same approach:

library(dplyr)
DF %>%
  group_by(A) %>%
  mutate(C = B / n())

Answer:

Using data.table:

library(data.table)
setDT(DF)[, C := B / .N, by = A]   # .N = number of rows in each group of A
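The same data.table one-liner on a hypothetical six-row table built from the question's visible values. Here .N holds the row count of the current group defined by `by = A`, and := adds C by reference, so the table is modified in place without reassignment.

```r
library(data.table)

# Assumed data: the six rows visible in the question
DT <- data.table(
  A = c(10009, 10009, 10009, 10009, 10011, 10011),
  B = c(23908, 230908, 230908, 230908, 23514, 23514)
)

# Divide B by the size of its A-group; C is added to DT by reference.
DT[, C := B / .N, by = A]

DT
```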