Home > Blockchain >  Assign a number in a dataframe in R
Assign a number in a dataframe in R

Time:06-24

I need to create a flagging rule. Here is a sample dataset.

df1 <- data.frame(problem = c(1,1,2,3,3,3,4),
                  item = c("a1","a2","a3","b1","b2","b11","c1"))

> df1
  problem item
1       1   a1
2       1   a2
3       2   a3
4       3   b1
5       3   b2
6       3  b11
7       4   c1

I need to create a grouping variable based on problem variable. Basically, when a problem has multiple items, there will be a number assigned. When the problem has only one item, I need to assign NA. To make it clear, here is what my desired output.

> df2
    problem item assign
1 Problem 1   a1      1
2 Problem 1   a2      1
3 Problem 2   a3     NA
4 Problem 3   b1      2
5 Problem 3   b2      2
6 Problem 3  b11      2
7 Problem 4   c1     NA

Since Problem 2 has only one item, we skip that, and give the order number 2 for Problem 3.

Any thoughts on this? Thanks!

CodePudding user response:

Here is an option with data.table - Grouped by 'problem', get the row index (.I) where the number of rows (.N) is greater than 1, use that column ($V1) in the i to subset the data, and create new column 'assign' with .GRP grouped by 'problem'

library(data.table)
setDT(df1)[df1[, .I[.N > 1], problem]$V1, assign := .GRP, problem]

-output

> df1
   problem   item assign
     <num> <char>  <int>
1:       1     a1      1
2:       1     a2      1
3:       2     a3     NA
4:       3     b1      2
5:       3     b2      2
6:       3    b11      2
7:       4     c1     NA

CodePudding user response:

Here's a dplyr option:

library(dplyr)

df1 %>%
  add_count(problem, name = "n_prob") %>%
  mutate(assign = as.integer(factor(ifelse(n_prob > 1, problem, NA)))) %>%
  select(-n_prob)
# # A tibble: 7 × 3
#   problem item  assign
#     <dbl> <chr>  <int>
# 1       1 a1         1
# 2       1 a2         1
# 3       2 a3        NA
# 4       3 b1         2
# 5       3 b2         2
# 6       3 b11        2
# 7       4 c1        NA
  • Related