R: Add new column based on character vector and existing column in dataframe with unique items

Time:12-08

I want to assign the elements of a character vector to a data frame based on the matching values in an existing column.

Data frame with one column

head(df, 5)

       items
1        1
2        1
3        1
4        1
5        1

tail(df, 5)

120001  44
120002  44
120003  44
120004  44
120005  44

The character vector chr_v consists of 44 unique items.

chr_v <- c("T1_1", "C1_1", "T1_2", "A_1", "C_2", "C_3", "T1_3", "A_2", "C_4", 
"C_5", "C_6", "C_7", "C_8", "A_3", "C_9", 'C_10', "C_11", "A_4", 'C_12', "A_5", 
"C_13", "A_6", "A_7", "C_14", "C_15", "C_16", "T_4", "C_17", "C_18", "C_19", 'T_5', 
"C_20", "C_21", "T_6", "A_8", "C_22", "C_23", "C_24", "C_25", "C_26", "T_7", "T_8", 
'C_27', 'C_28')

The length of chr_v is:
length(chr_v)
[1] 44

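The items column of the data frame contains 44 unique, ordered values, and the character vector has 44 elements. I want to create a new column in which every row gets the element of chr_v that corresponds to its items value, so the first element is repeated for all rows of the first item, the second element for all rows of the second item, and so on.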

Expected Output: head(df, 5)

       items    newitem
1        1      T1_1
2        1      T1_1
3        1      T1_1
4        1      T1_1
5        1      T1_1

tail(df, 5)

      items    newitem
120001  44      C_28
120002  44      C_28
120003  44      C_28
120004  44      C_28
120005  44      C_28

I checked how many rows each item has in df with table(), but the output is not ordered (even after I tried to sort it). Therefore, I cannot use those counts to simply repeat the elements sequentially.

CodePudding user response:

Martin provided a tidyverse solution. Here is a base R solution:

df$newitem <- chr_v[df$items]

Here is the dplyr equivalent:

library(dplyr)

df %>% 
  mutate(newitem = chr_v[items])

output:

   items newitem
1      1    T1_1
2      1    T1_1
3      1    T1_1
4      1    T1_1
5      1    T1_1
6     44    C_28
7     44    C_28
8     44    C_28
9     44    C_28
10    44    C_28

data:

df <- structure(list(items = c(1L, 1L, 1L, 1L, 1L, 44L, 44L, 44L, 44L, 
44L), newitem = c("T1_1", "T1_1", "T1_1", "T1_1", "T1_1", "C_28", 
"C_28", "C_28", "C_28", "C_28")), row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
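
Both versions rely on positional indexing: chr_v[k] returns the k-th element, so indexing with the whole items column returns one label per row. This only works if items holds the integers 1 to 44. Below is a minimal sketch on made-up data (toy_labels, toy_df, codes and by_rank are hypothetical names, not part of the question), including a match() fallback for codes that are not consecutive integers:

```r
# Hypothetical toy data, not the original 120005-row data frame
toy_labels <- c("T1_1", "C1_1", "T1_2")
toy_df <- data.frame(items = c(1L, 1L, 2L, 3L, 3L))

# Positional indexing: each value in items picks the label at that position
toy_df$newitem <- toy_labels[toy_df$items]

# Fallback when the codes are not 1..n (for example 10, 20, 30):
# match() converts each code to its rank among the sorted unique codes
codes <- c(10L, 10L, 20L, 30L)
by_rank <- toy_labels[match(codes, sort(unique(codes)))]
```

The match() variant assumes that the sorted unique codes line up with the order of the labels, which is worth checking on real data.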

CodePudding user response:

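You could use enframe() combined with a left_join():

library(tidyverse)

# enframe() turns the unnamed vector into a two-column lookup table:
# the position (named items here) and the element (named newitem here)
df %>% 
  left_join(enframe(chr_v, name = "items", value = "newitem"), by = "items")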
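
The sequential repetition mentioned in the question can also be sketched in base R, assuming the rows are already grouped so that equal items values are adjacent and the groups appear in the same order as the labels. rle() returns the run lengths in order of appearance, so it does not depend on how table() sorts its output. The toy data and the names toy_labels, toy_df and runs are made up for illustration:

```r
# Hypothetical toy data: rows grouped by item, with unequal group sizes
toy_labels <- c("T1_1", "C1_1", "T1_2")
toy_df <- data.frame(items = c(1L, 1L, 1L, 2L, 3L, 3L))

# Run lengths of consecutive equal values, in order of appearance
runs <- rle(toy_df$items)

# Guard: expect one run per label, otherwise the rows are not grouped as assumed
stopifnot(length(runs$lengths) == length(toy_labels))

# Repeat each label as many times as its run is long
toy_df$newitem <- rep(toy_labels, times = runs$lengths)
```

If the rows are not grouped, the guard fails and the indexing or join approaches above are the safer choice.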