Home > Blockchain >  R create new column that creates a label or category, until it reaches a new label
R create new column that creates a label or category, until it reaches a new label

Time:05-19

Here is my dataframe

df2 <- structure(list(Code = c("ICB-9_label_1", "1", "2", "3", 
"4", "5", "1", "ICB-10_label_2", "3", "4", "5", 
"1", "2", "3", "3", "5", "1", "2", 
"3", "4", "5", "1", "2", "3", "4", 
"5", "1", "2", "3", "4", "5", "1", 
"2", "3", "4", "5", "1", "2", "3", 
"4", "5", "1", "2", "3", "4", "5", 
"1"), Description = c("description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here", "description here", "description here", "description here", 
"description here")), row.names = c(NA, -47L), class = c("tbl_df", 
"tbl", "data.frame"))

And here is what it looks like as a table:

 Code             Description
ICB-9_label_1     description here          
1                 description here          
2                 description here          
3                 description here          
4                 description here          
5                 description here          
1                 description here          
ICB-10_label_2    description here          
3                 description here          
4                 description here

I'd like to create a third column called "Labels". It would say "ICB_9_label_1" all the way down until it hits the row number for "ICB_10_label_2", then the column would say "ICB_10_label_2" all the way down. I don't want to override the numbers in the first column, since the 1, 2, 3, 4, 5 values are important.

CodePudding user response:

There are multiple ways to do this. An option is to extract the row that have 'label', while others returns NA and then use fill to change the NA elements to previous non-NA value

library(dplyr)
library(tidyr)
library(stringr)
df2 <- df2 %>% 
  mutate(Labels = str_extract(Code, '.*label.*')) %>% 
  fill(Labels, .direction = 'downup') 

-output

df2
# A tibble: 47 × 3
   Code           Description      Labels        
   <chr>          <chr>            <chr>         
 1 ICB-9_label_1  description here ICB-9_label_1 
 2 1              description here ICB-9_label_1 
 3 2              description here ICB-9_label_1 
 4 3              description here ICB-9_label_1 
 5 4              description here ICB-9_label_1 
 6 5              description here ICB-9_label_1 
 7 1              description here ICB-9_label_1 
 8 ICB-10_label_2 description here ICB-10_label_2
 9 3              description here ICB-10_label_2
10 4              description here ICB-10_label_2
# … with 37 more rows

Or using base R with grep and cumsum

transform(df2, Labels = grep('label', Code, 
       value = TRUE)[cumsum(grepl('label', Code))])
  •  Tags:  
  • r
  • Related