Create a new variable using part of another variable's string

Time:11-24

I am working with a data frame with more than 1000 rows, and I want to create a new variable from part of the string in another variable.

This is a short version of the data. I want to extract the numbers from the "id" variable and use them to create the "height" variable, so the data frame should look something like this:

df<-data.frame(id=c("Necrosis_Char_cat_0.05m","Necrosis_Char_cat_0.1m",
                "Necrosis_Char_cat_1.7m"), 
           height=c(0.05, 0.1, 1.7))

I tried to use this code:

library(dplyr)
library(stringr)
df_new <- df %>%
  mutate(height = as.numeric(str_replace(id, ".*(\\d)(\\d+)m.*", "\\1.\\2")))

But I get the following Warning message:

In eval(cols[[col]], .data, parent.frame()) : NAs introduced by coercion

In addition to the NAs, some values come out wrong: 0.05, for example, shows up as 0.5. I believe the issue is in the way I am writing the pattern and/or the replacement in str_replace(). Any help with that is very much appreciated. Thank you.
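To see why the pattern misbehaves, it can help to run it on the three example ids. The sketch below uses base R's sub() instead of str_replace(); this is an assumption of convenience, since both replace only the first match and should give the same result for this particular pattern.

```r
ids <- c("Necrosis_Char_cat_0.05m", "Necrosis_Char_cat_0.1m",
         "Necrosis_Char_cat_1.7m")

# Pattern from the question: greedy ".*", then one digit, then one or more
# digits, then "m". The greedy ".*" swallows the real decimal point, and the
# replacement "\\1.\\2" inserts a new "." between the two captured groups.
old <- sub(".*(\\d)(\\d+)m.*", "\\1.\\2", ids)
print(old)
# "0.05m" becomes "0.5"; "0.1m" and "1.7m" have only one digit before "m",
# so the pattern does not match and sub() returns those strings unchanged.

# as.numeric() cannot parse the unchanged strings, which is where the
# "NAs introduced by coercion" warning comes from.
heights <- suppressWarnings(as.numeric(old))
print(heights)
```

A pattern that captures the whole digits-dot-digits run in a single group, as the answers below do, avoids both problems.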

Answer:

There are probably a bunch of ways to do this, but here are a few:

library(tidyverse)


df<-data.frame(id=c("Necrosis_Char_cat_0.05m","Necrosis_Char_cat_0.1m",
                "Necrosis_Char_cat_1.7m"), 
           height=c(0.05, 0.1, 1.7))

# Option 1
df |>
  extract(id,
          into = "new_height",
          regex = ".*_(\\d+\\.\\d+)m",
          remove = FALSE,
          convert = TRUE)
#>                        id new_height height
#> 1 Necrosis_Char_cat_0.05m       0.05   0.05
#> 2  Necrosis_Char_cat_0.1m       0.10   0.10
#> 3  Necrosis_Char_cat_1.7m       1.70   1.70

# Option 2
df |>
  mutate(new_height = as.numeric(sub(".*_(\\d+\\.\\d+)m", "\\1", id)))
#>                        id height new_height
#> 1 Necrosis_Char_cat_0.05m   0.05       0.05
#> 2  Necrosis_Char_cat_0.1m   0.10       0.10
#> 3  Necrosis_Char_cat_1.7m   1.70       1.70

# Option 3
df |>
  mutate(new_height = as.numeric(str_extract(id, "\\d.*(?=m)")))
#>                        id height new_height
#> 1 Necrosis_Char_cat_0.05m   0.05       0.05
#> 2  Necrosis_Char_cat_0.1m   0.10       0.10
#> 3  Necrosis_Char_cat_1.7m   1.70       1.70
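With more than 1000 rows, it may be worth listing the ids that do not fit the pattern before converting, so unexpected NAs show up explicitly instead of as a coercion warning. A minimal base R sketch, assuming every valid id ends in "_<number>m"; the last id below is a hypothetical malformed value, not part of the question's data.

```r
ids <- c("Necrosis_Char_cat_0.05m", "Necrosis_Char_cat_0.1m",
         "Necrosis_Char_cat_1.7m", "Necrosis_Char_cat_unknown")  # last id is hypothetical

# Assumed format: "_" followed by an integer or decimal number, then "m" at the end
pattern <- ".*_([0-9]+(\\.[0-9]+)?)m$"

ok <- grepl(pattern, ids)
print(ids[!ok])  # ids that do not fit the assumed format

# Convert only the matching ids; the others stay NA without triggering a warning
height <- rep(NA_real_, length(ids))
height[ok] <- as.numeric(sub(pattern, "\\1", ids[ok]))
print(height)
```

The optional "(\\.[0-9]+)?" group is there so that a whole-number height such as "2m" would also be accepted, in case the full data contains any.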

Answer:

library(dplyr)
library(readr)
df %>% 
  mutate(height2 = parse_number(id))

Answer:

df %>%
  mutate(new_height = parse_number(id))
#>                        id height new_height
#> 1 Necrosis_Char_cat_0.05m   0.05       0.05
#> 2  Necrosis_Char_cat_0.1m   0.10       0.10
#> 3  Necrosis_Char_cat_1.7m   1.70       1.70
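parse_number() is the shortest option, but according to the readr documentation it drops non-numeric characters before and after the first number, so it relies on the height being the only number in "id". A small sketch with a hypothetical id containing an extra digit ("Char2", not in the question's data) shows how it can differ from a regex anchored on the trailing "_<number>m":

```r
library(readr)

# The second id is hypothetical: it has an extra digit ("Char2") before the height
x <- c("Necrosis_Char_cat_0.05m", "Necrosis_Char2_cat_0.05m")

# parse_number() returns the first number it finds, so the second id gives 2
p <- parse_number(x)
print(p)

# Anchoring on the trailing "_<number>m" ignores digits earlier in the string
r <- as.numeric(sub(".*_([0-9]+(\\.[0-9]+)?)m$", "\\1", x))
print(r)
```

If the real ids never contain other digits, parse_number() is fine; otherwise the anchored pattern is the safer choice.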