R parse a string to extract, calculate, and replace a number

Time:10-21

I have a one-column matrix with variables concatenated together for input to a computer system. Some rows, but not all, have a number I need to extract, reduce by 5%, then replace in the string. I'd like to avoid a for loop - is this an lapply kind of thing?

Given this subset of data:

1989-05-04 [Maize].Sow(cultivar: B_100, population: 6, depth: 50, rowSpacing: 762)
1989-06-26 [Fertiliser].Apply(Amount: 124, Type: NO3N)
1989-10-23 [Maize].Harvest

for only the rows that have [Fertiliser] in them, I need to extract the number following "Amount:", in this case 124. Then I need to multiply it by 0.95 and replace it. The end result should be this:

1989-05-04 [Maize].Sow(cultivar: B_100, population: 6, depth: 50, rowSpacing: 762)
1989-06-26 [Fertiliser].Apply(Amount: 117.8, Type: NO3N)
1989-10-23 [Maize].Harvest

Here's a data frame:

field_ops <- data.frame(V1=c("1989-05-04 [Maize].Sow(cultivar: B_100, population: 6, depth: 50, rowSpacing: 762)","1989-06-26 [Fertiliser].Apply(Amount: 124, Type: NO3N)","1989-10-23 [Maize].Harvest"))

Thanks in advance for any ideas.

CodePudding user response:

Here is one option using tidyverse, where we can extract the value after Amount:, convert to numeric, do the calculation, convert back to character, then replace that value in the column for rows that have Fertiliser.

library(tidyverse)

field_ops %>% 
  mutate(x = as.character(as.numeric(str_extract(V1, "(?i)(?<=Amount:\\D)\\d+"))*0.95),
         V1 = ifelse(str_detect(V1, "Fertiliser"), str_replace(V1,"(?i)(?<=Amount:\\D)\\d+", x), V1)) %>% 
  select(-x)

Output

    V1
1 1989-05-04 [Maize].Sow(cultivar: B_100, population: 6, depth: 50, rowSpacing: 762)
2                           1989-06-26 [Fertiliser].Apply(Amount: 117.8, Type: NO3N)
3                                                         1989-10-23 [Maize].Harvest

Or we could use just stringr:

library(stringr)

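# Extract the amount once; rows without "Amount:" give NA.
amount <- str_extract(field_ops$V1, "(?i)(?<=Amount:\\D)\\d+")

# Replace only where an amount was found, so the other rows are returned
# unchanged instead of picking up an NA replacement.
ifelse(is.na(amount),
       field_ops$V1,
       str_replace(field_ops$V1,
                   "(?i)(?<=Amount:\\D)\\d+",
                   as.character(as.numeric(amount) * 0.95)))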
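
For comparison, the same extract, scale, and replace steps can be sketched in base R without any packages. This is only an illustration under a few assumptions that are not in the question: "Amount:" is followed by exactly one space, the value is a plain integer or decimal, and the helper names (ops, is_fert, fert, m) are made up for the example. It is vectorised, so no for loop or lapply is needed.

```r
# Sample data from the question.
field_ops <- data.frame(
  V1 = c(
    "1989-05-04 [Maize].Sow(cultivar: B_100, population: 6, depth: 50, rowSpacing: 762)",
    "1989-06-26 [Fertiliser].Apply(Amount: 124, Type: NO3N)",
    "1989-10-23 [Maize].Harvest"
  ),
  stringsAsFactors = FALSE
)

ops <- field_ops$V1

# Work only on the rows that mention [Fertiliser].
is_fert <- grepl("[Fertiliser]", ops, fixed = TRUE)
fert <- ops[is_fert]

# Locate the number after "Amount: " (assumption: one space, integer or
# decimal). The lookbehind requires perl = TRUE.
m <- regexpr("(?<=Amount: )[0-9]+(?:\\.[0-9]+)?", fert, perl = TRUE)

# regmatches<- overwrites only the elements that actually matched, so a
# Fertiliser row without an amount would be left as it is.
regmatches(fert, m) <- as.character(as.numeric(regmatches(fert, m)) * 0.95)

# Put the modified rows back in place.
ops[is_fert] <- fert
field_ops$V1 <- ops
print(field_ops$V1)
```

Subsetting to the Fertiliser rows first keeps any other row that happens to contain "Amount:" untouched, which mirrors the ifelse guard in the tidyverse version.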