Extract multiple numeric value from a data frame column and store in separate columns-CodePudding

I have a data frame like this

df <- data.frame(Income = c("$100to$200","under$100","above$1000"))

I would like this as output

df_final <- data.frame(Avg = c(150,100,1000))

I would like to extract the numeric value from the income column, if there are two numbers, take the average, if there is only one number, take that number.

CodePudding user response：

A few key steps here. First we need to clean our data, in this case getting rid of the $ makes thing easier. Then we'll split into a From and To column. Finally we need to convert to numeric and calculate the row means.

library(tidyverse)

df %>% 
  mutate(Income = gsub("$", "", Income, fixed = TRUE)) %>% 
  separate(Income, "to|under|above", into = c("From", "To")) %>% 
  mutate_all(.,as.numeric) %>% 
  mutate(Avg = rowMeans(.,na.rm =TRUE))

  From   To  Avg
1  100  200  150
2   NA  100  100
3   NA 1000 1000

CodePudding user response：

You could try:

library(dplyr)
library(stringr)

df %>% 
  mutate(across(Income, ~ sapply(str_extract_all(.x, '\\d '), \(x) {strsplit(x, ',') |>
      as.numeric() |> mean()})))

  Income
1    150
2    100
3   1000

CodePudding user response：

df%>%transmute(Avg=stringr::str_extract_all(Income,"(?<=\\$)\\d ")%>%lapply(as.numeric)%>%sapply(mean))

   Avg
1  150
2  100
3 1000

CodePudding user response：

A stringr approach using gsub to get the numerics, str_squish to remove the white space and str_split to get the entries in case of more then one value.

library(stringr)

data.frame(Avg = sapply(
  str_split(str_squish(gsub("[[:alpha:]$]", " ", df$Income)), " "), function(x) 
    sum(as.numeric(x)) / length(x)))
   Avg
1  150
2  100
3 1000