I have a data frame like this
df <- data.frame(Income = c("$100to$200","under$100","above$1000"))
I would like this as output
df_final <- data.frame(Avg = c(150,100,1000))
I would like to extract the numeric value from the income column, if there are two numbers, take the average, if there is only one number, take that number.
CodePudding user response:
A few key steps here. First we need to clean our data, in this case getting rid of the $
makes thing easier. Then we'll split into a From and To column. Finally we need to convert to numeric and calculate the row means.
library(tidyverse)
df %>%
mutate(Income = gsub("$", "", Income, fixed = TRUE)) %>%
separate(Income, "to|under|above", into = c("From", "To")) %>%
mutate_all(.,as.numeric) %>%
mutate(Avg = rowMeans(.,na.rm =TRUE))
From To Avg
1 100 200 150
2 NA 100 100
3 NA 1000 1000
CodePudding user response:
You could try:
library(dplyr)
library(stringr)
df %>%
mutate(across(Income, ~ sapply(str_extract_all(.x, '\\d '), \(x) {strsplit(x, ',') |>
as.numeric() |> mean()})))
Income
1 150
2 100
3 1000
CodePudding user response:
df%>%transmute(Avg=stringr::str_extract_all(Income,"(?<=\\$)\\d ")%>%lapply(as.numeric)%>%sapply(mean))
Avg
1 150
2 100
3 1000
CodePudding user response:
A stringr
approach using gsub
to get the numerics, str_squish
to remove the white space and str_split
to get the entries in case of more then one value.
library(stringr)
data.frame(Avg = sapply(
str_split(str_squish(gsub("[[:alpha:]$]", " ", df$Income)), " "), function(x)
sum(as.numeric(x)) / length(x)))
Avg
1 150
2 100
3 1000