I have the following dataframe. All the columns are character columns, but except the 1st column, rest of the columns contain numeric values:
df <- data.frame(col1=c('', 'assets', 'loss', 'liability'),
col2=c("", "5,000", "(1,400)", "300"),
col3=c("", "4,500", "(1,100)", "500"))
df
col1 col2 col3
<chr> <chr> <chr>
assets 5,000 4,500
loss (1,400) (1,100)
liability 300 500
The task is -- if a value is within ()
[for example, loss
value], I need to make that value as a negative number.
I am trying the following script:
clean_columns <- function(dataframe){
dataframe_new <- copy(dataframe)
for (idx in 2:length(names(dataframe))) {
# Check if text contains -- ( )
if (grepl(is.na(dataframe_new[, idx]), "(", fixed = TRUE)) {
# Remove characters -- () , -- and multiply -1
dataframe_new[, idx] <- gsub("[(,)]", "", dataframe_new[, idx]) %>%
as.numeric(dataframe_new[, idx])*(-1)
} else
# Remove characters -- ,
dataframe_new[, idx] <- gsub("[,]", "", dataframe_new[, idx]) %>%
as.numeric(dataframe_new[, idx])
}
return(dataframe_new)
}
The function removes the characters correctly, but make the loss
value NA.
clean_columns(df)
col1 col2 col3
<chr> <dbl> <dbl>
NA NA
assets 5000 4500
loss NA NA
liability 300 500
Desired Output
col1 col2 col3
<chr> <dbl> <dbl>
NA NA
assets 5000 4500
loss -1400 -1100
liability 300 500
Any suggestions would be appreciated. Thanks!
CodePudding user response:
We loop across
the columns having at least some digits, remove the )
, replace the (
with -
and parse it with parse_number
library(dplyr)
library(stringr)
df %>%
mutate(across(-1,
~ readr::parse_number(str_replace(str_remove(.x, "\\)"), fixed("("), "-") )))
-output
col1 col2 col3
1 NA NA
2 assets 5000 4500
3 loss -1400 -1100
4 liability 300 500
Or instead of removing the (
, we can detect the (
to multiply by -1
df %>%
mutate(across(-1, ~ c(1, -1)[1 str_detect(.x, fixed("("))] *
readr::parse_number(.x) ))
CodePudding user response:
Here is one without my favorite function parse_number()
:
library(dplyr)
library(stringr)
df %>%
mutate(across(-col1, ~str_replace_all(., "[[:punct:]]", ""))) %>%
type.convert(as.is = TRUE) %>%
mutate(across(-col1, ~ifelse(col1 == "loss", .*-1, .))) %>%
as_tibble()
col1 col2 col3
<chr> <dbl> <dbl>
1 "" NA NA
2 "assets" 5000 4500
3 "loss" -1400 -1100
4 "liability" 300 500