Consider the following code:
x <- c('2','75% (3/4)','80% (4/5)','70% (7/10)','90% (9/10)')
y <- c('1', '50% (1/2)', '25% (1/4)', '30% (3/10)', '40% (2/5)')
df <- data.frame(rbind(x, y))
I would like to extract the values before the % sign i.e. the whole numbers.
I understand that I can do this using the following:
df$X2 <- sub("%.*", "", df$X2)
But to avoid copying and pasting this line for each column, is there a way to do it in one step?
I have tried to do the following:
df[-1] <- sub("%.*", "", df[-1])
But this leaves the format as 'c("75' which is not what I am after - what has gone wrong here? Is there another suitable way to do this?
Thanks
CodePudding user response:
The easiest way would likely be to do this using dplyr:
library(dplyr)
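# A lambda is used because passing extra arguments through across() is deprecated as of dplyr 1.1.0
mutate(df, across(everything(), ~ stringr::str_remove(.x, "%.*")))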
X1 X2 X3 X4 X5
x 2 75 80 70 90
y 1 50 25 30 40
CodePudding user response:
Base R:
df[] <- lapply(df, sub, pattern = "%.*", replacement = "")
df
# X1 X2 X3 X4 X5
# x 2 75 80 70 90
# y 1 50 25 30 40
The df[] <- assignment is necessary because by default, lapply returns a list (not a data.frame). By using df[] on the LHS of the assignment, the contents of the columns are replaced within the structure of the frame. This also works well when operating on a subset of columns, as in
df[c(2,3,5)] <- lapply(df[c(2,3,5)], sub, pattern = "%.*", replacement = "")
which is admittedly not what you want here, but provides a way to customize which columns are affected.
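The same subsetting idea also covers the df[-1] attempt from the question. The sketch below is only an illustration, rebuilding the question's data and assuming R 4.0.0 or later (character columns rather than factors); the names deparsed and broken are introduced here for demonstration. sub() coerces non-character input with as.character(), and a data frame, being a list, is turned into one deparsed string per column, which is where the 'c("75' comes from. Applying sub() column by column with lapply avoids that coercion.

```r
# Rebuild the question's data; stringsAsFactors = FALSE is the default from R 4.0.0
x <- c('2','75% (3/4)','80% (4/5)','70% (7/10)','90% (9/10)')
y <- c('1', '50% (1/2)', '25% (1/4)', '30% (3/10)', '40% (2/5)')
df <- data.frame(rbind(x, y), stringsAsFactors = FALSE)

# What sub() effectively works on when handed a data frame:
# each column collapsed into a single deparsed string
deparsed <- as.character(df[-1])

# Everything from the first % onward is dropped from each deparsed string
broken <- sub("%.*", "", deparsed)

# Column-wise version: sub() receives one character vector per column,
# and df[-1] on the left keeps the data frame structure
df[-1] <- lapply(df[-1], sub, pattern = "%.*", replacement = "")
df
```

Inspecting deparsed and broken before the last assignment shows the intermediate strings that the original one-liner wrote back into the frame.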
The lapply(df, sub, ...) call is equivalent to using an anonymous function:
lapply(df, function(z) sub("%.*", "", z))
Because lapply passes each element of its argument (df here) as the first unnamed argument to the function (which for sub would otherwise be pattern=), we explicitly name the constant values pattern= and replacement= as supplementary arguments to lapply. Anything after the first two arguments (X, our df; and FUN) is passed unchanged to the function on every call, so each column ends up matched to the x= argument of sub.
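To check that equivalence rather than take it on trust, a small sketch along these lines can be run. It rebuilds the question's data and assumes R 4.0.0 or later; via_dots and via_lambda are names made up for this illustration.

```r
# Rebuild the question's data
x <- c('2','75% (3/4)','80% (4/5)','70% (7/10)','90% (9/10)')
y <- c('1', '50% (1/2)', '25% (1/4)', '30% (3/10)', '40% (2/5)')
df <- data.frame(rbind(x, y), stringsAsFactors = FALSE)

# Constants passed by name: the column is the only unnamed argument,
# so R's argument matching assigns it to x in sub(pattern, replacement, x)
via_dots <- lapply(df, sub, pattern = "%.*", replacement = "")

# Same operation with the column placed in the third position by hand
via_lambda <- lapply(df, function(z) sub("%.*", "", z))

# Compare the two results element by element
identical(via_dots, via_lambda)
```

Both calls return a plain list, which is why the answer assigns the result into df[] instead of into df.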
CodePudding user response:
Maybe this is the output you were looking for?
for (i in colnames(df)) {
  df[, i] <- sub("%.*", "", df[, i])
}
print(df)
X1 X2 X3 X4 X5
x 2 75 80 70 90
y 1 50 25 30 40
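The loop above leaves every column as character strings. If numeric columns are wanted (an assumption; the question only asks for the text before the % sign), the conversion could be folded into the same loop, roughly as sketched here on a fresh copy of the question's data:

```r
# Rebuild the question's data
x <- c('2','75% (3/4)','80% (4/5)','70% (7/10)','90% (9/10)')
y <- c('1', '50% (1/2)', '25% (1/4)', '30% (3/10)', '40% (2/5)')
df <- data.frame(rbind(x, y), stringsAsFactors = FALSE)

for (i in colnames(df)) {
  # Strip everything from the % sign onward, then convert the remainder to numeric
  df[[i]] <- as.numeric(sub("%.*", "", df[[i]]))
}

# Inspect the resulting column types
str(df)
```

as.numeric() returns NA with a warning for any string that is not a number once the suffix is removed, so this variant only suits columns shaped like the ones in the question.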