I have a dataset in which every column contains character values. I want to convert some columns to factor and others to numeric depending on what the original character values contain (if they contain numbers, convert the column to numeric; if they contain letters, convert it to factor). I have this `for` loop that iterates over my dataset, but I can't get `i` to return the actual cell value.
```r
l <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "0")
for (i in df) {
  if (i[c(1)] %in% l) {
    as.numeric(i)
  } else {
    as.factor(i)
  }
}
```
I have also tried with `grepl` and `ifelse`:
```r
for (i in seq_along(df[c(0),])) {
  ifelse((grepl(l, df[i])), as.numeric(i), as.factor(i))
}
```
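Two things stop the loops above from working. `for (i in df)` hands `i` each whole column (a character vector), not a single cell, and the value returned by `as.numeric(i)` or `as.factor(i)` is computed and then discarded because it is never assigned back into `df`. In the second attempt, `i` is a column index, so `as.numeric(i)` converts the index rather than the column, and `grepl()` uses only the first element of `l` as its pattern. A minimal sketch of a working loop, assuming the rule "numeric if every value is an optional minus sign followed by a digit" (that regular expression is an assumption, not part of the question):

```r
# Example data from the question (all columns character)
df <- data.frame(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24"),
  stringsAsFactors = FALSE
)

# Loop over column names so the converted column can be assigned back
for (nm in names(df)) {
  col <- df[[nm]]  # the whole column, not a single cell
  # Assumed rule: every value starts with an optional "-" and then a digit
  if (all(grepl("^-?[0-9]", col))) {
    df[[nm]] <- as.numeric(col)
  } else {
    df[[nm]] <- as.factor(col)
  }
}

str(df)
```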
This is a reproducible example of the dataset:
col1 | col2 | col3 |
---|---|---|
true | 1 | -25.4 |
false | 2 | 123.23 |
false | 3 | 321 |
true | 4 | -24 |
For this example, I would want `col1` to be a factor and `col2` and `col3` to be numeric.
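To run the answers below, the example can be rebuilt as an R data frame. This sketch assumes the values are stored exactly as shown in the table, all as character strings:

```r
# Assumed reconstruction of the example table: every column is character
df <- data.frame(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24"),
  stringsAsFactors = FALSE  # keeps strings as character on R < 4.0 as well
)
str(df)
```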
Answer:
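In base R, just use `type.convert()`. With `as.is = FALSE`, character columns that are not converted to another type become factors: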
```r
df <- type.convert(df, as.is = FALSE)
str(df)
```

```
'data.frame': 4 obs. of 3 variables:
 $ col1: Factor w/ 2 levels "false","true": 2 1 1 2
 $ col2: int 1 2 3 4
 $ col3: num -25.4 123.2 321 -24
```
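The same conversion can also be sketched column by column with `lapply()`, which makes explicit that `type.convert()` chooses each column's type from all of that column's values; assigning to `df[]` keeps the result a data frame. The example data below is assumed to match the question's table:

```r
df <- data.frame(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24"),
  stringsAsFactors = FALSE
)

# type.convert() picks the first of logical, integer, numeric or complex that
# accepts every value; with as.is = FALSE, anything else becomes a factor
df[] <- lapply(df, type.convert, as.is = FALSE)
str(df)
```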
Answer:
This solution uses a function that tries to convert a column to numbers and sum them: if the result is `NA` (because at least one value could not be converted), it converts the column to a factor; otherwise it converts it to numeric. Then `lapply()` applies the function to each column of `df`, and `data.frame()` gathers the results into a data frame.
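```r
# Convert a character column to numeric if every non-missing value parses
# as a number; otherwise convert it to a factor
convert <- function(x) {
  # as.numeric() gives NA (with a warning) for values like "true",
  # so the sum is NA whenever any value fails to parse
  if (is.na(sum(as.numeric(x[!is.na(x)])))) {
    as.factor(x)
  } else {
    as.numeric(x)
  }
}

# Apply convert() to each column and reassemble a data frame
df2 <- data.frame(lapply(df, convert))
```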
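A variant of the same idea, sketched with a hypothetical `convert_quiet()` helper: it converts each column once, checks the values that were not already missing with `anyNA()` instead of summing, and wraps `as.numeric()` in `suppressWarnings()` so text columns do not trigger "NAs introduced by coercion" warnings. The example data is assumed to match the question's table:

```r
df <- data.frame(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numeric if every non-missing value parses as a number
convert_quiet <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  # A value that was present but became NA could not be parsed
  if (anyNA(num[!is.na(x)])) as.factor(x) else num
}

df2 <- data.frame(lapply(df, convert_quiet))
str(df2)
```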
Answer:
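This is doable pretty nicely with `dplyr::across()`. The example below uses the built-in `mtcars` data; substitute your own column positions: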
```r
library(dplyr)

mtcars %>%
  as_tibble() %>% # to make printing show column class type
  mutate(across(1:4, as.character),
         across(5:7, as.factor))
```
Result:
```
# A tibble: 32 × 11
   mpg   cyl   disp  hp    drat  wt    qsec     vs    am  gear  carb
   <chr> <chr> <chr> <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
 1 21    6     160   110   3.9   2.62  16.46     0     1     4     4
 2 21    6     160   110   3.9   2.875 17.02     0     1     4     4
 3 22.8  4     108   93    3.85  2.32  18.61     1     1     4     1
 4 21.4  6     258   110   3.08  3.215 19.44     1     0     3     1
 5 18.7  8     360   175   3.15  3.44  17.02     0     0     3     2
 6 18.1  6     225   105   2.76  3.46  20.22     1     0     3     1
 7 14.3  8     360   245   3.21  3.57  15.84     0     0     3     4
 8 24.4  4     146.7 62    3.69  3.19  20        1     0     4     2
 9 22.8  4     140.8 95    3.92  3.15  22.9      1     0     4     2
10 19.2  6     167.6 123   3.92  3.44  18.3      1     0     4     4
# … with 22 more rows
```
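For comparison, the same positional conversion can be sketched in base R without dplyr by assigning `lapply()` results back into column subsets. This mirrors the `mtcars` demo above; swap in your own column positions:

```r
m <- datasets::mtcars

# Columns 1-4 to character and 5-7 to factor, like across(1:4) and across(5:7)
m[1:4] <- lapply(m[1:4], as.character)
m[5:7] <- lapply(m[5:7], as.factor)

str(m)
```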
Answer:
Try the following:

```r
library(dplyr)

df %>%
  mutate(across(starts_with('col1'), as.factor),
         across(contains(c('col2', 'col3')), as.numeric))
```

Of course, if no other column name contains the name of `col1`, you can convert all the remaining columns at once:

```r
df %>%
  mutate(across(starts_with('col1'), as.factor),
         across(!contains('col1'), as.numeric))
```
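Without dplyr, the same split (one column to factor, every other column to numeric) can be sketched in base R by naming the factor column explicitly and taking the rest with `setdiff()`, which avoids the substring matching that `contains()` relies on. The example data and the choice of `col1` as the only categorical column are assumptions taken from the question:

```r
df <- data.frame(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24"),
  stringsAsFactors = FALSE
)

factor_cols <- "col1"                             # assumed: the only categorical column
numeric_cols <- setdiff(names(df), factor_cols)   # every other column

df[factor_cols] <- lapply(df[factor_cols], as.factor)
df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)
str(df)
```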