Iterate over a row in a dataframe to change class type

I have a dataset in which every value is stored as character. I want to convert some columns to factor and others to numeric, depending on what the original character values contain: if a column contains numbers, convert it to numeric; if it contains letters, convert it to factor. I have this for loop that sequences along my dataset, but I can't get i to return the actual cell value.

l <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "0")

   
for (i in df) {
  if(i[c(1)] %in% l) {
    as.numeric(i)
  } else {
    as.factor(i)
  }
}

I also have tried with grepl and ifelse:

for (i in seq_along(df[c(0),])) {
  ifelse((grepl(l, df[i])), as.numeric(i), as.factor(i))
}

This is a reproducible example of the dataset:

col1	col2	col3
true	1	-25.4
false	2	123.23
false	3	321
true	4	-24

For this example, I would want col1 to be a factor and col2 and col3 to be numeric.

CodePudding user response：

In base R, just use type.convert():

df <- type.convert(df, as.is = FALSE)

str(df)

'data.frame':   4 obs. of  3 variables:
 $ col1: Factor w/ 2 levels "false","true": 2 1 1 2
 $ col2: int  1 2 3 4
 $ col3: num  -25.4 123.2 321 -24
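
As a minimal sketch of the call above, the example from the question can be rebuilt with every column stored as character and then converted in one step. How `df` was originally created is an assumption here (only its contents are given in the question), and `converted` is just an illustrative name:

```r
# Rebuild the question's example with every column stored as character.
# (The construction is an assumption; only the contents come from the question.)
df <- data.frame(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24"),
  stringsAsFactors = FALSE
)

# as.is = FALSE turns non-numeric character columns into factors;
# as.is = TRUE would leave them as character vectors instead.
converted <- type.convert(df, as.is = FALSE)

# Inspect the resulting class of each column.
sapply(converted, class)
```

Assigning to a new name rather than overwriting `df` keeps the original character data around for comparison.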

CodePudding user response：

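This solution uses a function that tries to convert a column to numeric and sums the result. If the sum is NA, at least one non-missing value could not be read as a number, so the column is converted to factor; otherwise it is converted to numeric. Note that as.numeric() emits an "NAs introduced by coercion" warning for the non-numeric columns.

Then use lapply() to apply the function to each column of df, and data.frame() to gather the results into a data frame.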

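convert <- function(x) {
  # Coerce the non-missing values; any non-numeric value becomes NA,
  # which in turn makes the sum NA
  if (is.na(sum(as.numeric(x[!is.na(x)])))) {
    as.factor(x)
  } else {
    as.numeric(x)
  }
}

df2 <- data.frame(lapply(df, convert))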
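
To check the approach on the question's data, here is a self-contained sketch. The construction of `df` is an assumption, and `convert_quiet` is a hypothetical variant, not the answer's function: it applies essentially the same decision rule for ordinary data, but coerces only once and suppresses the coercion warning.

```r
# Example data from the question, all columns stored as character
# (the construction itself is an assumption).
df <- data.frame(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24"),
  stringsAsFactors = FALSE
)

# Hypothetical variant: coerce once and silence the
# "NAs introduced by coercion" warning.
convert_quiet <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  # A value that was not NA before coercion but is NA afterwards is non-numeric.
  if (any(is.na(num) & !is.na(x))) {
    as.factor(x)
  } else {
    num
  }
}

df2 <- data.frame(lapply(df, convert_quiet))

# Inspect the resulting class of each column.
sapply(df2, class)
```

Unlike type.convert(), this rule never produces integer columns: anything that parses as a number ends up as double.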

CodePudding user response：

This is doable pretty nicely with dplyr::across (shown here on mtcars, selecting columns by position):

library(dplyr)
mtcars %>%
  as_tibble() %>% # to make printing show column class type
  mutate(across(1:4, as.character),
         across(5:7, as.factor))

Result:

# A tibble: 32 × 11
   mpg   cyl   disp  hp    drat  wt    qsec     vs    am  gear  carb
   <chr> <chr> <chr> <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
 1 21    6     160   110   3.9   2.62  16.46     0     1     4     4
 2 21    6     160   110   3.9   2.875 17.02     0     1     4     4
 3 22.8  4     108   93    3.85  2.32  18.61     1     1     4     1
 4 21.4  6     258   110   3.08  3.215 19.44     1     0     3     1
 5 18.7  8     360   175   3.15  3.44  17.02     0     0     3     2
 6 18.1  6     225   105   2.76  3.46  20.22     1     0     3     1
 7 14.3  8     360   245   3.21  3.57  15.84     0     0     3     4
 8 24.4  4     146.7 62    3.69  3.19  20        1     0     4     2
 9 22.8  4     140.8 95    3.92  3.15  22.9      1     0     4     2
10 19.2  6     167.6 123   3.92  3.44  18.3      1     0     4     4
# … with 22 more rows

CodePudding user response：

Try the following:

x %>% mutate(across(starts_with('col1'), as.factor), across(contains(c('col2','col3')), as.numeric))

Of course, if the name of col1 shares no pattern with the other column names, you can instead convert everything except col1 to numeric:

x %>% mutate(across(starts_with('col1'), as.factor), across(!contains('col1'), as.numeric))
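
The calls above select columns by name. If the columns should instead be chosen by what they contain, as the question asks, one possible sketch uses `where()` with a predicate. The helper `looks_numeric` and its regular expression are assumptions made for illustration (they only recognise plain decimals with an optional minus sign), and the tibble `x` is rebuilt from the question's example:

```r
library(dplyr)

# Example data from the question, stored as character (assumed construction).
x <- tibble(
  col1 = c("true", "false", "false", "true"),
  col2 = c("1", "2", "3", "4"),
  col3 = c("-25.4", "123.23", "321", "-24")
)

# Hypothetical helper: TRUE when every non-missing value looks like a plain
# decimal number (optional minus sign, digits, optional fractional part).
looks_numeric <- function(v) {
  is.character(v) && all(grepl("^-?[0-9]+(\\.[0-9]+)?$", v[!is.na(v)]))
}

result <- x %>%
  # First convert the columns whose values all look numeric.
  mutate(across(where(looks_numeric), as.numeric)) %>%
  # Whatever is still character at this point is treated as categorical.
  mutate(across(where(is.character), as.factor))

result
```

Using two separate mutate() steps makes the order explicit: the factor conversion only sees columns that the numeric step left as character.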