Partial grouping inside a dataframe in R

Time:04-20

For statistical analysis purposes, I would like to regroup some rows of a data frame based on their values.

What I have:

number latitude
30 57
12 59
01 68
12 66
101 55
47 61
05 60
288 67

The goal is, for example, to regroup every latitude of 66 or above (66, 67, 68) into a single category, 66. The desired output would look like this:

number latitude new
30 57 57
12 59 59
01 68 66
12 66 66
101 55 55
47 61 61
05 60 60
288 67 66

I do not want to use a loop with if statements because I feel that it is not really R friendly. I would also like to keep the initial column, so that I can try different combinations later on.

Thank you very much.

CodePudding user response:

One option is mutate with ifelse:

library(dplyr)
df %>%
  mutate(new = ifelse(latitude >= 66, "66 ", latitude))

Output:

  number latitude new
1     30       57  57
2     12       59  59
3     01       68 66 
4     12       66 66 
5    101       55  55
6     47       61  61
7     05       60  60
8    288       67 66 

Data

df <- data.frame(number = c("30","12","01","12","101","47","05","288"),
                 latitude = c(57,59,68,66,55,61,60,67))
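
A quick way to see what type the new column ends up with is to run the same ifelse call in base R. This is only a sketch using the Data above; the names new_chr and new_num are introduced here for illustration. Because the replacement value is a string, ifelse coerces the whole result to character. Swapping in the number 66 keeps it numeric.

```r
df <- data.frame(number = c("30","12","01","12","101","47","05","288"),
                 latitude = c(57,59,68,66,55,61,60,67))

# String replacement: every element of the result is coerced to character
new_chr <- ifelse(df$latitude >= 66, "66 ", df$latitude)
class(new_chr)

# Numeric replacement (an assumption: a plain 66 is an acceptable label)
# keeps the result numeric, which matters for later arithmetic or sorting
new_num <- ifelse(df$latitude >= 66, 66, df$latitude)
class(new_num)
```

Either vector can be assigned to df$new; the original latitude column is left untouched in both cases.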

CodePudding user response:

library(tidyverse)

tribble(~"number",  ~"latitude",
        30, 57,
        12, 59,
        01, 68,
        12, 66,
        101,55,
        47, 61,
        05, 60,
        288,67) %>% 
  dplyr::mutate(
    new = if_else(latitude >= 66,  # >= so that 66 itself gets the same label as 67 and 68
                  "66 ",
                  as.character(latitude)))

CodePudding user response:

We can use logical indexing

df1$new <- df1$latitude
df1$new[df1$latitude >=66] <- "66 "

or with ifelse

df1$new <- with(df1, ifelse(latitude >=66, "66 ", latitude))

Output:

> df1
  number latitude new
1     30       57  57
2     12       59  59
3      1       68 66 
4     12       66 66 
5    101       55  55
6     47       61  61
7      5       60  60
8    288       67 66 

Also, as @Mael commented regarding the type of the 'new' column: if we want to preserve the numeric type, we can use pmin instead

library(dplyr)
df1 %>%
    mutate(new = pmin(latitude, 66))

Output:

  number latitude new
1     30       57  57
2     12       59  59
3      1       68  66
4     12       66  66
5    101       55  55
6     47       61  61
7      5       60  60
8    288       67  66
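
Since the question mentions trying different combinations later, the pmin idea can be wrapped in a small helper. The following is a minimal sketch, not part of the original answers: the function name regroup and the optional lower bound (via pmax) are assumptions added for illustration, and the data frame is rebuilt here so the block runs on its own.

```r
df1 <- data.frame(number = c(30, 12, 1, 12, 101, 47, 5, 288),
                  latitude = c(57, 59, 68, 66, 55, 61, 60, 67))

# Hypothetical helper: values above `upper` are set to `upper`,
# values below `lower` are set to `lower`; everything else is unchanged
regroup <- function(x, lower = -Inf, upper = Inf) {
  pmax(pmin(x, upper), lower)
}

# Each variant goes into its own column, so latitude stays intact
df1$new_66    <- regroup(df1$latitude, upper = 66)
df1$new_57_66 <- regroup(df1$latitude, lower = 57, upper = 66)
df1
```

Both new columns stay numeric, so they can be compared or converted to factors for grouping without parsing strings.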