Creating A New Calculated Category Within A Column in R

Time:04-27

Suppose I have a data frame similar to this, only with thousands of observations:

df <- data.frame(Group = c('A', 'A', 'A', 'B', 'B',
                           'B','B','C','C','C','D','D','D','D','D'),
                 Values=c('5','7','9','0','8','4','5','2','1','3','6','3','1','3','5'))

What I want to do is add a new calculated group to the data frame, based on the values of a group that already exists, without replacing the original group's values. For example, let's say I want to retain group D, but create a new group containing each of group D's values + 2.

An example of the resulting dataframe I would like is the following:

df <- data.frame(Group = c('A', 'A', 'A', 'B', 'B',
                           'B','B','C','C','C','D','D','D','D','D'
                           ,'Dadjusted','Dadjusted','Dadjusted','Dadjusted','Dadjusted'),
                 Values=c('5','7','9','0','8','4','5','2','1','3','6','3','1','3','5',
                          '8','5','3','5','7'))

I have tried using ifelse statements like the following:

   df$adjustedvalues <- ifelse(df$Group == 'D', as.numeric(df$Values) + 2, df$Values)

but this approach results in data frames that look like the following:

df <- data.frame(Group = c('A', 'A', 'A', 'B', 'B',
                           'B','B','C','C','C','D','D','D','D','D'),
                 Values=c('5','7','9','0','8','4','5','2','1','3','6','3','1','3','5'),
                 adjustedvalues=c('5','7','9','0','8','4','5','2','1','3','8','5','3','5','7'))

Which is less than ideal for my purposes.

CodePudding user response:

Here is a possible base R option:

rbind(df, data.frame(Group = "Dadjusted",
                     Values = as.integer(df$Values)[df$Group == "D"] + 2))

Output

       Group Values
1          A      5
2          A      7
3          A      9
4          B      0
5          B      8
6          B      4
7          B      5
8          C      2
9          C      1
10         C      3
11         D      6
12         D      3
13         D      1
14         D      3
15         D      5
16 Dadjusted      8
17 Dadjusted      5
18 Dadjusted      3
19 Dadjusted      5
20 Dadjusted      7
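
If the same adjustment is needed for more than one group, the `rbind` idea above can be wrapped in a small function. This is only a sketch: `add_adjusted_group`, `offset`, and `suffix` are hypothetical names that do not appear in the question, and it assumes the columns are called `Group` and `Values` as in the sample data.

```r
# Sample data from the question (Values stored as character).
df <- data.frame(Group = c('A', 'A', 'A', 'B', 'B',
                           'B','B','C','C','C','D','D','D','D','D'),
                 Values = c('5','7','9','0','8','4','5','2','1','3','6','3','1','3','5'),
                 stringsAsFactors = FALSE)

# Hypothetical helper: append a copy of one group with `offset` added to its values.
add_adjusted_group <- function(data, group, offset, suffix = "adjusted") {
  # Copy the rows of the chosen group; the original rows are left untouched.
  rows <- data[data$Group == group, , drop = FALSE]
  # Values are character in the sample data, so convert before adding,
  # then convert back so the column type stays the same.
  rows$Values <- as.character(as.numeric(as.character(rows$Values)) + offset)
  rows$Group <- paste0(group, suffix)
  out <- rbind(data, rows)
  rownames(out) <- NULL  # renumber the rows 1..n
  out
}

result <- add_adjusted_group(df, "D", 2)
tail(result, 5)
```

Calling it again on `result` with another group, for example `add_adjusted_group(result, "A", 10)`, would append a further block of rows in the same way.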

CodePudding user response:

You could use bind_rows

library(tidyverse)

df %>% 
  bind_rows(df %>% 
            filter(Group == "D") %>%
            mutate(Values = as.character(as.numeric(Values) + 2),
                   Group = "Dadjusted"))
#>        Group Values
#> 1          A      5
#> 2          A      7
#> 3          A      9
#> 4          B      0
#> 5          B      8
#> 6          B      4
#> 7          B      5
#> 8          C      2
#> 9          C      1
#> 10         C      3
#> 11         D      6
#> 12         D      3
#> 13         D      1
#> 14         D      3
#> 15         D      5
#> 16 Dadjusted      8
#> 17 Dadjusted      5
#> 18 Dadjusted      3
#> 19 Dadjusted      5
#> 20 Dadjusted      7

Created on 2022-04-26 by the reprex package (v2.0.1)

CodePudding user response:

Updated: Here is another dplyr solution, similar to @Allan Cameron's but not as elegant:

library(dplyr)

df %>% 
  type.convert(as.is=TRUE) %>% 
  filter(Group=="D") %>% 
  mutate(Group = "Dadjusted",
         Values = Values + 2) %>% 
  bind_rows(df %>% 
              type.convert(as.is = TRUE)) %>% 
  arrange(Group)

Output

         Group Values
1          A      5
2          A      7
3          A      9
4          B      0
5          B      8
6          B      4
7          B      5
8          C      2
9          C      1
10         C      3
11         D      6
12         D      3
13         D      1
14         D      3
15         D      5
16 Dadjusted      8
17 Dadjusted      5
18 Dadjusted      3
19 Dadjusted      5
20 Dadjusted      7
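
Since the question asks that group D's own values stay untouched, a quick sanity check on the result may be worthwhile. The sketch below rebuilds the result with the base R approach and stores it in `out`, a variable name chosen here for illustration; the same checks should apply to the output of the dplyr versions as long as the rows of `df` come first.

```r
# Sample data from the question (Values stored as character).
df <- data.frame(Group = c('A', 'A', 'A', 'B', 'B',
                           'B','B','C','C','C','D','D','D','D','D'),
                 Values = c('5','7','9','0','8','4','5','2','1','3','6','3','1','3','5'),
                 stringsAsFactors = FALSE)

# Build the result with the base R rbind approach.
out <- rbind(df, data.frame(Group = "Dadjusted",
                            Values = as.integer(df$Values)[df$Group == "D"] + 2))

# Compare the original and adjusted groups as numbers.
original <- as.numeric(out$Values[out$Group == "D"])
adjusted <- as.numeric(out$Values[out$Group == "Dadjusted"])

stopifnot(
  # One new row was added per original D row.
  nrow(out) == nrow(df) + sum(df$Group == "D"),
  # Every adjusted value is the matching D value plus 2.
  all(adjusted == original + 2),
  # The first nrow(df) rows still hold the original values.
  all(out$Values[seq_len(nrow(df))] == df$Values)
)
```

`stopifnot()` is silent when every condition holds and raises an error naming the first condition that fails.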