How to create a new column based on flag on a different column R

Time:08-03

Let's say I have the following data frame:

df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  result = c(12, 33, 13, 44, 23, 44, 52, 11),
  flag = c("", "", "Y", "", "Y", "", "", "")
)

Outcome:

  id result flag
1  1     12     
2  1     33     
3  1     13    Y
4  1     44     
5  2     23    Y
6  2     44     
7  2     52     
8  2     11 

I want to get the following data frame:

  id result flag base_value percentage_change
1  1     12              13         -7.692308
2  1     33              13        153.846154
3  1     13    Y         13          0.000000
4  1     44              13        238.461538
5  2     23    Y         23          0.000000
6  2     44              23         91.304348
7  2     52              23        126.086957
8  2     11              23        -52.173913

Each unique "id" has exactly one base value, which is marked with "Y" in the "flag" column. I want to create a new column that holds the base value for each id, so that I can calculate the percentage change of every result relative to it. This is how I did that:

1- First I create a data frame named df_right that keeps only the rows where flag is equal to "Y", and then merge it with the original data frame by "id". Because both data frames have "result" and "flag" columns, merge() suffixes them with .x and .y.

df_right <- df[df$flag == "Y",]
    
df <- merge(x = df,
            y = df_right,
            by = "id",
            all.x = TRUE)

2- Then I rename the columns and drop the unwanted one:

library(dplyr)  # provides %>% and rename()

df <- df %>% rename(flag = flag.x,
                    result = result.x,
                    base_value = result.y)
df <- subset(df, select = -c(flag.y))

3- Finally, I calculate the new column I am looking for:

df$percentage_change <-
  (df$result - df$base_value) / df$base_value * 100

I think there must be an easier and cleaner way to do this process. I am not really happy with my solution and would like to learn new and better ways to get the same result.

CodePudding user response:

I am not quite sure if you want all base values to be 13, or if that was just an artifact of merging the values without a flag, but I think this achieves your goal:

df2 <- df %>% 
  mutate(base_value=if_else(condition = flag=="Y", true=result,false =  13), 
         percentage_change=(result-base_value)/base_value*100)

If you want something else for the base value, swap it in as the false argument of if_else(). For example, to use the mean result as the base value:

df2 <- df %>% 
  mutate(base_value=if_else(condition = flag=="Y", true=result,false =  mean(result)), 
         percentage_change=(result-base_value)/base_value*100)
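Note that a constant fallback gives every unflagged row the same base value, whereas the desired output in the question uses 23 for id 2. As a minimal base R sketch of one possible alternative (assuming each id has exactly one row with flag "Y"), the base value can be looked up per id with match(). The name base_rows is only illustrative and does not appear in the question.

```r
df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  result = c(12, 33, 13, 44, 23, 44, 52, 11),
  flag = c("", "", "Y", "", "Y", "", "", "")
)

# Rows that carry the base value (assumption: exactly one per id)
base_rows <- df[df$flag == "Y", ]

# For every row, find the flagged row with the same id and take its result
df$base_value <- base_rows$result[match(df$id, base_rows$id)]

# Percentage change of each result relative to its id's base value
df$percentage_change <- (df$result - df$base_value) / df$base_value * 100
```

This avoids the rename and drop steps of the merge approach, because only the one needed column is copied over.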

CodePudding user response:

We can use the dplyr package:

df |> group_by(id) |> 
    mutate(base_value = result[which(flag == "Y")] ,
    percentage_change = (result - base_value)/base_value * 100) |>
    ungroup()
Output:

# A tibble: 8 × 5
     id result flag  base_value percentage_change
  <dbl>  <dbl> <chr>      <dbl>             <dbl>
1     1     12 ""            13             -7.69
2     1     33 ""            13            154.
3     1     13 "Y"           13              0
4     1     44 ""            13            238.
5     2     23 "Y"           23              0
6     2     44 ""            23             91.3
7     2     52 ""            23            126.
8     2     11 ""            23            -52.2
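
The grouped approach relies on every id having exactly one flagged row; with none or several, result[which(flag == "Y")] is no longer a single value per group. A small base R sketch for checking that assumption up front (the name flag_counts is only illustrative):

```r
df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  result = c(12, 33, 13, 44, 23, 44, 52, 11),
  flag = c("", "", "Y", "", "Y", "", "", "")
)

# Count flagged rows per id; an id with no flagged row gets a count of 0
flag_counts <- tapply(df$flag == "Y", df$id, sum)

# Stop early if any id does not have exactly one base row
stopifnot(all(flag_counts == 1))
```

If the check fails, flag_counts shows which ids have a missing or duplicated flag.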