Let's say I have the following data frame:
df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  result = c(12, 33, 13, 44, 23, 44, 52, 11),
  flag = c("", "", "Y", "", "Y", "", "", "")
)
Outcome:
  id result flag
1  1     12
2  1     33
3  1     13    Y
4  1     44
5  2     23    Y
6  2     44
7  2     52
8  2     11
I want to get the following data frame:
  id result flag base_value percentage_change
1  1     12              13         -7.692308
2  1     33              13        153.846154
3  1     13    Y         13          0.000000
4  1     44              13        238.461538
5  2     23    Y         23          0.000000
6  2     44              23         91.304348
7  2     52              23        126.086957
8  2     11              23        -52.173913
Each unique "id" has exactly one base value, which is marked with "Y" in the "flag" column. I want to create a new column that holds the base value for each id, and then use it to calculate the percentage change of every result relative to that base value. This is how I did it:
1- First, I create a data frame named df_right that keeps only the rows where flag is equal to "Y", and then merge it with the original data frame by "id".
df_right <- df[df$flag == "Y",]
df <- merge(x = df,
            y = df_right,
            by = "id",
            all.x = TRUE)
2- Then I rename the columns and drop the one I do not need:
library(dplyr)  # provides %>% and rename()
df <- df %>% rename(flag = flag.x,
                    result = result.x,
                    base_value = result.y)
df <- subset(df, select = -c(flag.y))
3- Finally, I calculate the new column I am looking for:
df$percentage_change <-
(df$result - df$base_value) / df$base_value * 100
I think there must be an easier and cleaner way to do this process. I am not really happy with my solution and would like to learn new and better ways to get the same result.
CodePudding user response:
I am not quite sure whether you want all base values to be 13, or whether that was just an artifact of merging the values without a flag, but I think this achieves your goal:
library(dplyr)
df2 <- df %>%
  mutate(base_value = if_else(condition = flag == "Y", true = result, false = 13),
         percentage_change = (result - base_value) / base_value * 100)
If you want something else for the base value, swap it in as the false argument of if_else(). For example, to make the base value the mean result:
df2 <- df %>%
  mutate(base_value = if_else(condition = flag == "Y", true = result, false = mean(result)),
         percentage_change = (result - base_value) / base_value * 100)
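If the base value should instead come from the flagged row of each id (13 for id 1 and 23 for id 2, as in the desired output), a lookup may fit better than a constant. The following is only a sketch in base R, assuming every id has exactly one row with flag == "Y"; the name base_rows is made up for illustration.

```r
df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  result = c(12, 33, 13, 44, 23, 44, 52, 11),
  flag = c("", "", "Y", "", "Y", "", "", "")
)

# Rows that carry the base value (assumption: exactly one per id)
base_rows <- df[df$flag == "Y", ]

# match() returns, for every row of df, the position of its id in base_rows,
# so indexing base_rows$result with it repeats the base value within each id
df$base_value <- base_rows$result[match(df$id, base_rows$id)]
df$percentage_change <- (df$result - df$base_value) / df$base_value * 100
```

This avoids the merge, rename, and drop steps because only the one needed column is looked up.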
CodePudding user response:
We can use the dplyr package:
library(dplyr)
df |>
  group_by(id) |>
  mutate(base_value = result[which(flag == "Y")],
         percentage_change = (result - base_value) / base_value * 100) |>
  ungroup()
Output:
# A tibble: 8 × 5
id result flag base_value percentage_change
<dbl> <dbl> <chr> <dbl> <dbl>
1 1 12 "" 13 -7.69
2 1 33 "" 13 153.84
3 1 13 "Y" 13 0
4 1 44 "" 13 238.46
5 2 23 "Y" 23 0
6 2 44 "" 23 91.3
7 2 52 "" 23 126.08
8 2 11 "" 23 -52.17
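Note that result[which(flag == "Y")] relies on each id having exactly one flagged row. If an id has no flagged row, the subset has length zero, and mutate() is expected to stop with a size error. A possible variant, assuming NA is an acceptable base value for such ids, takes the first flagged value instead. The data frame df_missing, with an unflagged id 3, is made up for illustration.

```r
library(dplyr)

# Hypothetical data: id 3 has no row flagged "Y"
df_missing <- data.frame(
  id = c(1, 1, 2, 2, 3),
  result = c(12, 13, 23, 44, 50),
  flag = c("", "Y", "Y", "", "")
)

out <- df_missing |>
  group_by(id) |>
  mutate(
    # [1] picks the first flagged value; on an empty subset it yields NA
    base_value = result[flag == "Y"][1],
    percentage_change = (result - base_value) / base_value * 100
  ) |>
  ungroup()
```

With this variant, ids without a flagged row get NA in both new columns rather than stopping the pipeline; if an id has several flagged rows, only the first one is used.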