I am looking to create a new column based off of factors present in another column. For example I have the following data frame:
> df <- data.frame(ID = c("1","1","2","2"), Month = c("01", "02", "01", "02"), Percentile = c("P50", "P95", "P97", "P85"))
> df
ID Month Percentile
1 1 01 P50
2 1 02 P95
3 2 01 P97
4 2 02 P85
I want to create a new column, PercentileChange to say that the for an individual ID, look at the first months percentile and compare it to the next months percentile and see if it increased or decreased and by how many levels so I would get an output like this:
> df2
ID Month Percentile PercentileChange
1 1 01 P50 NA
2 1 02 P95 4
3 2 01 P97 NA
4 2 02 P85 -3
These are the levels present in the Percentile column
df$Percentile <- factor(df$Percentile,levels=c("P01","P1","P3","P5","P10","P15","P25","P50","P75","P85","P90","P95","P97","P99","P999"))
Edit: Adding longer df example:
> df <- data.frame(ID = c("1","1","1","1","2","2","3","3","3"), Month = c("01", "02", "03", "04", "01", "02", "02", "03", "05"), Percentile = c("P50", "P95", "P97", "P85", "P01","P01", "P5","P5","P3"))
> df
ID Month Percentile
1 1 01 P50
2 1 02 P95
3 1 03 P97
4 1 04 P85
5 2 01 P01
6 2 02 P01
7 3 02 P5
8 3 03 P5
9 3 05 P3
Output:
ID Month Percentile PercentileChange
1 1 01 P50 NA
2 1 02 P95 4
3 1 03 P97 1
4 1 04 P85 3
5 2 01 P01 NA
6 2 02 P01 0
7 3 02 P5 NA
8 3 03 P5 0
9 3 05 P3 -1
CodePudding user response:
You can group_by
ID
and get the distance between the indexes of the factor levels in the Percentile
column:
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(PercentileChange = levels(Percentile) %>%
{match(Percentile, .) - match(lag(Percentile), .)})
Further explanation:
levels(Percentile) %>%
{match(Percentile, .) - match(lag(Percentile), .)}
Is equivalent to
match(Percentile, levels(Percentile)) - match(lag(Percentile), levels(Percentile))
And, match(Percentile, levels(Percentile))
will yied in which position on the levels(Percentile)
vector the values on Percentile
lie.