I have this tibble:
tibble(country = c("USA","USA","USA","USA","CHINA","CHINA","CHINA","CHINA","SPAIN","SPAIN","SPAIN","SPAIN"),
values = c(10,20,100,30,60,70,80,90,100,20,100,90))
how can i make the following column - the condition is population column starts at value 1 and increments by 1, when the values == 100, it resets to the value 1 and increment starts again - see last example SPAIN where the change happens twice.
So final tibble is:
tibble(country = c("USA","USA","USA","USA","CHINA","CHINA","CHINA","CHINA","SPAIN","SPAIN","SPAIN","SPAIN"),
values = c(10,20,100,30,60,70,80,90,100,20,100,90),
population = c(1,2,1,2,1,2,3,4,1,2,1,2))
Any help given thanks
CodePudding user response:
We could use rowid
on the 'country' and the cumulative sum of logical vector
library(dplyr)
library(data.table)
df1 %>%
mutate(population = rowid(country, cumsum(values == 100)))
-output
# A tibble: 12 × 3
country values population
<chr> <dbl> <int>
1 USA 10 1
2 USA 20 2
3 USA 100 1
4 USA 30 2
5 CHINA 60 1
6 CHINA 70 2
7 CHINA 80 3
8 CHINA 90 4
9 SPAIN 100 1
10 SPAIN 20 2
11 SPAIN 100 1
12 SPAIN 90 2
CodePudding user response:
Slightly different approach from akrun but same logic:
library(dplyr)
df %>%
group_by(country, id_Group =cumsum(values == 100) 1) %>%
mutate(population = row_number(), .keep="used")
country id_Group population
<chr> <dbl> <int>
1 USA 1 1
2 USA 1 2
3 USA 2 1
4 USA 2 2
5 CHINA 2 1
6 CHINA 2 2
7 CHINA 2 3
8 CHINA 2 4
9 SPAIN 3 1
10 SPAIN 3 2
11 SPAIN 4 1
12 SPAIN 4 2