My data looks like this:
Country | GDP | Year |
---|---|---|
A | 10 | 1972 |
A | 15 | 1973 |
A | 20 | 1973 |
A | 18 | 1975 |
B | 25 | 1950 |
B | 30 | 1951 |
B | 35 | 1951 |
B | 36 | 1953 |
I have many observations like the data shown above. I want to fix the duplicated years, but only by changing the first row of each duplicated year, which should move to the following year. I want my data to look like this:
Country | GDP | Year |
---|---|---|
A | 10 | 1972 |
A | 20 | 1973 |
A | 15 | 1974 |
A | 18 | 1975 |
B | 25 | 1950 |
B | 35 | 1951 |
B | 30 | 1952 |
B | 36 | 1953 |
Thank you for your time!
CodePudding user response:
How about this?
library(dplyr)

df %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  # renumber years consecutively, starting from each country's first year
  mutate(Year = min(Year) + row_number() - 1) %>%
  ungroup()
# Country GDP Year
# <chr> <int> <dbl>
#1 A 10 1972
#2 A 15 1973
#3 A 20 1974
#4 A 18 1975
#5 B 25 1950
#6 B 30 1951
#7 B 35 1952
#8 B 36 1953
This increments every Year by 1, starting from the minimum value in each Country.
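For comparison, the same renumbering can be sketched in base R. This is only an illustration, not part of the answer above: it rebuilds the sample data from the question as `df` and uses `ave()` to apply the "minimum year plus running index" rule within each country. Note that, like the dplyr version, it keeps tied rows in their original order, so GDP 15 stays at 1973 rather than moving to 1974 as in the desired output.

```r
# Sample data copied from the question
df <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  GDP     = c(10, 15, 20, 18, 25, 30, 35, 36),
  Year    = c(1972, 1973, 1973, 1975, 1950, 1951, 1951, 1953)
)

# Sort by Country, then Year (tied rows keep their original order)
df <- df[order(df$Country, df$Year), ]

# Within each Country: first year plus a 0-based running index
df$Year <- ave(df$Year, df$Country,
               FUN = function(y) min(y) + seq_along(y) - 1)

print(df)
```

This only gives sensible results when every gap in the years corresponds to exactly one duplicate, as it does in the sample data.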
CodePudding user response:
Here is one possible option with tidyverse:
library(tidyverse)

df %>%
  group_by(Country, Year) %>%
  # flag the row with the lowest GDP in each duplicated Country/Year pair
  mutate(dup = case_when(n() == 1 ~ FALSE,
                         min(GDP) == GDP ~ TRUE,
                         TRUE ~ FALSE)) %>%
  # move the flagged row to the following year
  mutate(Year = ifelse(dup == TRUE, Year + 1, Year)) %>%
  arrange(Country, Year) %>%
  ungroup() %>%
  select(-dup)
Output
Country GDP Year
<chr> <int> <dbl>
1 A 10 1972
2 A 20 1973
3 A 15 1974
4 A 18 1975
5 B 25 1950
6 B 35 1951
7 B 30 1952
8 B 36 1953
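The answer above picks the row to shift by its GDP value, which happens to be the first row of each duplicated pair in the sample data. If the first row by position is what is wanted, a possible variant in base R is sketched below. It is an assumption-laden illustration: `df` is rebuilt from the question, the helper names `grp_size`, `grp_pos` and `first_dup` are made up here, and it assumes each year is duplicated at most once and that the following year is unused.

```r
# Sample data copied from the question
df <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  GDP     = c(10, 15, 20, 18, 25, 30, 35, 36),
  Year    = c(1972, 1973, 1973, 1975, 1950, 1951, 1951, 1953)
)

idx <- seq_along(df$Year)

# Number of rows sharing each Country/Year pair
grp_size <- ave(idx, df$Country, df$Year, FUN = length)
# Position of each row inside its Country/Year pair (1, 2, ...)
grp_pos <- ave(idx, df$Country, df$Year, FUN = seq_along)

# First row of every duplicated pair moves to the following year
first_dup <- grp_size > 1 & grp_pos == 1
df$Year[first_dup] <- df$Year[first_dup] + 1

# Restore chronological order within each country
df <- df[order(df$Country, df$Year), ]
rownames(df) <- NULL

print(df)
```

On the sample data this should reproduce the desired table from the question, with GDP 15 at 1974 and GDP 30 at 1952.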