Changing Duplicate Values Within Subjects: R

Time:03-19

My data looks like this:

Country GDP Year
A 10 1972
A 15 1973
A 20 1973
A 18 1975
B 25 1950
B 30 1951
B 35 1951
B 36 1953

I have many observations that look like the data above, and I want to fix the duplicated years within each country. Specifically, I want to change the first of the duplicated rows and move it to the following year, so that my data look like this:

Country GDP Year
A 10 1972
A 20 1973
A 15 1974
A 18 1975
B 25 1950
B 35 1951
B 30 1952
B 36 1953

Thank you for your time!

User response:

How about this?

library(dplyr)

df %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  # renumber years consecutively, starting from each country's earliest year
  mutate(Year = min(Year) + row_number() - 1) %>%
  ungroup()

#  Country   GDP  Year
#  <chr>   <int> <dbl>
#1 A          10  1972
#2 A          15  1973
#3 A          20  1974
#4 A          18  1975
#5 B          25  1950
#6 B          30  1951
#7 B          35  1952
#8 B          36  1953

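This renumbers Year consecutively within each Country, starting from that country's minimum year. Rows that share a year keep their original order, so, as the output shows, GDP 15 stays in 1973 and GDP 20 moves to 1974, which is the reverse of the requested output.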
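Because this approach rebuilds Year from each country's minimum, it also closes any genuine gap in the years, not only the ones created by duplicates. A small check on hypothetical data (`gap_df` is invented for illustration and is not from the question) shows the effect:

```r
library(dplyr)

# Hypothetical data: one duplicated year (1973) and a genuine gap before 1980
gap_df <- data.frame(
  Country = "C",
  GDP     = 1:4,
  Year    = c(1972L, 1973L, 1973L, 1980L)
)

renumbered <- gap_df %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  # same renumbering as in the answer above
  mutate(Year = min(Year) + row_number() - 1) %>%
  ungroup()

renumbered$Year
```

Here the last row moves from 1980 to 1975, so this approach is safest when the duplicates are the only irregularity in the years.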

User response:

Here is one possible option with the tidyverse:

library(tidyverse)

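df %>% 
  group_by(Country, Year) %>%
  # flag the row with the lowest GDP in any Year that occurs more than once
  mutate(dup = case_when(n() == 1 ~ FALSE,
                         min(GDP) == GDP ~ TRUE,
                         TRUE ~ FALSE)) %>% 
  # move flagged rows to the following year
  mutate(Year = ifelse(dup, Year + 1, Year)) %>% 
  arrange(Country, Year) %>% 
  ungroup() %>% 
  select(-dup)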

Output

  Country   GDP  Year
  <chr>   <int> <dbl>
1 A          10  1972
2 A          20  1973
3 A          15  1974
4 A          18  1975
5 B          25  1950
6 B          35  1951
7 B          30  1952
8 B          36  1953
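The answer above flags the duplicated row with the smallest GDP, which in this sample happens to be the first one. If the rule should be positional, as the question asks (the first of the duplicated rows), a minimal sketch that flags rows with `row_number()` instead might look like this. It assumes the rows are already in the question's order, that each Year appears at most twice per Country, and that the following year is not already taken; the data frame is rebuilt from the sample with integer columns, and the `shift` column name is arbitrary.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Country = c("A", "A", "A", "A", "B", "B", "B", "B"),
  GDP     = c(10L, 15L, 20L, 18L, 25L, 30L, 35L, 36L),
  Year    = c(1972L, 1973L, 1973L, 1975L, 1950L, 1951L, 1951L, 1953L)
)

result <- df %>%
  group_by(Country, Year) %>%
  # flag the first row (by position) of any Year that occurs more than once
  mutate(shift = n() > 1 & row_number() == 1) %>%
  ungroup() %>%
  # move flagged rows to the following year
  mutate(Year = if_else(shift, Year + 1L, Year)) %>%
  arrange(Country, Year) %>%
  select(-shift)

result
```

With the sample data this reproduces the requested output: GDP 20 stays in 1973 and GDP 15 moves to 1974, and likewise GDP 35 stays in 1951 and GDP 30 moves to 1952.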