How to identify name changes within dplyr groups?

Time:02-22

I am trying to figure out how to identify name changes within a group.

For example, I have a dataframe that looks like this:

df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 2),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally",
           "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim",
           "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane",
           "Linda", "Carl", "Jim")
)

> df
   state year  name
1     CA 2000  John
2     CA 2000  Paul
3     WI 2000 Sally
4     WI 2000  Mary
5     NY 2000  Fred
6     NY 2000  Jane
7     CA 2000 Linda
8     CA 2000  Carl
9     WI 2000   Jim
10    WI 2001 Peter
11    NY 2001  Paul
12    NY 2001 Sally
13    CA 2001  Mary
14    CA 2001  Kate
15    WI 2001  Jane
16    WI 2001 Linda
17    NY 2001  Carl
18    NY 2001   Jim

As you can see, "Peter" replaced "John" in 2001, and "Kate" replaced "Fred" in 2001.

So I want the output to look like:

df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 2),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally",
           "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim",
           "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane",
           "Linda", "Carl", "Jim"),
  change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
             1, 0, 0, 0, 1, 0, 0, 0, 0)
)

df
   state year  name change
1     CA 2000  John     NA
2     CA 2000  Paul     NA
3     WI 2000 Sally     NA
4     WI 2000  Mary     NA
5     NY 2000  Fred     NA
6     NY 2000  Jane     NA
7     CA 2000 Linda     NA
8     CA 2000  Carl     NA
9     WI 2000   Jim     NA
10    WI 2001 Peter      1
11    NY 2001  Paul      0
12    NY 2001 Sally      0
13    CA 2001  Mary      0
14    CA 2001  Kate      1
15    WI 2001  Jane      0
16    WI 2001 Linda      0
17    NY 2001  Carl      0
18    NY 2001   Jim      0

As you can see, Peter in 2001 and Kate in 2001 are both marked as "1" in the "change" column because they replaced "John" and "Fred" in 2000-CA and 2000-NY, respectively.

I've been experimenting with lag(), but it seems to just look at the previous row rather than respecting the state and year groups:

library(dplyr)

df2 <- df %>%
  group_by(state, year) %>%
  mutate(change = lag(name, order_by = year))

Any help would be appreciated!

CodePudding user response:

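Based on the expected output, maybe this helps: create a logical column from duplicated() on 'name' across the entire data, then group by 'year'. If no value in a group is a duplicate (all(!change)), replace the column with NA; otherwise negate (!) and convert the logical to binary with unary plus (+).

library(dplyr)
df %>%
  # TRUE where the name already appeared in an earlier row
  mutate(change = duplicated(name)) %>%
  group_by(year) %>%
  mutate(
    # a year with no repeated names is the baseline year: all NA;
    # otherwise 1 = new name, 0 = name seen before
    change = if (all(!change)) NA_integer_ else +(!change)) %>%
  ungroup()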

-output

# A tibble: 18 × 4
   state  year name  change
   <chr> <dbl> <chr>  <int>
 1 CA     2000 John      NA
 2 CA     2000 Paul      NA
 3 WI     2000 Sally     NA
 4 WI     2000 Mary      NA
 5 NY     2000 Fred      NA
 6 NY     2000 Jane      NA
 7 CA     2000 Linda     NA
 8 CA     2000 Carl      NA
 9 WI     2000 Jim       NA
10 WI     2001 Peter      1
11 NY     2001 Paul       0
12 NY     2001 Sally      0
13 CA     2001 Mary       0
14 CA     2001 Kate       1
15 WI     2001 Jane       0
16 WI     2001 Linda      0
17 NY     2001 Carl       0
18 NY     2001 Jim        0
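
To see what the two building blocks are doing, here is a small sketch on a made-up four-name vector (the vector `x` is only an illustration, not the question's data). It shows what `duplicated()` returns and how unary `+` turns the negated logical into 0/1:

```r
# Illustrative vector: two names from a "first year", then two from a "second year"
x <- c("John", "Paul", "Peter", "Paul")

# TRUE only where the value has already appeared earlier in the vector
seen_before <- duplicated(x)

# Negate, then coerce logical to integer with unary plus:
# 1 = first appearance, 0 = name seen before
is_new <- +(!seen_before)

print(seen_before)
print(is_new)
```

Note that `duplicated()` here looks at the whole 'name' column, so a name counts as "unchanged" if it appeared anywhere earlier in the data, regardless of state or row position.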

CodePudding user response:

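A base R approach that leaves out NAs, so both years get flagged. It assumes each year has the same number of rows, listed in the same order:

# one data frame per year
df2 <- split(df, df$year)

# compare names position by position, then repeat the 0/1 flags once per year
cbind(df, change = rep(as.integer(df2[["2000"]]$name != df2[["2001"]]$name), length(df2)))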
   state year  name change
1     CA 2000  John      1
2     CA 2000  Paul      0
3     WI 2000 Sally      0
4     WI 2000  Mary      0
5     NY 2000  Fred      1
6     NY 2000  Jane      0
7     CA 2000 Linda      0
8     CA 2000  Carl      0
9     WI 2000   Jim      0
10    WI 2001 Peter      1
11    NY 2001  Paul      0
12    NY 2001 Sally      0
13    CA 2001  Mary      0
14    CA 2001  Kate      1
15    WI 2001  Jane      0
16    WI 2001 Linda      0
17    NY 2001  Carl      0
18    NY 2001   Jim      0
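
Another option, sketched here under the same assumption that every year lists its rows in the same order: number the rows within each year, then use lag() across years at each position. This keeps NA for the first year, as in the expected output. The helper column `pos` is a name made up for this sketch, and the data frame is rebuilt with explicit recycling so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, with the state pattern repeated explicitly
df <- data.frame(
  state = rep(rep(c("CA", "WI", "NY"), each = 2), times = 3),
  year  = rep(c(2000, 2001), each = 9),
  name  = c("John", "Paul", "Sally", "Mary", "Fred", "Jane",
            "Linda", "Carl", "Jim",
            "Peter", "Paul", "Sally", "Mary", "Kate", "Jane",
            "Linda", "Carl", "Jim")
)

result <- df %>%
  group_by(year) %>%
  mutate(pos = row_number()) %>%   # position of the row within its year (hypothetical helper)
  group_by(pos) %>%
  # compare each name with the previous year's name at the same position;
  # the first year has nothing to compare against, so it stays NA
  mutate(change = as.integer(name != lag(name, order_by = year))) %>%
  ungroup() %>%
  select(-pos)

print(result, n = 18)
```

If the rows are not in a consistent order from year to year, the positional match no longer means anything, and a real identifier for each slot would be needed instead of `pos`.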