R Recode variable for all observations that do not occur more than once

Time:12-13

I have a simple dataframe that looks like the following:

Observation X1 X2 Group
1           2   4   1
2           6   3   2
3           8   4   2
4           1   3   3
5           2   8   4
6           7   5   5
7           2   4   5

How can I recode the Group variable so that every observation whose group value occurs only once is recoded as "Unaffiliated"?

The desired output would be the following:

Observation X1 X2 Group
1           2   4   Unaffiliated
2           6   3   2
3           8   4   2
4           1   3   Unaffiliated
5           2   8   Unaffiliated
6           7   5   5
7           2   4   5

CodePudding user response:

unaffil takes a vector of Group values and returns "Unaffiliated" if the vector has exactly one element; otherwise it returns its input unchanged. We then apply it within each Group using ave. This does not overwrite the input data frame. No packages are used, but if you use dplyr, transform can be replaced with mutate.

unaffil <- function(x) if (length(x) == 1) "Unaffiliated" else x
transform(dat, Group = ave(Group, Group, FUN = unaffil))

giving

  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

Note

dat <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
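
A related way to see what ave contributes here, sketched on the same data: calling ave with FUN = length gives each row the size of its group, and the rows where that size is 1 are the ones to recode. The names n_in_group and dat2 are illustrative only and are not part of the answer above.

```r
# Same data as in the Note above, rebuilt so the sketch runs on its own
dat <- data.frame(
  Observation = 1:7,
  X1 = c(2L, 6L, 8L, 1L, 2L, 7L, 2L),
  X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L),
  Group = c(1L, 2L, 2L, 3L, 4L, 5L, 5L)
)

# For every row, the number of rows that share its Group value
n_in_group <- ave(dat$Group, dat$Group, FUN = length)

# Work on a copy so that dat itself is left untouched
dat2 <- dat
dat2$Group <- ifelse(n_in_group == 1, "Unaffiliated", as.character(dat$Group))
dat2
```

Converting Group with as.character makes the type change explicit: the recoded column has to be character, because it mixes group labels with the string "Unaffiliated".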

CodePudding user response:

We may use duplicated to build a logical vector that flags the 'Group' values occurring only once, and then assign "Unaffiliated" to 'Group' for those rows.

df1$Group[with(df1, !(duplicated(Group)|duplicated(Group, 
     fromLast = TRUE)))] <- "Unaffiliated"

-output

> df1
  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

data

df1 <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
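
To make the logical index in this answer easier to follow, here is a small sketch on a bare vector (the names g, seen_before, seen_after and singleton are only for illustration). duplicated(g) flags every value that already appeared earlier, and duplicated(g, fromLast = TRUE) flags every value that appears again later, so a value flagged by neither occurs exactly once.

```r
g <- c(1L, 2L, 2L, 3L, 4L, 5L, 5L)  # the Group column from the data above

seen_before <- duplicated(g)                   # TRUE where the value occurred earlier
seen_after  <- duplicated(g, fromLast = TRUE)  # TRUE where the value occurs again later

# Values flagged in neither direction occur exactly once
singleton <- !(seen_before | seen_after)

# Assigning a string into an integer vector coerces the whole vector to character
g[singleton] <- "Unaffiliated"
g
```

The same coercion happens to df1$Group in the one-liner, which is why the column prints as character afterwards.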

CodePudding user response:

One way could be to group by Group first, then check whether the largest row number within each group is 1 (that is, the group has a single row), and finish with an ifelse:

library(dplyr)

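# df is the data frame from the question
df %>%
  group_by(Group) %>%
  # a group whose largest row number is 1 contains exactly one row
  mutate(Group = ifelse(max(row_number()) == 1, "Unaffiliated", as.character(Group))) %>%
  ungroup()

  Observation    X1    X2 Group       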
        <int> <int> <int> <chr>       
1           1     2     4 Unaffiliated
2           2     6     3 2           
3           3     8     4 2           
4           4     1     3 Unaffiliated
5           5     2     8 Unaffiliated
6           6     7     5 5           
7           7     2     4 5    
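
If the group size is what matters, one possible variant counts the rows per group directly instead of going through row numbers. This is only a sketch: it assumes dplyr is installed and rebuilds the question's data under the name df1; the helper column n and the result name res are illustrative, and n is dropped at the end.

```r
library(dplyr)

# The question's data, rebuilt so the sketch runs on its own
df1 <- data.frame(
  Observation = 1:7,
  X1 = c(2L, 6L, 8L, 1L, 2L, 7L, 2L),
  X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L),
  Group = c(1L, 2L, 2L, 3L, 4L, 5L, 5L)
)

res <- df1 %>%
  add_count(Group, name = "n") %>%  # n = number of rows sharing this Group value
  mutate(Group = if_else(n == 1L, "Unaffiliated", as.character(Group))) %>%
  select(-n)                        # drop the temporary count column
res
```

Because add_count does not group the data, no ungroup() is needed, and if_else requires both branches to be character, which is why Group is converted explicitly.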