Home > front end >  Remove pivot_longer ids that have multiple values in R
Remove pivot_longer ids that have multiple values in R

Time:01-05

I have the following dataframe

  color        sex    id     var  value
1 blue           0     1 dog_clm     NA
2 blue           0     1 dog_clm     NA
3 blue           0     1 dog_clm     1
4 blue           1     2 dog_clm     NA
5 blue           1     2 dog_clm     2
6 blue           1     2 dog_clm     NA
7 blue           1     3 dog_clm     NA
8 blue           1     3 dog_clm     6
9 blue           1     3 dog_clm     NA
10 blue          1     4 dog_clm     7
11 blue          1     4 dog_clm     NA
12 blue          1     4 dog_clm     1

I want to remove the ids where there are multiple values. How can I do this? Expected output would remove id 4 because it has values of 7 and 1.

Expected output

  color        sex    id     var  value
1 blue           0     1 dog_clm     NA
2 blue           0     1 dog_clm     NA
3 blue           0     1 dog_clm     1
4 blue           1     2 dog_clm     NA
5 blue           1     2 dog_clm     2
6 blue           1     2 dog_clm     NA
7 blue           1     3 dog_clm     NA
8 blue           1     3 dog_clm     6
9 blue           1     3 dog_clm     NA

CodePudding user response:

Not sure why CIAndrews deleted their answer, but using their sample data with the extra all-NA group added:

df %>%
  group_by(id) %>% 
  filter(n_distinct(value, na.rm = TRUE) <= 1)
# # A tibble: 12 × 5
# # Groups:   id [4]
#    color   sex    id var     value
#    <chr> <dbl> <int> <chr>   <dbl>
#  1 blue      0     1 dog_clm    NA
#  2 blue      0     1 dog_clm    NA
#  3 blue      0     1 dog_clm     1
#  4 blue      1     2 dog_clm    NA
#  5 blue      1     2 dog_clm     2
#  6 blue      1     2 dog_clm    NA
#  7 blue      1     3 dog_clm    NA
#  8 blue      1     3 dog_clm     6
#  9 blue      1     3 dog_clm    NA
# 10 blue      1     5 dog_clm    NA
# 11 blue      1     5 dog_clm    NA
# 12 blue      1     5 dog_clm    NA

Sample Data:

df <- structure(list(color = c("blue", "blue", "blue", "blue", "blue", 
"blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", 
"blue", "blue"), sex = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1), id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 
4L, 5L, 5L, 5L), var = c("dog_clm", "dog_clm", "dog_clm", "dog_clm", 
"dog_clm", "dog_clm", "dog_clm", "dog_clm", "dog_clm", "dog_clm", 
"dog_clm", "dog_clm", "dog_clm", "dog_clm", "dog_clm"), value = c(NA, 
NA, 1, NA, 2, NA, NA, 6, NA, 7, NA, 1, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-15L))

CodePudding user response:

Here's what I'd do:

  1. Group by id as the other comments have suggested.
  2. Find all the duplicate values, including the 1st instance. The base R function duplicated will find any instances where some value already exists but it won't tag the 1st unless you also include the argument fromLast = TRUE, in which case it will ignore the last duplicate. You can use both, though, together in an "or" statement.

Here's how I'd do this using a tidyverse approach:

   MyDataFrame <- MyDataFrame %>% 
      group_by(id) %>%
      filter(which(duplicated(value) | duplicated(value, fromLast = TRUE))

That should include NA values the way you want.

  •  Tags:  
  • Related