How to get the values that appear only twice in a column

I have a data frame df with multiple columns, let's say x, y and z.

x	y	z
43	5666	654
54	545	645
43	7864	654
25	65654	987
21	5445	789
67	98	89
25	64	986
78	6465	68
21	546	64

Now I need only those rows for which the value in column x appears exactly twice:

x	y	z
43	5666	654
43	7864	654
25	65654	987
21	5445	789
25	64	986
21	546	64

I have been looking around, but I only found code for Python, not for R.

CodePudding user response：

A dplyr way:

set.seed(100)
library(dplyr)

df <- data.frame(x = sample(1:5, 10, T),
                 y = sample(10:15, 10, T),
                 z = sample(20:25, 10, T))

df
#>    x  y  z
#> 1  2 13 23
#> 2  3 12 24
#> 3  1 12 22
#> 4  2 11 20
#> 5  4 10 22
#> 6  4 11 23
#> 7  2 12 21
#> 8  3 13 25
#> 9  2 13 24
#> 10 5 15 24

df |> 
  group_by(x) |> 
  filter(n() == 2)
#> # A tibble: 4 × 3
#> # Groups:   x [2]
#>       x     y     z
#>   <int> <int> <int>
#> 1     3    12    24
#> 2     4    10    22
#> 3     4    11    23
#> 4     3    13    25

Created on 2022-10-03 with reprex v2.0.2

CodePudding user response：

Using ave from base R:

dat[with(dat, ave(x, x, FUN=length) == 2), ]
#    x     y   z
# 1 43  5666 654
# 3 43  7864 654
# 4 25 65654 987
# 5 21  5445 789
# 7 25    64 986
# 9 21   546  64

Data:

dat <- structure(list(x = c(43L, 54L, 43L, 25L, 21L, 67L, 25L, 78L, 
21L), y = c(5666L, 545L, 7864L, 65654L, 5445L, 98L, 64L, 6465L, 
546L), z = c(654L, 645L, 654L, 987L, 789L, 89L, 986L, 68L, 64L
)), class = "data.frame", row.names = c(NA, -9L))
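
To make the ave one-liner easier to follow, here is a minimal sketch that splits it into steps on the question's data. The intermediate names counts, keep and result are only illustrative and are not part of the original answer.

```r
# Same data as in the question (columns x, y, z)
dat <- data.frame(
  x = c(43L, 54L, 43L, 25L, 21L, 67L, 25L, 78L, 21L),
  y = c(5666L, 545L, 7864L, 65654L, 5445L, 98L, 64L, 6465L, 546L),
  z = c(654L, 645L, 654L, 987L, 789L, 89L, 986L, 68L, 64L)
)

# Step 1: for every row, count how many rows share its x value.
# ave() returns a vector as long as x, so it lines up with the rows.
counts <- ave(dat$x, dat$x, FUN = length)
counts
#> [1] 2 1 2 2 2 1 2 1 2

# Step 2: logical mask, TRUE where the x value occurs exactly twice
keep <- counts == 2

# Step 3: keep only those rows (original row order is preserved)
result <- dat[keep, ]
result
```

One caveat, as far as I can tell: ave writes the group sizes back into a copy of its first argument, so this works as written for a numeric x. If x were a factor, counting over seq_along(dat$x) instead, as in ave(seq_along(dat$x), dat$x, FUN = length), should avoid the type clash.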

CodePudding user response：

A base R way using table:

set.seed(100)
df <- data.frame(x = sample(1:5, 10, TRUE),
                 y = sample(10:15, 10, TRUE),
                 z = sample(20:25, 10, TRUE))
## n -> indices of the rows whose x value appears exactly twice
n <- which(df$x %in% as.numeric(names(which(table(df$x) == 2))))
df[n, ]
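
The table approach can also be wrapped in a small function and applied to the question's data. This is only a sketch: rows_with_count is a hypothetical helper name, not something from base R or from the answers above. It compares values as strings instead of calling as.numeric, on the assumption that this lets the same code also work for character columns.

```r
# Hypothetical helper: keep the rows whose value in `col` occurs exactly k times
rows_with_count <- function(data, col, k = 2) {
  counts <- table(data[[col]])           # frequency of each distinct value
  wanted <- names(counts)[counts == k]   # values seen exactly k times, as strings
  data[as.character(data[[col]]) %in% wanted, , drop = FALSE]
}

# Same data as in the question
dat <- data.frame(
  x = c(43L, 54L, 43L, 25L, 21L, 67L, 25L, 78L, 21L),
  y = c(5666L, 545L, 7864L, 65654L, 5445L, 98L, 64L, 6465L, 546L),
  z = c(654L, 645L, 654L, 987L, 789L, 89L, 986L, 68L, 64L)
)

res <- rows_with_count(dat, "x")
res
```

Here res should contain the six rows with x equal to 43, 25 or 21, in their original order. Passing k = 1 would instead return the rows whose x value is unique.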