I want to create a function that returns a new data frame containing only the rows in which the value of a selected column occurs exactly 2 times in the original data.frame.
I tried this:
duplicates <- function(df$x, as.bool = TRUE) {
is.dup <- (duplicated(x) & rev(duplicated(rev(x))))
if (as.bool) { is.dup } else { x[is.dup] }
}
CodePudding user response:
Since no sample data was provided, I'm using the built-in mtcars data. You can do:
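library(tidyverse)  # load once at the top level, not inside the function

duplicates <- function(data, var) {
  data |>
    add_count(!!sym(var)) |>  # count how often each value of `var` occurs (column `n`)
    filter(n == 2) |>         # keep rows whose value occurs exactly twice
    select(-n)                # drop the helper count column
}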
duplicates(mtcars, "mpg")
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
6  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
7  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
8  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
9  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
10 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
11 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
12 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
13 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
14 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
In this case, each of the "mpg" values appears exactly two times in the data.
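If you would rather not depend on the tidyverse, the same idea can be sketched in base R. Treat this as a sketch under assumptions: the helper name `keep_n_times` and its `n` argument are made up for illustration. It uses `match()` and `tabulate()` to attach to every row the number of times its value occurs in the column, which avoids converting values to character and back.

```r
# Hypothetical helper: the name and the `n` argument are illustrative only.
keep_n_times <- function(data, var, n = 2) {
  x <- data[[var]]
  # Group id for each row: position of its value among the unique values
  idx <- match(x, unique(x))
  # tabulate() counts rows per group; indexing by idx maps the count back to each row
  counts <- tabulate(idx)[idx]
  data[counts == n, , drop = FALSE]
}

res <- keep_n_times(mtcars, "mpg")
nrow(res)  # 14, the same rows as in the output above
```

Unlike the dplyr version, base subsetting keeps the original row names (the car models), and a missing value is treated like any other value when counting.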
CodePudding user response:
oneDuplicate <- function(df, vec) {
  # Values of the column that occur exactly twice (table() names are character)
  twice <- names(which(table(df[[vec]]) == 2))
  # `vec` is a column name, so test the column itself, not the name
  if (is.numeric(df[[vec]])) {
    twice <- as.numeric(twice)
  }
  df[df[[vec]] %in% twice, ]
}
oneDuplicate(attitude, "advance")
   rating complaints privileges learning raises critical advance
2      63         64         51       54     63       73      47
5      81         78         56       66     71       83      47
6      43         55         49       44     54       49      34
11     64         53         53       58     58       67      34
15     77         77         54       72     79       77      46
19     65         70         46       57     75       85      46
21     50         40         33       34     43       64      33
24     40         37         42       58     50       57      49
25     63         54         42       48     66       75      33
27     78         75         58       74     80       78      49
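To see what the `table()` step is doing, it may help to walk through it on a tiny data frame where the counts are obvious. The data frame `toy` and its columns are invented for this illustration; this is a sketch of the idea, not a replacement for the function above.

```r
# Invented toy data: "a" occurs twice, "b" once, "c" three times
toy <- data.frame(id = 1:6, grp = c("a", "b", "a", "c", "c", "c"))

counts <- table(toy$grp)            # occurrences of each distinct value
keep <- names(counts)[counts == 2]  # values seen exactly twice: "a"
res <- toy[toy$grp %in% keep, ]     # rows whose value is in that set
res
#   id grp
# 1  1   a
# 3  3   a
```

Values that occur once ("b") or three times ("c") are both dropped, which is the difference from a plain `duplicated()` check.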