Home > other >  create a new column with the dupicated values in both columns in R
create a new column with the dupicated values in both columns in R

Time:02-01

I inputted my file in R and obtained as data.frame.

x <- read_xlsx ("C:/Users/gtutk/Desktop/example.xlsx")
df <- data.frame (x)
    x  y
1   1  1
2   2  2
3   3  3
4   4  5
5   5  6
6   6  7
7   7  9
8   8 10
9   9 11
10 10 13
11 11 14
12 12 15
13 13 17
14 14 18
15 15 19
16 16 14
17 17 15
18 18 17
19 19 18
20 12 19
21 13 20
22 14 21
23 15 22
24 16 23
25 17 24
26 18 25
27 19 26

I want to creat a new merged excel file with duplicated values in two columns. Or removing the unique values existing in both two columns and finally getting only duplicated values in both two columns. Expected file:

    new
1    1
2    2
3    3
4    5
5    6
6    7
7    9
8   10
9   11
10  13
11  14
12  15
13  17
14  18
15  19

CodePudding user response:

# If you have exactly two columns:
data.frame(new = do.call(intersect, df))
# For more than 2 columns:
data.frame(new = Reduce(intersect, df))

   new
1    1
2    2
3    3
4    5
5    6
6    7
7    9
8   10
9   11
10  13
11  14
12  15
13  17
14  18
15  19

Reproducible input data:

data.frame(
  x = c(
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 12, 13, 
    14, 15, 16, 17, 18, 19
  ),
  y = c(
    1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 14, 15, 17, 18, 19, 20,
    21, 22, 23, 24, 25, 26
  ) 
)

CodePudding user response:

We can try the following code, which can deal with the case of more than 2 columns

> d <- table(stack(df))

> data.frame(new = as.numeric(row.names(d))[rowSums(d > 0) == length(df)])
   new
1    1
2    2
3    3
4    5
5    6
6    7
7    9
8   10
9   11
10  13
11  14
12  15
13  17
14  18
15  19

CodePudding user response:

I have added extra columns, z and w, to show how to use this approach when there is more than 2 columns:

library(tidyverse)

df <- structure(list(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
14, 15, 16, 17, 18, 19, 12, 13, 14, 15, 16, 17, 18, 19), y = c(1, 
2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 14, 15, 17, 
18, 19, 20, 21, 22, 23, 24, 25, 26)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -27L), spec = structure(list(
    cols = list(x = structure(list(), class = c("collector_double", 
    "collector")), y = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

df$z <- rep(12, nrow(df))
df$w <-  rep(8, nrow(df))

df %>% 
  transmute(new = if_else(x %in% as.vector(unlist(.[,-1])), x, NA_real_)) %>% 
  drop_na(new) %>% distinct

#>    new
#> 1    1
#> 2    2
#> 3    3
#> 4    5
#> 5    6
#> 6    7
#> 7    8
#> 8    9
#> 9   10
#> 10  11
#> 11  12
#> 12  13
#> 13  14
#> 14  15
#> 15  17
#> 16  18
#> 17  19
  •  Tags:  
  • Related