Removing duplicate values from each column of a data frame in R

I need to remove duplicate values from each column of my data frame. I built this minimal example:

library(dplyr)
library(data.table)

# data_frame() is deprecated; tibble() (re-exported by dplyr) replaces it
df <- tibble(
  temp = c(21, 22, 22, 24, 25, 25, 26, 27),
  prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
  alt = c(400, 400, 500, 650, 700, 750, 750, 800)
)

So, how can I remove repeated values from each column?

Thanks in advance!

CodePudding user response:

You could use apply() to do this:

library(dplyr)
# data_frame() is deprecated; tibble() (re-exported by dplyr) replaces it
df <- tibble(
  temp = c(21, 22, 22, 24, 25, 25, 26, 27),
  prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
  alt = c(400, 400, 550, 650, 700, 750, 750, 800)
)

apply(df, 2, unique)
#>      temp   prec alt
#> [1,]   21 222.34 400
#> [2,]   22 550.00 550
#> [3,]   24 659.43 650
#> [4,]   25 700.00 700
#> [5,]   26 750.50 750
#> [6,]   27 790.00 800

The call above returns a matrix because every column has the same number of unique values (6). That will not always be the case. In the modified example below, alt has only 5 unique values while temp and prec have 6, so the per-column results differ in length and apply() returns a list instead of a matrix.

df <- tibble(
  temp = c(21, 22, 22, 24, 25, 25, 26, 27),
  prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
  alt = c(400, 400, 400, 650, 700, 750, 750, 800)
)

apply(df, 2, unique)
#> $temp
#> [1] 21 22 24 25 26 27
#> 
#> $prec
#> [1] 222.34 550.00 659.43 700.00 750.50 790.00
#> 
#> $alt
#> [1] 400 650 700 750 800

Created on 2022-05-17 by the reprex package (v2.0.1)
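
As a side note, apply() converts the data frame to a matrix before it runs, so its return type depends on the data, and columns of mixed types would all be coerced to one type. A minimal sketch of an alternative, assuming a plain base R data.frame with the same values as the second example: lapply() works on the columns directly and always returns a named list. The names uniq and counts are only illustrative.

```r
# Same values as the second example above, built with base R only
df <- data.frame(
  temp = c(21, 22, 22, 24, 25, 25, 26, 27),
  prec = c(222.34, 550, 550, 659.43, 700, 700, 750.5, 790),
  alt  = c(400, 400, 400, 650, 700, 750, 750, 800)
)

# lapply() iterates over the columns and always returns a named list,
# whether or not the columns end up with the same length
uniq <- lapply(df, unique)

# Number of unique values per column; useful to check before
# trying to rebuild a data frame from the result
counts <- lengths(uniq)
```

If every entry of counts is equal, the list can be turned back into a data frame with as.data.frame(uniq); otherwise it has to stay a list or be padded.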

CodePudding user response:

The problem is that if you remove duplicate values from each column independently, the columns are not guaranteed to end up the same length, and columns of unequal length cannot form a data frame.

But if every column has the same number of unique values, it is very easy:

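# Works only when every column has the same number of unique values;
# tibble() replaces the deprecated data_frame()
df2 <- tibble(
  temp = unique(df$temp),
  prec = unique(df$prec),
  alt = unique(df$alt)
)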
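
When the columns do not have the same number of unique values, one possible workaround is to pad the shorter columns with NA so they can still sit in one data frame. This is a minimal sketch in base R, not part of the answer above. The names uniq, n, and padded are illustrative, and it assumes trailing NA values are acceptable and that rows of the result no longer correspond to the original observations.

```r
# Example where alt has 5 unique values and the other columns have 6
df <- data.frame(
  temp = c(21, 22, 22, 24, 25, 25, 26, 27),
  prec = c(222.34, 550, 550, 659.43, 700, 700, 750.5, 790),
  alt  = c(400, 400, 400, 650, 700, 750, 750, 800)
)

# Unique values of each column, as a named list
uniq <- lapply(df, unique)

# Length of the longest result
n <- max(lengths(uniq))

# Extending a vector with `length<-` fills the new positions with NA
padded <- lapply(uniq, function(x) {
  length(x) <- n
  x
})

# All list elements now have length n, so they form a data frame
df2 <- as.data.frame(padded)
```

Here df2 has 6 rows, and the last entry of alt is NA because that column had one fewer unique value.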