I need to remove duplicate values from each column of my data frame. I built this minimal example:
library(dplyr)
library(data.table)
df <- tibble(
temp = c(21, 22, 22, 24, 25, 25, 26, 27),
prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
alt = c(400, 400, 500, 650, 700, 750, 750, 800)
)
So, how can I remove repeated values from each column?
Thanks in advance!
CodePudding user response:
You could use apply() to do this:
library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(
temp = c(21, 22, 22, 24, 25, 25, 26, 27),
prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
alt = c(400, 400, 550, 650, 700, 750, 750, 800)
)
apply(df, 2, unique)
#> temp prec alt
#> [1,] 21 222.34 400
#> [2,] 22 550.00 550
#> [3,] 24 659.43 650
#> [4,] 25 700.00 700
#> [5,] 26 750.50 750
#> [6,] 27 790.00 800
The call above returns a matrix because every column has exactly 6 unique values. That will not always be the case. I modified the example slightly so that alt has only 5 unique values; apply() then returns a list instead of a matrix, because the per-column results are not all the same length.
df <- tibble(
temp = c(21, 22, 22, 24, 25, 25, 26, 27),
prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
alt = c(400, 400, 400, 650, 700, 750, 750, 800)
)
apply(df, 2, unique)
#> $temp
#> [1] 21 22 24 25 26 27
#>
#> $prec
#> [1] 222.34 550.00 659.43 700.00 750.50 790.00
#>
#> $alt
#> [1] 400 650 700 750 800
Created on 2022-05-17 by the reprex package (v2.0.1)
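As a side note, apply() converts the data frame to a matrix first, and its return type (matrix or list) depends on the data. A minimal sketch, assuming you would rather get a list every time, is to use lapply(), which works column by column. The name uniq is only illustrative:

```r
# Same values as the modified example (alt has only 5 unique values)
df <- data.frame(
  temp = c(21, 22, 22, 24, 25, 25, 26, 27),
  prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
  alt = c(400, 400, 400, 650, 700, 750, 750, 800)
)

# lapply() iterates over the columns and always returns a named list,
# whether or not the unique values have equal lengths
uniq <- lapply(df, unique)

# Number of unique values found in each column
lengths(uniq)
```

Because the result is always a list, code that consumes it does not need to handle the matrix case separately.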
CodePudding user response:
The problem is that if you remove duplicate values from each column independently, the columns are not guaranteed to have the same length, and then they can no longer form a data frame.
But if every column ends up with the same number of unique values, it is very easy:
df2 <- tibble(
temp = unique(df$temp),
prec = unique(df$prec),
alt = unique(df$alt)
)
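If the columns do end up with different numbers of unique values, one possible workaround (not the only one) is to pad the shorter columns with NA so they fit in a data frame again. This is a rough sketch under that assumption; uniq, n_max and padded are illustrative names:

```r
# Example where alt has fewer unique values than the other columns
df <- data.frame(
  temp = c(21, 22, 22, 24, 25, 25, 26, 27),
  prec = c(222.34, 550, 550, 659.43, 700, 700, 750.500, 790),
  alt = c(400, 400, 400, 650, 700, 750, 750, 800)
)

# Unique values per column, as a list of vectors of differing lengths
uniq <- lapply(df, unique)

# Length of the longest column after de-duplication
n_max <- max(lengths(uniq))

# `length<-` extends each vector to n_max, filling the tail with NA
padded <- lapply(uniq, `length<-`, n_max)

# All columns now have equal length, so they form a data frame
df2 <- as.data.frame(padded)
df2
```

Keep in mind that after this step the rows no longer correspond to the original observations; each column is just its own list of distinct values.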