My issue is that I have a big data frame with 283 columns, some of which share the same name (for example, "uncultured").
Is there a way to select columns while skipping those with repeated names? Those (bacteria) normally have very low abundance, so I don't really care about their contribution; I'd just like to keep the columns with unique names.

My data looks something like this:

    Samples  col1  col2  col3  col4  col2  col1  ...
    S1
    S2
    S3
    ...

I'd like to select every column except the second col2 and the second col1.

Thanks!

Answer:

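Something like this should work. duplicated() flags every name after its first occurrence, so negating it keeps the first column of each name: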

    df[, !duplicated(colnames(df))]
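A minimal sketch on a small made-up abundance table (the counts data frame and its taxon names are hypothetical, for illustration only). Adding drop = FALSE keeps the result a data frame even if only one column survives; without it, [, j] on a data frame returns a plain vector when a single column is selected. If "columns with unique names" is meant to exclude every copy of a repeated name, including the first, a forward pass can be combined with a backward pass via fromLast = TRUE:

```r
# Hypothetical abundance table: "uncultured" and "Bacillus" each appear twice
counts <- data.frame(matrix(1:12, nrow = 2))
names(counts) <- c("Bacillus", "uncultured", "Vibrio", "uncultured", "Bacillus", "Nitrospira")

# Keep the first column of each name, drop the later copies
first_only <- counts[, !duplicated(names(counts)), drop = FALSE]
names(first_only)
# → "Bacillus" "uncultured" "Vibrio" "Nitrospira"

# Alternative reading: keep only names that occur exactly once
repeated <- duplicated(names(counts)) | duplicated(names(counts), fromLast = TRUE)
singletons <- counts[, !repeated, drop = FALSE]
names(singletons)
# → "Vibrio" "Nitrospira"
```

The first variant matches what the question asks for (drop only the second col1 and col2); the second discards the low-abundance groups entirely, which may or may not be what you want.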

Answer:

This automatically selects the first column for each unique name:

    df[unique(colnames(df))]
    #>   col1 col2 col3 col4 S1 S2 S3
    #> 1    1    2    3    4  7  8  9
    #> 2    1    2    3    4  7  8  9
    #> 3    1    2    3    4  7  8  9
    #> 4    1    2    3    4  7  8  9
    #> 5    1    2    3    4  7  8  9
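As a quick sanity check, here is a sketch that rebuilds the df from the reproducible example below and compares the two answers. Character indexing on a data frame matches each name to its first occurrence, so both approaches should pick the same columns:

```r
df <- as.data.frame(matrix(rep(1:9, 5), ncol = 9, byrow = TRUE))
colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1", "S1", "S2", "S3")

by_name  <- df[unique(colnames(df))]          # character indexing: first match per name
by_logic <- df[, !duplicated(colnames(df))]   # logical indexing: drop later copies

all.equal(by_name, by_logic)
# → TRUE
```

One small difference: the list-style df[j] form always returns a data frame, whereas df[, j] drops to a vector if only one column is selected (unless drop = FALSE is given).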

Reproducible example

df is defined as:

    df <- as.data.frame(matrix(rep(1:9, 5), ncol = 9, byrow = TRUE))
    colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1", "S1", "S2", "S3")
    df
    #>   col1 col2 col3 col4 col2 col1 S1 S2 S3
    #> 1    1    2    3    4    5    6  7  8  9
    #> 2    1    2    3    4    5    6  7  8  9
    #> 3    1    2    3    4    5    6  7  8  9
    #> 4    1    2    3    4    5    6  7  8  9
    #> 5    1    2    3    4    5    6  7  8  9
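With 283 columns it can help to see which names are repeated before dropping anything. A small diagnostic sketch on the same df (the variable names repeated_names and name_counts are just illustrative):

```r
df <- as.data.frame(matrix(rep(1:9, 5), ncol = 9, byrow = TRUE))
colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1", "S1", "S2", "S3")

# Names that occur more than once, each listed once (in order of first repeat)
repeated_names <- unique(colnames(df)[duplicated(colnames(df))])
repeated_names
# → "col2" "col1"

# Number of columns that !duplicated() would discard
sum(duplicated(colnames(df)))
# → 2

# Occurrence count for every repeated name
name_counts <- table(colnames(df))
name_counts[name_counts > 1]
```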