I have a large dataset with 283 columns, some of which share the same name (for example, "uncultured").
Is there a way to select columns while skipping those with repeated names? Those columns (bacteria) normally have a very small abundance, so I don't really care about their contribution; I'd just like to keep the columns with unique names.
My data looks something like this:
Samples col1 col2 col3 col4 col2 col1....
S1
S2
S3
...
and I'd like to select every column except the second col2 and the second col1.
Thanks!
CodePudding user response:
Something like this should work:
df[, !duplicated(colnames(df))]
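To see why this works, here is a minimal sketch on a made-up six-column data frame (the data and the names `keep` and `first_only` are only for illustration). `duplicated()` returns FALSE for the first occurrence of each name and TRUE for every later repeat, so negating it keeps the first copy of each column. Adding `drop = FALSE` is an optional safeguard so the result stays a data frame even if only one column is left.

```r
# Toy data frame with repeated column names (hypothetical data).
df <- as.data.frame(matrix(rep(1:6, 3), ncol = 6, byrow = TRUE))
colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1")

# TRUE for the first occurrence of each name, FALSE for later repeats.
keep <- !duplicated(colnames(df))

# drop = FALSE keeps a data frame even when a single column survives.
first_only <- df[, keep, drop = FALSE]
colnames(first_only)
```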
CodePudding user response:
Indexing by the unique names keeps the first column for each name and skips the later repeats:
df[unique(colnames(df))]
#> col1 col2 col3 col4 S1 S2 S3
#> 1 1 2 3 4 7 8 9
#> 2 1 2 3 4 7 8 9
#> 3 1 2 3 4 7 8 9
#> 4 1 2 3 4 7 8 9
#> 5 1 2 3 4 7 8 9
Reproducible example
df is defined as:
df <- as.data.frame(matrix(rep(1:9, 5), ncol = 9, byrow = TRUE))
colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1", "S1", "S2", "S3")
df
#> col1 col2 col3 col4 col2 col1 S1 S2 S3
#> 1 1 2 3 4 5 6 7 8 9
#> 2 1 2 3 4 5 6 7 8 9
#> 3 1 2 3 4 5 6 7 8 9
#> 4 1 2 3 4 5 6 7 8 9
#> 5 1 2 3 4 5 6 7 8 9
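If the goal is instead to drop every copy of a repeated name (for example, to discard all "uncultured" columns rather than keep the first one), a possible variant is sketched below. This is an assumption about what might be wanted, not something asked for explicitly; the names `nms`, `repeated` and `strictly_unique` are illustrative. Combining `duplicated()` with its `fromLast = TRUE` form flags all occurrences of a repeated name, including the first.

```r
# Same example data as above.
df <- as.data.frame(matrix(rep(1:9, 5), ncol = 9, byrow = TRUE))
colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1", "S1", "S2", "S3")

nms <- colnames(df)

# Flag every occurrence of a name that appears more than once,
# scanning from both ends so the first copy is flagged too.
repeated <- duplicated(nms) | duplicated(nms, fromLast = TRUE)

# Keep only columns whose name occurs exactly once.
strictly_unique <- df[, !repeated, drop = FALSE]
colnames(strictly_unique)
```

On this example that leaves col3, col4, S1, S2 and S3.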