I have a data frame that combines species abundance values with metadata. I use the code below to delete all species whose total abundance is less than 1, but I can't figure out how to ignore the metadata columns, such as longitude, whose values are also less than 1. I would like to keep those columns and apply the filter only to the abundance values (for example, the last 10 columns of the data frame are abundances, and the first 5 are metadata that should be left untouched).
Here is an example of my dataframe (mini version):
site <- c("S1","S2","S3")
lat <- c(30,30,30.1)
long <- c(-43.11,-42.23,-42.10)
sp1 <- c(0,0,0)
sp2 <- c(10,4,9)
sp3 <- c(1,1,2)
x <- data.frame(site,lat,long,sp1,sp2,sp3)
site lat long sp1 sp2 sp3
1 S1 30.0 -43.11 0 10 1
2 S2 30.0 -42.23 0 4 1
3 S3 30.1 -42.10 0 9 2
I just need to grab all columns for species abundances that sum up to 0 and remove them. I used:
x <- x[,colSums(x[,4:ncol(x)]) > 0]
x
lat long sp2 sp3
1 30.0 -43.11 10 1
2 30.0 -42.23 4 1
3 30.1 -42.10 9 2
But I can't get it to return the "site" column with this...probably because it is a character column that I need to keep. Is there no way to have the table just drop the columns I am subsetting but leave everything else alone?
My goal is to return the following:
site lat long sp2 sp3
1 S1 30.0 -43.11 10 1
2 S2 30.0 -42.23 4 1
3 S3 30.1 -42.10 9 2
Only sp1 is dropped because its sum was 0.
Answer:
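We may use select from dplyr, keeping the first three columns by name plus any numeric column whose sum is greater than 0:
library(dplyr)
x %>%
  select(all_of(names(.)[1:3]),
         where(~ is.numeric(.) && sum(., na.rm = TRUE) > 0))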
-output
site lat long sp2 sp3
1 S1 30.0 -43.11 10 1
2 S2 30.0 -42.23 4 1
3 S3 30.1 -42.10 9 2
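In the OP's code, the logical vector from colSums has only three elements (one per species column), so R recycles it across all six columns, which is why site and sp1 were both dropped. Prepending three TRUE values, one per metadata column, makes the index line up with every column: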
x[c(rep(TRUE, 3), colSums(x[,4:ncol(x)]) > 0)]
site lat long sp2 sp3
1 S1 30.0 -43.11 10 1
2 S2 30.0 -42.23 4 1
3 S3 30.1 -42.10 9 2
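The recycling behaviour and the fix can be sketched in base R as follows. This is only an illustration under stated assumptions: `drop_empty_species` and its `n_meta` argument are hypothetical names that do not appear in the question, and the sketch assumes the metadata columns come first and that at least one species column follows them.

```r
# Rebuild the example data frame from the question.
x <- data.frame(
  site = c("S1", "S2", "S3"),
  lat  = c(30, 30, 30.1),
  long = c(-43.11, -42.23, -42.10),
  sp1  = c(0, 0, 0),
  sp2  = c(10, 4, 9),
  sp3  = c(1, 1, 2)
)

# The species-only test yields one value per species column (length 3).
keep_species <- colSums(x[, 4:ncol(x)]) > 0

# Used on its own against six columns, that vector is recycled to
# FALSE TRUE TRUE FALSE TRUE TRUE, so "site" is dropped along with "sp1".
recycled <- names(x)[rep(keep_species, length.out = ncol(x))]

# Hypothetical helper (not from the question): keep the first n_meta
# columns untouched and keep a later column only if its sum is > 0.
# NA values are ignored when summing.
drop_empty_species <- function(df, n_meta) {
  species_idx <- setdiff(seq_len(ncol(df)), seq_len(n_meta))
  species <- df[, species_idx, drop = FALSE]
  keep <- c(rep(TRUE, n_meta), colSums(species, na.rm = TRUE) > 0)
  df[, keep, drop = FALSE]
}

result <- drop_empty_species(x, n_meta = 3)
```

With the larger data frame described in the question (5 metadata columns followed by abundances), the call would presumably be `drop_empty_species(x, n_meta = 5)`.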