I've got a big data frame and would like to remove the duplicated columns.
For simplicity, let's pretend this is my data:
df <- data.frame(id1 = c("Aa","Aa","Ba","Ca","Da"), id2 = c(2,1,4,5,10), location=c(351,261,101,91,51), comment=c(35,26,10,9,5), comment=c(5,16,25,14,11), hight=c(15,21,5,19,18), check.names = FALSE)
I can remove the duplicate column name "comment" using:
df <- df[!duplicated(colnames(df))]
However, when I apply the same code to my real data frame, it returns an error:
Error in `[.data.table`(SNV_wild, !duplicated(colnames(SNV_wild))) :
i evaluates to a logical vector length 1883 but there are 60483 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.
Sorry, I can't post the real data since it is quite large, as you can see from the error.
How can I troubleshoot this? I have gone through all the column names, and there are duplicate column names.
Thank you in advance
CodePudding user response:
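Your real data frame is of class data.table, while your small example is a plain data.frame. In a data.table, a single argument inside the brackets is treated as a row filter (i), which is why the logical vector of length 1883 is checked against the 60483 rows. Select columns through j instead. You can try: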
df[, !duplicated(colnames(df)), with = FALSE]
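To check this on something small, here is a minimal sketch that converts the toy data from the question into a data.table and drops the repeated column name. It assumes the data.table package is installed; the names dt, keep and dt_unique are only illustrative and do not come from the original post.

```r
library(data.table)  # assumption: the data.table package is installed

# Toy data from the question, with a duplicated "comment" column.
df <- data.frame(id1 = c("Aa", "Aa", "Ba", "Ca", "Da"),
                 id2 = c(2, 1, 4, 5, 10),
                 location = c(351, 261, 101, 91, 51),
                 comment = c(35, 26, 10, 9, 5),
                 comment = c(5, 16, 25, 14, 11),
                 hight = c(15, 21, 5, 19, 18),
                 check.names = FALSE)

# Convert to a data.table so the object behaves like the real one.
dt <- as.data.table(df)

# Integer positions of the first occurrence of each column name.
keep <- which(!duplicated(names(dt)))

# with = FALSE tells data.table to treat j as column positions,
# the way a plain data.frame would.
dt_unique <- dt[, keep, with = FALSE]

names(dt_unique)
```

Because duplicated() only flags the later occurrences of a name, the first "comment" column is the one that is kept. If you prefer the original one-argument syntax, converting the object back with as.data.frame() before subsetting should also work, at the cost of copying the data.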