I checked answers in
How to extract specific element from vector using for loop in R
But that is not what I want. I have data containing 17 rows and several variables.
I have a vector that is generated on every iteration of a for loop, and it contains names of variables (different names depending on conditions inside the for loop).
My aim is to eliminate, for every vector, all the variables it names except the one that has the highest sum.
For example, I have this data frame:
my_data >
NAMES A B C D E F
One 1 2 3 4 5 6
Two 2 3 4 5 6 7
THREE 3 4 5 6 7 8
FOUR 4 5 6 7 8 9
FIVE 5 6 7 8 9 10
SIX 6 7 8 9 10 11
Let's say that the first vector generated by the for loop contains these names:
vec >
"B" "C" "D"
Using these variables, the program will eliminate "B" and "C", because D is the one that has the highest sum.
So I will obtain:
New_data
NAMES A D E F
One 1 4 5 6
Two 2 5 6 7
THREE 3 6 7 8
FOUR 4 7 8 9
FIVE 5 8 9 10
SIX 6 9 10 11
Let's say the second vector contains the names "A" and "E", so the program will eliminate A because E is the variable that has the highest sum.
So:
New data >
NAMES D E F
One 4 5 6
Two 5 6 7
THREE 6 7 8
FOUR 7 8 9
FIVE 8 9 10
SIX 9 10 11
Let's say that the third vector contains "E" and "F".
Here is the part of the code I used to analyze the vector:
# This is how I generated the vector
vec <- names(Filter(function(x) x > 0, rowSums(tmp) > 0 | colSums(tmp) > 0))
my_data %>%
  dplyr::select(all_of(vec)) %>% # select the columns named in vec
  slice(-17) %>% # remove row 17
  map_dbl(sum) %>% # sum each column
  which.max() %>% # position of the largest sum
  names() -> selected # name of the column with the largest sum
# the variable `selected` now holds the name of the variable I should keep
my_data %>% dplyr::select(!vec,selected) -> new_data # select columns
}
The problem with this program is that in the end new_data contains all the variables except those eliminated in the last comparison. Every iteration starts again from my_data, so the last iteration compares the variables in the last vector and keeps everything else in my_data, removing only the variables of the last vector that do not have the highest sum.
Continuing the example I started above, let's say the third vector contains "E" and "F".
The result I need to obtain is:
New data >
NAMES D F
One 4 6
Two 5 7
THREE 6 8
FOUR 7 9
FIVE 8 10
SIX 9 11
# I eliminated E because F has the highest sum
But the program I wrote gives me this result:
NAMES A B C D F
One 1 2 3 4 6
Two 2 3 4 5 7
THREE 3 4 5 6 8
FOUR 4 5 6 7 9
FIVE 5 6 7 8 10
SIX 6 7 8 9 11
I think this is because the program takes its information from my original data and keeps all the variables that are not in the current vector (that's why the last comparison keeps A, B, C and D).
So now I don't know how to fix this problem.
Please tell me if you need more information.
CodePudding user response:
I don't know what you are doing, so here is an alternative.
tmp <- replicate(5, sample(LETTERS[1:10], 3), simplify = FALSE)
[[1]]
[1] "J" "C" "A"
[[2]]
[1] "F" "D" "B"
[[3]]
[1] "C" "G" "H"
[[4]]
[1] "J" "F" "C"
[[5]]
[1] "H" "G" "J"
I made up these vectors of column names, because I don't know how you generate them. Then we iterate this object and remove the columns.
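df <- my_data # assumption: df starts as a copy of the data from the question
for (i in tmp) {
  # your stuff here
  df <- df[, !colnames(df) %in% i, drop = FALSE] # drop = FALSE keeps a data frame even if one column remains
}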
NAMES E
1 One 5
2 Two 6
3 THREE 7
4 FOUR 8
5 FIVE 9
6 SIX 10
CodePudding user response:
You may try this option -
for (i in vec) {
  # Get the column names to delete based on column sum
  drop_columns <- i[-which.max(colSums(my_data[i]))]
  my_data[drop_columns] <- NULL
}
# NAMES D F
#1 One 4 6
#2 Two 5 7
#3 THREE 6 8
#4 FOUR 7 9
#5 FIVE 8 10
#6 SIX 9 11
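One thing to watch for: `my_data[i]` fails with an "undefined columns selected" error if a later vector names a column that an earlier iteration already removed. Here is a minimal sketch of a guarded version of the same loop. The fourth vector, c("A", "D"), is a hypothetical addition of mine, there only to trigger that case, and skipping vectors with fewer than two surviving names is my assumption about what you would want.

```r
# Example data from the question
my_data <- data.frame(
  NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", "SIX"),
  A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11
)

# The three vectors from the question plus a hypothetical fourth one
# that names "A", which is already gone by the time it is reached
vec <- list(c("B", "C", "D"), c("A", "E"), c("E", "F"), c("A", "D"))

for (i in vec) {
  # Keep only the names that still exist in the data
  present <- intersect(i, names(my_data))
  # With fewer than two surviving columns there is nothing to compare
  if (length(present) < 2) next
  # Drop every candidate except the one with the largest column sum
  drop_columns <- present[-which.max(colSums(my_data[present]))]
  my_data[drop_columns] <- NULL
}

names(my_data)
```

With the first three vectors this behaves like the loop above. The fourth is skipped because only "D" is still present.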
data
my_data <- structure(list(NAMES = c("One", "Two", "THREE", "FOUR", "FIVE",
"SIX"), A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11),
class = "data.frame", row.names = c(NA, -6L))
vec <- list(c('B', 'C', 'D'), c('A', 'E'), c('E', 'F'))
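If you would rather stay close to the dplyr pipeline from the question, the same idea can be sketched by starting new_data from my_data once and then updating new_data in every iteration, instead of going back to my_data each time. This is only a sketch under some assumptions: the vectors are collected in a list as above, every name in a vector is still present in new_data when it is reached, and the slice(-17) step is left out because the example data has only 6 rows.

```r
library(dplyr)

# Example data and vectors from the question
my_data <- data.frame(
  NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", "SIX"),
  A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11
)
vec <- list(c("B", "C", "D"), c("A", "E"), c("E", "F"))

# Start from the full data once, then keep updating the same object
new_data <- my_data
for (v in vec) {
  # Name of the column in v with the largest sum
  selected <- new_data %>%
    select(all_of(v)) %>%
    colSums() %>%
    which.max() %>%
    names()
  # Remove the other columns of v from the running result
  new_data <- new_data %>% select(-all_of(setdiff(v, selected)))
}

names(new_data)
```

The only real change from the code in the question is that the result of each iteration is fed into the next one.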