Extract element from vector using for loops and dataframe

Time:10-27

I checked the answers in

How to extract specific element from vector using for loop in R

but they are not what I want. I have a data frame with 17 rows and several variables.

I have a vector that is generated on every iteration of a for loop. It contains variable names (different names depending on conditions inside the loop).

My aim is to eliminate the variables named in each vector, except the one with the highest sum.

For example, I have this data frame:

my_data >

NAMES         A       B       C    D    E      F 
One           1       2       3    4    5      6
Two           2       3       4    5    6      7
THREE         3       4       5    6    7      8
FOUR          4       5       6    7    8      9
FIVE          5       6       7    8    9     10
SIX           6       7       8    9    10    11

Let's say that the first vector generated by the for loop contains these names:

vec >
 "B" "C" "D"

Using these variables, the program should eliminate "B" and "C", because "D" is the one with the highest sum.

So I will obtain

New_data 

        NAMES     A    D    E      F 
        One       1    4    5      6
        Two       2    5    6      7
        THREE     3    6    7      8
        FOUR      4    7    8      9
        FIVE      5    8    9     10
        SIX       6    9    10    11
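
The selection rule in this first step can be checked directly. The sketch below uses base R only and assumes the example data frame above, rebuilt here as `my_data`. The names `sums` and `dropped` are introduced only for illustration.

```r
# Rebuild the example data frame from the question
my_data <- data.frame(
  NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", "SIX"),
  A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11
)

vec <- c("B", "C", "D")

# Sum each column named in vec
sums <- colSums(my_data[vec])

# Name of the column with the largest sum: the one to keep
selected <- names(which.max(sums))

# Names to eliminate: everything in vec except the selected one
dropped <- setdiff(vec, selected)

# Remove the eliminated columns from the data frame
new_data <- my_data[setdiff(names(my_data), dropped)]
print(names(new_data))
```

Here `selected` is "D", so only "B" and "C" are removed.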

Let's say the second vector contains the names "A" and "E". The program should eliminate "A", because "E" is the variable with the highest sum.

So

New data >

NAMES        D    E      F 
One           4    5     6
Two           5    6     7
THREE         6    7     8
FOUR          7    8     9
FIVE          8    9     10
SIX           9    10    11

Let's say that the third vector contains "E" and "F".

Here is the part of the code I used to analyze each vector:

     # This is how I generated the vector
     vec <- names(Filter(function(x) x > 0, rowSums(tmp) > 0 | colSums(tmp) > 0))
     my_data %>%
       dplyr::select(all_of(vec)) %>% # select the columns named in vec
       slice(-17) %>%                 # drop row 17
       map_dbl(sum) %>%               # sum each column
       which.max() %>%                # position of the largest sum
       names() -> selected            # name of the column with the largest sum
     # selected now holds the name of the variable I should keep

     my_data %>% dplyr::select(!vec, selected) -> new_data # drop vec, keep selected

    }

The problem with this program is that, in the end, new_data contains all the variables except those eliminated in the last comparison. Because it always starts from my_data, the last iteration compares the variables in my last vector and keeps every variable of my_data in new_data, except the variables of the last vector that do not have the highest sum.

To continue the example I started above, let's say the third vector contains "E" and "F".

The result I need to obtain is:

New data >

NAMES         D        F 
One           4        6
Two           5        7
THREE         6        8
FOUR          7        9
FIVE          8        10
SIX           9        11

#I eliminated E because F has the highest sum

But the program I wrote gives me this result:

    NAMES         A       B       C        D      F 
    One           1       2       3        4      6
    Two           2       3       4        5      7
    THREE         3       4       5        6      8
    FOUR          4       5       6        7      9
    FIVE          5       6       7        8     10
    SIX           6       7       8        9     11

I think this happens because the program reads from my original data on every iteration, so it keeps all the variables that are not in the current vector (that is why the last comparison keeps A, B, C and D).

So now I don't know how to fix this problem.

Please tell me if you need more information.
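
The behaviour described here can be reproduced with a small sketch. It uses base R instead of the dplyr pipeline, leaves out the `slice(-17)` step because the example has only six rows, and assumes a hypothetical list `vecs` standing in for the vectors generated inside the loop.

```r
# Example data frame from the question
my_data <- data.frame(
  NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", "SIX"),
  A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11
)

# Hypothetical stand-in for the vectors produced by the loop
vecs <- list(c("B", "C", "D"), c("A", "E"), c("E", "F"))

for (vec in vecs) {
  # Column of vec with the largest sum
  selected <- names(which.max(colSums(my_data[vec])))
  # Every iteration starts again from my_data, so the columns
  # removed in earlier iterations come back
  new_data <- my_data[setdiff(names(my_data), setdiff(vec, selected))]
}

# Only the last comparison (E against F) is reflected in the result
print(names(new_data))
```

After the loop, `new_data` still holds A, B, C and D, and only E has been removed.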

CodePudding user response:

I don't know what you are doing, so here is an alternative.

tmp <- replicate(5, sample(LETTERS[1:10], 3), simplify = FALSE)
tmp

[[1]]
[1] "J" "C" "A"

[[2]]
[1] "F" "D" "B"

[[3]]
[1] "C" "G" "H"

[[4]]
[1] "J" "F" "C"

[[5]]
[1] "H" "G" "J"

I made up these vectors of column names because I don't know how you generate them. Then we iterate over this list and remove the columns.

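df <- my_data  # work on a copy of the question's data frame
for (i in tmp) {
  # your stuff here
  # reassign df so that every removal carries over to the next iteration
  df <- df[, !colnames(df) %in% i, drop = FALSE]
}
df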

  NAMES  E
1   One  5
2   Two  6
3 THREE  7
4  FOUR  8
5  FIVE  9
6   SIX 10
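
Because `sample()` is random, the loop above gives a different result on every run. A reproducible sketch, assuming the five vectors printed above are hard-coded and `df` is a copy of the question's example data:

```r
# Copy of the example data frame from the question
df <- data.frame(
  NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", "SIX"),
  A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11
)

# The vectors shown in the printed output, fixed instead of sampled
tmp <- list(
  c("J", "C", "A"),
  c("F", "D", "B"),
  c("C", "G", "H"),
  c("J", "F", "C"),
  c("H", "G", "J")
)

for (i in tmp) {
  # Overwrite df on each pass so removals accumulate;
  # drop = FALSE keeps a data frame even if one column is left
  df <- df[, !colnames(df) %in% i, drop = FALSE]
}

print(names(df))
```

Names such as "J" or "G" that are not columns of `df` are simply ignored by `%in%`.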

CodePudding user response:

You may try this option -

for(i in vec) {
  #Get the column names to delete based on column sum
  drop_columns <- i[-which.max(colSums(my_data[i]))]
  my_data[drop_columns] <- NULL
}

#  NAMES D  F
#1   One 4  6
#2   Two 5  7
#3 THREE 6  8
#4  FOUR 7  9
#5  FIVE 8 10
#6   SIX 9 11

data

my_data <- structure(list(NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", 
"SIX"), A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11), 
class = "data.frame", row.names = c(NA, -6L))

vec <- list(c('B', 'C', 'D'), c('A', 'E'), c('E', 'F'))
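
The loop above assumes that every name in a vector is still a column of `my_data`. If a later vector mentions a column that was already removed, `my_data[i]` fails with an "undefined columns selected" error. A hedged sketch of a guarded variant follows; the function name `keep_max_sum` is made up for illustration and is not part of any package.

```r
my_data <- data.frame(
  NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", "SIX"),
  A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11
)
vec <- list(c("B", "C", "D"), c("A", "E"), c("E", "F"))

# Hypothetical helper: for each vector of names, keep only the
# column with the largest sum and drop the others
keep_max_sum <- function(data, name_vectors) {
  for (i in name_vectors) {
    # Ignore names that were already removed in an earlier step
    i <- intersect(i, names(data))
    # With fewer than two columns left there is nothing to compare
    if (length(i) < 2) next
    drop_columns <- i[-which.max(colSums(data[i]))]
    data[drop_columns] <- NULL
  }
  data
}

result <- keep_max_sum(my_data, vec)
print(names(result))
```

Because the function works on its own copy of the data, `my_data` itself is left unchanged.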