How can I filter by the content within a row in R?

I need to filter columns based on the content of a row in R. For example, my first row of data has the values A, A, A, B, C, C, and I want to keep only the columns that have A in the first row. I would usually rotate/transpose my .csv and use something like filter(dataset, col1 == "A"), but my data has to be formatted* in a way that would require something like filter(dataset, row1 == "A"), which is invalid.

I've used rownames() to at least name row1, but I still can't use that within filter().

I'm having some trouble with search terms for this (googling filter by row or filter by column gives me the same results), so any help is much appreciated!

Here is an example of what my data looks like:

      col1  col2  col3  col4  col5  col6 

row1     A     A     A     B     C     C    
row2     1     0     1     0     1     0   
row3     0     0     1     1     0     0
row4     1     1     1     1     1     0

And this is my desired subset:

      col1  col2  col3 

row1     A     A     A 
row2     1     0     1 
row3     0     0     1 
row4     1     1     1

I would rather not use slice() to get only the first 3 columns, because I have to do this for the whole alphabet, so I'm hoping to just swap out the A in my code for a B, then C, etc.

Please let me know if you have any questions and thank you in advance for your help!

*My data has to be formatted like this and not rotated/transposed, even though that would be easier, because I'm running it through agree() from the irr package next.

CodePudding user response：

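Here is one option in base R: compare row1 against the label to get a logical vector, then use it to index the columns. Adding drop = FALSE keeps the result a data frame even when only one column matches (as it would for "B").

df1[, df1["row1", ] == "A", drop = FALSE]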
#     col1 col2 col3
#row1    A    A    A
#row2    1    0    1
#row3    0    0    1
#row4    1    1    1

data

df1 <- read.table(
  text = "      col1  col2  col3  col4  col5  col6
row1     A     A     A     B     C     C
row2     1     0     1     0     1     0
row3     0     0     1     1     0     0
row4     1     1     1     1     1     0", header = TRUE
)
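
Since the question mentions repeating this for every letter, the same indexing could be wrapped in a small helper and applied to each label found in row1. This is only a sketch under the assumption that the data looks like df1 above; the names subset_by_first_row and subsets are made up for illustration and are not part of any package.

```r
# Example data, same as df1 above
df1 <- read.table(
  text = "      col1  col2  col3  col4  col5  col6
row1     A     A     A     B     C     C
row2     1     0     1     0     1     0
row3     0     0     1     1     0     0
row4     1     1     1     1     1     0", header = TRUE
)

# Hypothetical helper: keep the columns whose row1 value equals `label`.
# drop = FALSE keeps a data frame even if only one column matches.
subset_by_first_row <- function(data, label) {
  keep <- unlist(data["row1", ]) == label
  data[, keep, drop = FALSE]
}

# One subset per distinct label in row1, stored in a named list
labels <- unique(unlist(df1["row1", ]))
subsets <- lapply(setNames(labels, labels), subset_by_first_row, data = df1)

names(subsets)        # "A" "B" "C"
colnames(subsets$C)   # "col5" "col6"
```

Each element of subsets (for example subsets$A) keeps the original row names and orientation, so it should be possible to pass it on without transposing.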

CodePudding user response：

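A dplyr alternative could be the following, here selecting the columns whose first row is "B". which() turns the logical comparison into column positions that select() accepts.

library(dplyr)

df %>% 
  select(which(.[1, ] == "B"))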

#>   col4
#> 1    B
#> 2    0
#> 3    1
#> 4    1

CodePudding user response：

We could use select_if() combined with first(), which returns the first element of each column:

library(dplyr)

select_if(df, function(.) first(.) == "A")

     col1 col2 col3
row1    A    A    A
row2    1    0    1
row3    0    0    1
row4    1    1    1
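
Note that select_if() has been superseded since dplyr 1.0.0 in favour of select() with where(). An equivalent call might look like the sketch below; it assumes dplyr 1.0.0 or later and rebuilds the example data as df so that it runs on its own. The name result is just for illustration.

```r
library(dplyr)

# Example data from the question, with row1..row4 as row names
df <- read.table(
  text = "      col1  col2  col3  col4  col5  col6
row1     A     A     A     B     C     C
row2     1     0     1     0     1     0
row3     0     0     1     1     0     0
row4     1     1     1     1     1     0", header = TRUE
)

# where() applies the predicate to each column;
# first(x) is the row1 value of that column
result <- select(df, where(function(x) first(x) == "A"))

colnames(result)
```

Swapping "A" for another letter selects the columns for that label instead.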