I'm having trouble filtering based on the content of a row in R. For example, my first row of data has the values A, A, A, B, C, C, and I want to keep only the columns that have A in the first row. I would usually rotate/transpose my .csv and use something like filter(dataset, col1 == "A"), but my data has to be formatted* in a way that I would need to write something like filter(dataset, row1 == "A"), which is invalid.
I've used rownames() to at least name row1, but I still can't use that within filter().
I'm having some trouble with search terms for this (googling filter by row or filter by column gives me the same results), so any help is much appreciated!
Here is an example of what my data looks like:
col1 col2 col3 col4 col5 col6
row1 A A A B C C
row2 1 0 1 0 1 0
row3 0 0 1 1 0 0
row4 1 1 1 1 1 0
And this is my desired subset:
col1 col2 col3
row1 A A A
row2 1 0 1
row3 0 0 1
row4 1 1 1
I would rather not use slice() to get only the first 3 columns, because I have to do this for the whole alphabet, so I'm hoping to just swap out the A in my code for a B, then C, etc.
Please let me know if you have any questions and thank you in advance for your help!
*My data has to be formatted like this and not rotated/transposed, even though that would be easier, because I'm running it through agree() from the irr package next.
CodePudding user response:
Here is one option, using base R indexing on the row name:
df1[,df1["row1", ] == "A"]
# col1 col2 col3
#row1 A A A
#row2 1 0 1
#row3 0 0 1
#row4 1 1 1
data
df1 <- read.table(
text = " col1 col2 col3 col4 col5 col6
row1 A A A B C C
row2 1 0 1 0 1 0
row3 0 0 1 1 0 0
row4 1 1 1 1 1 0", header = TRUE
)
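Since the question mentions repeating this for every letter, here is a minimal sketch of wrapping the same indexing in a small helper. The names keep_cols and subsets are hypothetical and not part of the answer above. drop = FALSE is added on the assumption that you want a data frame back even when only one column matches; plain [ , ] indexing would otherwise return a bare vector, as it would for "B" in this data.

```r
# Example data from the answer above (row names come from the first field)
df1 <- read.table(
  text = "     col1 col2 col3 col4 col5 col6
row1    A    A    A    B    C    C
row2    1    0    1    0    1    0
row3    0    0    1    1    0    0
row4    1    1    1    1    1    0",
  header = TRUE
)

# Hypothetical helper: keep the columns whose entry in `row` equals `value`
keep_cols <- function(df, value, row = "row1") {
  # Flatten the chosen row to a plain character vector, one entry per column
  first_vals <- as.character(unlist(df[row, ], use.names = FALSE))
  # %in% returns FALSE (not NA) for missing values, so indexing stays safe
  df[, first_vals %in% value, drop = FALSE]
}

# One subset per letter; swap in LETTERS to cover the whole alphabet
subsets <- lapply(setNames(nm = c("A", "B", "C")), function(v) keep_cols(df1, v))

subsets$A
```

Each element of subsets is still a data frame with the original row names, so it should be possible to pass them on one at a time to the next step.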
CodePudding user response:
A dplyr alternative could be:
library(dplyr)
df %>%
  select(which(.[1, ] == "B"))
#> col4
#> 1 B
#> 2 0
#> 3 1
#> 4 1
CodePudding user response:
We could use select_if() combined with first():
library(dplyr)
select_if(df, function(.) first(.) == "A")
#      col1 col2 col3
# row1    A    A    A
# row2    1    0    1
# row3    0    0    1
# row4    1    1    1
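As a side note, select_if() is marked as superseded in dplyr 1.0.0 and later, where the same idea is usually written with select() and where(). A minimal sketch, assuming dplyr >= 1.0.0 is installed and using the df1 example data from the first answer (the name result is just for illustration):

```r
library(dplyr)

# Example data, same as in the first answer
df1 <- read.table(
  text = "     col1 col2 col3 col4 col5 col6
row1    A    A    A    B    C    C
row2    1    0    1    0    1    0
row3    0    0    1    1    0    0
row4    1    1    1    1    1    0",
  header = TRUE
)

# where() applies the predicate to each column and keeps those returning TRUE;
# first(x) is the value that column holds in the first row
result <- select(df1, where(function(x) first(x) == "A"))
result
```

Swapping "A" for another letter in the predicate selects the corresponding columns.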