Subsetting dataframe while using conditional statements and column names? What am I missing?

I am trying to create a new dataframe by extracting a column (EMO_ETA.3) for specific rows (zyg==1) from an original dataframe. I would like to write a piece of code that refers to this column by its name, not its index. Can someone help me understand why the bottom piece of code doesn't work, and what would be the correct way to do it?

new <- original[original$zyg==1][,c(3)] # works
new <- original[original$zyg==1][,c("EMO_ETA.3")] # doesn't work

    ID zyg EMO_ETA.3 ACT_ETA.3
1 2330   2    -2.693     2.359
2 2331   1    -1.029    -0.286
3 2333   1     0.203     0.938
4 2334   2    -0.853    -0.405
5 2336   1    -0.969    -2.122
6 2337   2    -0.956    -1.026

CodePudding user response：

A solution would be to use some functions from dplyr:

library(dplyr)
new <- original %>% filter(zyg == 1) %>% select(EMO_ETA.3)

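As for the use of column names within brackets: base R does accept them, as long as the name is a quoted string placed after the comma (the column position). Referring to columns by name is generally more robust than using the numbered index, since a number silently points at the wrong column if the column order ever changes. What dplyr adds is the ability to write the names unquoted, without a $ indicator.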
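For what it's worth, here is a minimal base R sketch of selecting by name inside brackets. It assumes `original` is a plain data.frame (the question does not say; a data.table follows different bracket rules) and rebuilds the sample rows from the question. The names `by_name` and `by_index` are only for illustration.

```r
# Sample data copied from the question
original <- data.frame(
  ID        = c(2330, 2331, 2333, 2334, 2336, 2337),
  zyg       = c(2, 1, 1, 2, 1, 2),
  EMO_ETA.3 = c(-2.693, -1.029, 0.203, -0.853, -0.969, -0.956),
  ACT_ETA.3 = c(2.359, -0.286, 0.938, -0.405, -2.122, -1.026)
)

# Row condition goes before the comma, column selector after it.
# The column can be given as a quoted name or as a position.
by_name  <- original[original$zyg == 1, "EMO_ETA.3"]
by_index <- original[original$zyg == 1, 3]

# Both forms pick the same values
identical(by_name, by_index)
```

With a plain data.frame, a single bracket without a comma is treated as a column selector, which is why the row condition needs the trailing comma.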

CodePudding user response：

You first filter your data by row, so you need to add a comma after the row condition on 'zyg'. Then you select the column, and no comma is needed in that second bracket. The resulting output is a data frame, not a vector. Here is the code:

original[original$zyg==1,]['EMO_ETA.3']
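The vector-versus-data-frame distinction mentioned above can be sketched as follows, again assuming `original` is a plain data.frame built from the question's sample rows. The variable names (`rows`, `as_vector`, `as_df`, `chained`) are hypothetical.

```r
# Sample data copied from the question
original <- data.frame(
  ID        = c(2330, 2331, 2333, 2334, 2336, 2337),
  zyg       = c(2, 1, 1, 2, 1, 2),
  EMO_ETA.3 = c(-2.693, -1.029, 0.203, -0.853, -0.969, -0.956),
  ACT_ETA.3 = c(2.359, -0.286, 0.938, -0.405, -2.122, -1.026)
)

rows <- original$zyg == 1

# Rows and one named column in a single bracket: drops to a numeric vector
as_vector <- original[rows, "EMO_ETA.3"]

# drop = FALSE keeps the result as a one-column data frame
as_df <- original[rows, "EMO_ETA.3", drop = FALSE]

# Two-step form from this answer: filter rows, then select the column
chained <- original[rows, ]["EMO_ETA.3"]

is.numeric(as_vector)
is.data.frame(as_df)
is.data.frame(chained)
```

Which form to use depends on what the next step expects: a bare vector is handy for functions like `mean()`, while the data frame forms keep the column name attached.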

Hope that's what you are looking for.