Removing entire observation with a condition

Time:03-26

I have made a survey and would like to remove all answers from people who answered "no" to being a parent. How can I do that?

The dataset is called "tillid".

The variable is called "Er du forældre" ("Are you a parent?") and the answer is either "Ja" (yes) or "Nej" (no).

[Image: an overview of the answers made with table()]

Response 1:

Question

Welcome to SO. You should read the "How do I ask a good question?" advice shared by @user2974951 to learn how to ask questions in a way that helps the community respond.

From the [r] tag guidance:

Please use minimal reproducible example(s) others can run using copy & paste. Show desired output. Use dput() for data & specify all non-base packages with library(). Don't embed pictures for data or code, use indented code blocks instead.

In this case you have a data.frame that looks something like this:

> tillid

  Id ... Er du forældre
1  1                 Ja
2  2                Nej
3  3  
4  4                Nej
5  5                 Ja
...

To create a minimal reproducible example, use dput() on a subset of rows and columns:

> dput(tillid[1:5, c('Id', 'Er du forældre')])

structure(list(Id = 1:5, `Er du forældre` = c("Ja", "Nej", "",
"Nej", "Ja")), class = "data.frame", row.names = c(NA, -5L))

Anyone can copy this code and recreate a dataset that looks like yours.
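A quick way to check that dput() output really reproduces the data is a round trip: capture the printed text, parse it back, and compare. The sketch below assumes the five-row example shown above; dput_text and tillid_copy are hypothetical names. It also illustrates why retyping the data by hand with data.frame() is riskier: by default data.frame() rewrites non-syntactic column names such as "Er du forældre".

```r
# Five-row example, pasted from the dput() output above
tillid <- structure(list(Id = 1:5, `Er du forældre` = c("Ja", "Nej", "",
"Nej", "Ja")), class = "data.frame", row.names = c(NA, -5L))

# Round trip: capture what dput() prints, then parse and evaluate that text
dput_text <- capture.output(dput(tillid))
tillid_copy <- eval(parse(text = dput_text))
identical(tillid, tillid_copy)  # TRUE when the text recreates the object exactly

# data.frame() "fixes" non-syntactic names by default (spaces become dots),
# so a hand-typed copy only keeps the original name with check.names = FALSE
names(data.frame(`Er du forældre` = "Ja"))
names(data.frame(`Er du forældre` = "Ja", check.names = FALSE))
```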

Next, show what you are trying to achieve (based on the minimal example):

> <insert code here>

  Id ... Er du forældre
1  1                 Ja
3  3  
5  5                 Ja

Answer

In this case, the code to remove rows with the value 'Nej' is as follows (you might want to assign the result to a new variable using <-):

> tillid[tillid$`Er du forældre` != 'Nej', ]

  Id Er du forældre
1  1             Ja
3  3
5  5             Ja
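A minimal sketch of assigning the filtered rows to a new object, assuming the five-row example data from the dput() output above. The names tillid_filtered and tillid_subset are hypothetical; subset() is shown as an equivalent base R alternative.

```r
# Example data, pasted from the dput() output above
tillid <- structure(list(Id = 1:5, `Er du forældre` = c("Ja", "Nej", "",
"Nej", "Ja")), class = "data.frame", row.names = c(NA, -5L))

# Keep every row whose answer is not "Nej"; note the comma, which makes the
# condition select rows (and all columns) of the data frame
tillid_filtered <- tillid[tillid$`Er du forældre` != "Nej", ]

# subset() does the same; backticks are still needed because of the spaces
tillid_subset <- subset(tillid, `Er du forældre` != "Nej")

tillid_filtered$Id  # 1 3 5
```

Assigning to a new name leaves tillid unchanged, so the full data is still available. One practical difference: subset() silently drops rows where the condition evaluates to NA, while plain [ indexing does not.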

If you also want to exclude missing answers (of which you have 114), you could slice to only those rows with the value "Ja":

> tillid[tillid$`Er du forældre` == 'Ja', ]

  Id Er du forældre
1  1             Ja
5  5             Ja
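Since the overview was made with table(), the same function can confirm the result of the filtering. A minimal sketch, assuming the five-row example data above; tillid_ja is a hypothetical name.

```r
# Example data, pasted from the dput() output above
tillid <- structure(list(Id = 1:5, `Er du forældre` = c("Ja", "Nej", "",
"Nej", "Ja")), class = "data.frame", row.names = c(NA, -5L))

# Counts before filtering; useNA = "ifany" also reports real NA values
table(tillid$`Er du forældre`, useNA = "ifany")

# Keep only the rows answered "Ja", which drops both "Nej" and blank answers
tillid_ja <- tillid[tillid$`Er du forældre` == "Ja", ]

# Counts after filtering: only "Ja" should remain
table(tillid_ja$`Er du forældre`)
```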

As @mat mentions in his answer, it's good practice to avoid special characters and spaces in column names.

Response 2:

tillid <- tillid[tillid$`Er du forældre` != "Nej", ]

The != means "not equal to".

Alternatively:

tillid <- tillid[tillid$`Er du forældre` == "Ja", ]

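Note on missing values (NA): the intent is that the first alternative keeps them while the second keeps only "Ja". In base R, however, comparing NA with != or == returns NA, and indexing a data frame with NA produces a row filled with NA instead of the original row. To get the intended behavior, use %in%, which never returns NA: tillid[!(tillid$`Er du forældre` %in% "Nej"), ] keeps the missing answers, and tillid[tillid$`Er du forældre` %in% "Ja", ] keeps only "Ja".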
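The NA pitfall is easiest to see on a small example. The data below is hypothetical: it is the five-row example with a real NA in row 3 instead of a blank string. keep_na and only_ja are illustrative names.

```r
# Hypothetical data with a genuine NA answer in row 3;
# check.names = FALSE keeps the column name "Er du forældre" as is
tillid <- data.frame(Id = 1:5,
                     `Er du forældre` = c("Ja", "Nej", NA, "Nej", "Ja"),
                     check.names = FALSE)

# != gives NA for row 3, and indexing with NA returns a row of NAs
# (Id is NA too), not the original row 3
tillid[tillid$`Er du forældre` != "Nej", ]

# %in% returns TRUE/FALSE only, so these behave as intended
keep_na <- tillid[!(tillid$`Er du forældre` %in% "Nej"), ]  # keeps row 3
only_ja <- tillid[tillid$`Er du forældre` %in% "Ja", ]      # drops NA and "Nej"
```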

I would suggest avoiding special characters (e.g., æ) in your variable names, as they can cause encoding problems in R, for example when a script is run on a system with a different locale.
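One way to follow that advice is to rename the column once, right after loading the data. A minimal sketch, assuming the five-row example data from the first response; the new name er_du_foraeldre is a hypothetical choice, and any name without spaces or special letters works.

```r
# Example data with the original column name
tillid <- structure(list(Id = 1:5, `Er du forældre` = c("Ja", "Nej", "",
"Nej", "Ja")), class = "data.frame", row.names = c(NA, -5L))

# Rename only the matching column to a plain-ASCII, syntactic name
names(tillid)[names(tillid) == "Er du forældre"] <- "er_du_foraeldre"

# Backticks are no longer needed when filtering
tillid <- tillid[tillid$er_du_foraeldre != "Nej", ]
```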
