Why is my attempt to double up on indexing failing?

Time:09-14

I want the names of the female pirates who like Superman. Why is "Babice" included in the result when she likes Antman, not Superman?

piratesurvey2 <- data.frame(
  name = c("Astrid", "Lea", "Sarina", "Remon", "Letizia", "Babice", "Jonas", "Wendy", "Niveditha", "Gioia"),
  sex = c("F", "F", "F", "M", "F", "F", "M", "F", "F", "F"),
  superhero = c("Batman", "Superman", "Batman", "Spiderman", "Batman",
                "Antman", "Batman", "Superman", "Maggott", "Superman"),
  stringsAsFactors = FALSE)

> piratesurvey2$name[(piratesurvey2$superhero == "Superman")[piratesurvey2$sex == "F"]]
[1] "Lea"    "Babice" "Wendy"  "Gioia" 

> (piratesurvey2$superhero == "Superman")[piratesurvey2$sex == "F"]
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE

CodePudding user response:

The issue is with (piratesurvey2$superhero == "Superman")[piratesurvey2$sex == "F"].

The first part of this condition, (piratesurvey2$superhero == "Superman"), finds which rows contain someone who likes Superman: the 2nd, 8th and 10th rows. So the output is a logical vector of length 10, with TRUE in the 2nd, 8th and 10th positions.

You then subset this vector by the rows of the original data frame where the pirate is female (that is, you keep every value except the 4th and 7th). This takes the 10-element vector above and removes its 4th and 7th elements, so the remaining vector has length 8, with TRUE in the 2nd, 6th and 8th positions.

So when you subset piratesurvey2$name, you are passing a logical vector of length 8 that has TRUE in the 2nd, 6th and 8th elements. The corresponding names are "Lea", "Babice" and "Wendy". Because the index vector is shorter than piratesurvey2$name, it is recycled: the 9th name is matched against the 1st element of the index (FALSE) and the 10th name (8 + 2) against the 2nd element (TRUE). So you are left with the 2nd, 6th, 8th and 10th elements of piratesurvey2$name.
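The steps above can be reproduced one at a time. This is a minimal sketch using the data from the question; the names likes_superman, short_mask and recycled_mask are illustrative only, and rep(..., length.out = ) is used here to mimic the recycling that R applies to a logical index that is too short.

```r
piratesurvey2 <- data.frame(
  name = c("Astrid", "Lea", "Sarina", "Remon", "Letizia",
           "Babice", "Jonas", "Wendy", "Niveditha", "Gioia"),
  sex = c("F", "F", "F", "M", "F", "F", "M", "F", "F", "F"),
  superhero = c("Batman", "Superman", "Batman", "Spiderman", "Batman",
                "Antman", "Batman", "Superman", "Maggott", "Superman"),
  stringsAsFactors = FALSE)

# Step 1: length-10 mask, TRUE at positions 2, 8 and 10.
likes_superman <- piratesurvey2$superhero == "Superman"

# Step 2: dropping the male rows (4 and 7) shifts the TRUEs to 2, 6 and 8.
short_mask <- likes_superman[piratesurvey2$sex == "F"]
length(short_mask)   # 8

# Step 3: mimic the recycling of the 8-element mask to length 10.
recycled_mask <- rep(short_mask, length.out = nrow(piratesurvey2))
which(recycled_mask) # 2 6 8 10

# The recycled mask no longer lines up with the original rows.
selected <- piratesurvey2$name[recycled_mask]
selected             # "Lea" "Babice" "Wendy" "Gioia"
```

Note that R gives no warning when a logical index is shorter than the vector it subsets, which is why the misalignment goes unnoticed.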

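In short, what you want is (piratesurvey2$superhero == "Superman") & (piratesurvey2$sex == "F") rather than (piratesurvey2$superhero == "Superman")[piratesurvey2$sex == "F"]. The & operator combines the two conditions element by element, so the result still has one value per row.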
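As a rough illustration of the difference, the sketch below builds the combined mask and checks its length. It also shows that indexing twice can work, provided the sex filter is applied to both the names and the superhero column so that the two vectors stay aligned. The variable names both, is_f, via_and and via_chain are made up for this example.

```r
piratesurvey2 <- data.frame(
  name = c("Astrid", "Lea", "Sarina", "Remon", "Letizia",
           "Babice", "Jonas", "Wendy", "Niveditha", "Gioia"),
  sex = c("F", "F", "F", "M", "F", "F", "M", "F", "F", "F"),
  superhero = c("Batman", "Superman", "Batman", "Spiderman", "Batman",
                "Antman", "Batman", "Superman", "Maggott", "Superman"),
  stringsAsFactors = FALSE)

# Option 1: element-wise AND keeps one TRUE/FALSE per row (length 10).
both <- piratesurvey2$superhero == "Superman" & piratesurvey2$sex == "F"
length(both)                         # 10
via_and <- piratesurvey2$name[both]
via_and                              # "Lea" "Wendy" "Gioia"

# Option 2: index twice, but filter BOTH vectors by sex first,
# so the second mask (length 8) matches the 8 female names.
is_f <- piratesurvey2$sex == "F"
via_chain <- piratesurvey2$name[is_f][piratesurvey2$superhero[is_f] == "Superman"]
via_chain                            # "Lea" "Wendy" "Gioia"
```

Option 1 is usually easier to read; Option 2 is only shown to make clear what the chained version would have needed.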

CodePudding user response:

Try the following to conditionally select as described:

piratesurvey2[piratesurvey2$sex == "F" & piratesurvey2$superhero == "Superman", ]$name

[1] "Lea"   "Wendy" "Gioia"
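Two other base R spellings of the same selection, offered as a sketch rather than a recommendation: subset() evaluates the condition inside the data frame, and which() turns the logical mask into row positions. Both treat a missing value in the condition as "not selected", whereas a plain logical index would return NA for that row. The names via_subset and via_which are illustrative.

```r
piratesurvey2 <- data.frame(
  name = c("Astrid", "Lea", "Sarina", "Remon", "Letizia",
           "Babice", "Jonas", "Wendy", "Niveditha", "Gioia"),
  sex = c("F", "F", "F", "M", "F", "F", "M", "F", "F", "F"),
  superhero = c("Batman", "Superman", "Batman", "Spiderman", "Batman",
                "Antman", "Batman", "Superman", "Maggott", "Superman"),
  stringsAsFactors = FALSE)

# subset() lets the condition refer to columns without the piratesurvey2$ prefix.
via_subset <- subset(piratesurvey2, sex == "F" & superhero == "Superman")$name
via_subset   # "Lea" "Wendy" "Gioia"

# which() converts the mask to integer positions (rows 2, 8 and 10 here).
rows <- which(piratesurvey2$sex == "F" & piratesurvey2$superhero == "Superman")
via_which <- piratesurvey2$name[rows]
via_which    # "Lea" "Wendy" "Gioia"
```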