I'm trying to subset my whole dataset (including all the other variables) to an interval of zipcodes, EXCLUDING a certain part of that interval. I'm quite new to R and can't get it to work. (Zipcode = postnr)
I have over 100 000 zipcodes (postnr) and want all values for individuals in zipcodes 10 000-12 999 and 15 600-16 800 in my dataset.
Attempt 1
Datan <- subset(Data2, Data2$postnr >= 10000 & Data2$postnr <= 16880)
Datant <- subset(Datan, Datan$postnr >= 15600 & Datan$postnr < 13000)
Datan returns 31 3000 obs. of 26 variables and Datant returns 0 obs. of 26 variables.
Attempt 2
attach(Data2)
Data5 <- Data2 %>% filter(between(postnr, 10000, 12999) & between(postnr, 15600, 16880))
Data5 returns 0 observations...
I have thousands of values for all my variables inside those intervals. What am I doing wrong?
CodePudding user response:
The difference is and versus or; once you see that, you've got it. As it is, you're really close!
Can a number be between 1 and 2 and 3 and 5? Nope. But if I said, can a number be between 1 and 2 or 3 and 5? Yup.
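To make that concrete, here is a minimal sketch in base R. The values in postnr below are made up for illustration and are not from the asker's data; the upper bound 16880 is taken from the code in the question.

```r
# Hypothetical toy values, chosen to sit on and around the interval edges
postnr <- c(9999, 10000, 12999, 13000, 15600, 16880, 17000)

# One logical vector per interval (both ends inclusive)
in_low  <- postnr >= 10000 & postnr <= 12999
in_high <- postnr >= 15600 & postnr <= 16880

# "and": a value would have to be in both intervals at once, so nothing matches
and_hits <- postnr[in_low & in_high]

# "or": a value only has to be in one of the intervals
or_hits <- postnr[in_low | in_high]

print(and_hits)
print(or_hits)
```

Building each interval as its own logical vector first can make it easier to check the pieces with sum(in_low) and sum(in_high) before combining them.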
For subset:
Datan <- subset(Data2, (postnr >= 10000 & postnr < 13000) |
                       (postnr >= 15600 & postnr <= 16800))
Where that vertical pipe, |, means 'or'.
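Since the question describes this as one interval with a part cut out, it may help to see that the 'or' form and an 'exclude the middle' form pick the same rows. The sketch below uses a hypothetical toy data frame (the income column and all values are invented) and the bounds stated in the question text.

```r
# Hypothetical toy data; income stands in for "all the other variables"
toy <- data.frame(
  postnr = c(9999, 10000, 12999, 13000, 15599, 15600, 16800, 16801),
  income = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Form 1: in the lower interval OR in the upper interval
by_or <- subset(toy, (postnr >= 10000 & postnr < 13000) |
                     (postnr >= 15600 & postnr <= 16800))

# Form 2: in the whole interval AND NOT in the excluded middle part
by_exclusion <- subset(toy, postnr >= 10000 & postnr <= 16800 &
                            !(postnr >= 13000 & postnr < 15600))

print(by_or)
print(identical(by_or, by_exclusion))
```

In R, & binds more tightly than |, so the parentheses in the first form are not strictly required, but they make the grouping explicit. Both forms keep every column of the data frame, not just postnr.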
For dplyr:
(I assume it's dplyr with filter.) You don't need to attach the data, it will extract the variable names from Data2 if it's in the pipe (which it is).
Data5 <- Data2 %>% filter(between(postnr, 10000, 12999) |
between(postnr, 15600, 16880))
CodePudding user response:
I have no data, so I cannot properly test this, but the following should work. Note the or operator (|), which combines the two conditions.
library(data.table)
dt <- as.data.table(Data2)
dt[(postnr >= 10000 & postnr < 13000) | (postnr >= 15600 & postnr <= 16880), ]