R subset dataframe, skip part of interval

Time:03-09

I'm trying to subset my full data set (including all the other variables) to an interval of zip codes, EXCLUDING a certain part of that interval. I'm quite new to R and can't get it to work. (Zip code = postnr)

I have over 100 000 zip codes (postnr) and want all values for individuals in zip codes 10 000-12 999 and 15 600-16 800 in my dataset.

Attempt 1

Datan <- subset(Data2, Data2$postnr >= 10000 & Data2$postnr <= 16880) 

Datant <- subset(Datan, Datan$postnr >= 15600 & Datan$postnr < 13000)

Datan returns 31 3000 obs. of 26 variables, and Datant returns 0 obs. of 26 variables.

Attempt 2

attach(Data2)

Data5 <- Data2 %>% filter(between(postnr, 10000, 12999) & between(postnr, 15600, 16880))

Data5 returns 0 observations.

I have thousands of values for all my variables inside those intervals. What am I doing wrong?

Answer:

Think about 'and' versus 'or' and you'll have it. As it is, you're really close!

Can a number be between 1 and 2 and 3 and 5? Nope. But if I said, can a number be between 1 and 2 or 3 and 5? Yup.
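A quick way to check this in R is to evaluate both versions of the condition on a few made-up numbers (the vector x below is purely illustrative, not from the question's data):

```r
# A few illustrative values (not from the question's data)
x <- c(1.5, 2.5, 4)

# "and": a value would have to lie in both ranges at once, which is impossible
in_both <- (x > 1 & x < 2) & (x > 3 & x < 5)
print(in_both)    # [1] FALSE FALSE FALSE

# "or": a value only has to lie in one of the ranges
in_either <- (x > 1 & x < 2) | (x > 3 & x < 5)
print(in_either)  # [1]  TRUE FALSE  TRUE
```

The same thing happens with the zip codes: no postnr can be both >= 15600 and < 13000, so combining the two ranges with & always returns zero rows.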

For subset:

Datan <- subset(Data2, (postnr >= 10000 & postnr <= 12999) |
   (postnr >= 15600 & postnr <= 16800))

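The vertical pipe | means 'or'. The parentheses are not strictly needed, since & binds more tightly than | in R, but they make each range easy to read.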
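Since there is no sample data, here is a small made-up stand-in for Data2 (its values are illustrative only) showing that the 'or' condition keeps both ranges, including their endpoints, keeps the other columns, and drops everything in the gap. It uses the upper bound of 16 800 stated in the question; swap in 16880 if that is the intended limit.

```r
# Hypothetical stand-in for Data2: a handful of zip codes plus one other column
Data2 <- data.frame(
  postnr = c(9999, 10000, 12999, 13000, 15599, 15600, 16800, 16801),
  value  = 1:8
)

# Keep rows in 10000-12999 OR 15600-16800 (both ends inclusive)
Datan <- subset(Data2, (postnr >= 10000 & postnr <= 12999) |
                       (postnr >= 15600 & postnr <= 16800))

print(Datan$postnr)  # [1] 10000 12999 15600 16800
print(Datan$value)   # [1] 2 3 6 7
```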

For dplyr:

(I assume filter() here is the one from dplyr.) You don't need attach(): because Data2 is piped into filter(), it finds postnr among Data2's columns.

library(dplyr)

Data5 <- Data2 %>% filter(between(postnr, 10000, 12999) |
    between(postnr, 15600, 16880))
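The framing in the title, one wide interval with a part skipped, can also be written directly. Instead of subsetting twice as in Attempt 1, take the full 10000-16880 range and exclude the gap with ! (not) in the same condition. This is a sketch on a made-up stand-in for Data2; for whole-number zip codes it selects the same rows as the 'or' version.

```r
# Hypothetical stand-in for Data2 (illustrative zip codes only)
Data2 <- data.frame(postnr = c(9999, 10000, 12999, 13000, 15599, 15600, 16880, 16881))

# One wide range, then drop the gap 13000-15599 with "!" (not)
Datant <- subset(Data2, postnr >= 10000 & postnr <= 16880 &
                        !(postnr >= 13000 & postnr < 15600))

print(Datant$postnr)  # [1] 10000 12999 15600 16880
```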

Answer:

I have no data, so I cannot test this properly, but the following should work. Note the or operator (|), which combines the two range conditions.

library(data.table)
dt <- as.data.table(Data2)
# Keep rows whose postnr lies in either range (both ends inclusive)
dt[(postnr >= 10000 & postnr < 13000) | (postnr >= 15600 & postnr <= 16880)]
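If more ranges are ever needed, the same 'or' logic can be wrapped in a small helper. in_any_range() below is a hypothetical name, not a function from base R, dplyr or data.table; it is only a sketch that builds one logical vector per range and ORs them together.

```r
# Hypothetical helper (not part of any package): TRUE where x falls inside
# at least one of the closed ranges [lower[i], upper[i]]
in_any_range <- function(x, lower, upper) {
  stopifnot(length(lower) == length(upper))
  # One logical vector per range
  hits <- Map(function(lo, hi) x >= lo & x <= hi, lower, upper)
  # OR them together, starting from "no match" for every element of x
  Reduce(`|`, hits, rep(FALSE, length(x)))
}

zips <- c(9999, 10000, 12999, 13000, 15600, 16880, 16881)
res <- in_any_range(zips, lower = c(10000, 15600), upper = c(12999, 16880))
print(res)  # returns FALSE TRUE TRUE FALSE TRUE TRUE FALSE
```

With real data it would be used inside the subset call, e.g. subset(Data2, in_any_range(postnr, c(10000, 15600), c(12999, 16880))).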