ID names should all have the same number of characters. How do I filter for data without the appro

Time:11-07

I have a dataset where the ID names are all supposed to have 16 characters. How do I filter out all of the rows that do not have exactly 16 characters so I can delete them from my dataset? I am working in RStudio.

I've tried both of these in an attempt to get R to return the rows that did not have exactly 16 characters, but neither worked. I'm new to R, so I'm still figuring it out.

length(all_trips$ride_id != 16)
length(nchar(all_trips$ride_id !=16))

Answer:

You are getting closer and you are on the right track with nchar().

I assume you have a data frame all_trips with a character column ride_id.

Your first attempt:

length(all_trips$ride_id != 16)

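translates as "compare every value of ride_id with 16, then find the length of the resulting vector". The comparison does not select anything: it returns one TRUE/FALSE per row (every ID counts as "not equal to 16", because the number is compared as the string "16"), so length() simply returns the number of rows in all_trips. Not what we want.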
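To see this concretely, here is a minimal sketch using a small, hypothetical all_trips data frame (the IDs below are made up; your real data is not shown in the question). It prints the element-by-element comparison and its length:

```r
# Hypothetical toy data standing in for the real all_trips
all_trips <- data.frame(
  ride_id = c("ABCDEFGH12345678", "SHORTID", "1234567890ABCDEF"),
  stringsAsFactors = FALSE
)

# != is applied element by element: one TRUE/FALSE per row.
# 16 is coerced to the string "16", so every ID is "not equal"
comparison <- all_trips$ride_id != 16
print(comparison)

# length() only counts the elements, which equals the number of rows
print(length(comparison))  # 3
```

Note that the short ID "SHORTID" is not singled out in any way; the result is the same for every row.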

Your second attempt:

length(nchar(all_trips$ride_id !=16))

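translates as "compare every value of ride_id with 16, count the characters in each resulting TRUE/FALSE value, then find the length of that vector". Because != sits inside nchar(), the IDs themselves are never measured, and the final result is again just the number of rows. Not what we want either.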
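A short sketch of what the second attempt actually measures, again on hypothetical toy data. nchar() coerces TRUE/FALSE to the strings "TRUE" and "FALSE", so it counts 4 or 5 characters regardless of the IDs. Moving the comparison outside nchar() and using sum() instead of length() gives the count you were probably after (sum() treats TRUE as 1 and FALSE as 0):

```r
# Hypothetical toy data standing in for the real all_trips
all_trips <- data.frame(
  ride_id = c("ABCDEFGH12345678", "SHORTID", "1234567890ABCDEF"),
  stringsAsFactors = FALSE
)

# != runs first, so nchar() sees TRUE values and counts "TRUE" (4 characters)
print(nchar(all_trips$ride_id != 16))          # 4 4 4
print(length(nchar(all_trips$ride_id != 16)))  # 3, i.e. the number of rows

# Measure the IDs first, then compare, then count the mismatches
n_bad <- sum(nchar(all_trips$ride_id) != 16)
print(n_bad)  # 1
```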

What you want to do is:

"retain only the subset of all_trips where ride_id contains exactly 16 characters"

Which you can do like this:

all_trips_filtered <- all_trips[nchar(all_trips$ride_id) == 16, ]
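Since the goal is to delete the bad rows, it can help to look at them before dropping them. A minimal sketch, assuming a hypothetical toy all_trips (the start_station column is made up for illustration): the same logical vector selects the rows to keep, and its negation with ! selects the rows that would be removed.

```r
# Hypothetical toy data standing in for the real all_trips
all_trips <- data.frame(
  ride_id = c("ABCDEFGH12345678", "SHORTID", "1234567890ABCDEF"),
  start_station = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose ride_id has exactly 16 characters
has_16 <- nchar(all_trips$ride_id) == 16

# Rows to keep, and the rows that would be dropped (worth inspecting first)
all_trips_filtered <- all_trips[has_16, ]
dropped <- all_trips[!has_16, ]

print(nrow(all_trips_filtered))  # 2
print(dropped$ride_id)           # "SHORTID"
```

The toy data has a second column on purpose: indexing a one-column data frame as df[rows, ] returns a plain vector by default rather than a data frame.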

Or another way, using subset(), where you can refer to the column by name without the all_trips$ prefix:

all_trips_filtered <- subset(all_trips, nchar(ride_id) == 16)
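One practical difference between the two forms shows up if ride_id has missing values. subset() treats a condition that evaluates to NA as FALSE, so a row with a missing ID is dropped. With [ ] indexing, nchar() returns NA for a missing string by default, and an NA in the row index comes back as a row of NAs, so an explicit !is.na() guard is needed to match. A sketch with hypothetical toy data that includes one NA ID:

```r
# Hypothetical toy data: one missing ID and one short ID
all_trips <- data.frame(
  ride_id = c("ABCDEFGH12345678", NA, "SHORTID", "1234567890ABCDEF"),
  start_station = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# subset() drops rows where the condition is FALSE or NA
via_subset <- subset(all_trips, nchar(ride_id) == 16)

# Same result with [ ]: rule out missing IDs explicitly.
# FALSE & NA is FALSE in R, so the NA row is not kept
keep <- !is.na(all_trips$ride_id) & nchar(all_trips$ride_id) == 16
via_brackets <- all_trips[keep, ]

print(via_subset$ride_id)
print(via_brackets$ride_id)
```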

See ?Extract or ?subset for more help.
