I have a dataset where the ID names are all supposed to have 16 characters. How do I filter out all of the data that does not have exactly 16 characters so I can delete it from my dataset? I am working in RStudio.
I've tried both of these in an attempt to get R to retrieve the data that does not have exactly 16 characters, but neither worked. I'm new to R, so I'm still figuring it out.
length(all_trips$ride_id != 16)
length(nchar(all_trips$ride_id !=16))
CodePudding user response:
You are getting closer, and you are on the right track with nchar().
I assume you have a data frame all_trips with a character column ride_id.
Your first attempt:
length(all_trips$ride_id != 16)
translates as "compare every value of ride_id with 16, then find the length of the resulting TRUE/FALSE vector". This returns a single number, the number of rows in all_trips, which is not what we want.
Your second attempt:
length(nchar(all_trips$ride_id !=16))
translates as "compare every value of ride_id with 16, then count the characters in each resulting TRUE/FALSE value, then find the length of that vector". Again this is just the number of rows, which is not what we want.
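To see why both attempts fall short, it can help to run each piece on its own. The sketch below uses a small made-up vector of IDs (hypothetical values, not your data) in place of all_trips$ride_id:

```r
# Hypothetical toy IDs, for illustration only
ride_id <- c("ABCDEFGH12345678", "SHORT", "ABCDEFGH1234567890")

# Compares each string with "16" (the number is coerced to text),
# so the result is a logical vector, one TRUE/FALSE per ID
cmp <- ride_id != 16

length(cmp)   # number of elements in the vector, not a count of bad IDs
nchar(cmp)    # characters in the words "TRUE"/"FALSE", not in the IDs

# What we actually need: count characters first, then compare
id_lengths <- nchar(ride_id)
is_bad <- id_lengths != 16
```

Here is_bad marks the IDs that do not have exactly 16 characters, and sum(is_bad) would count them.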
What you want to do is:
"retain only the subset of all_trips where ride_id contains 16 characters"
Which you can do like this:
all_trips_filtered <- all_trips[nchar(all_trips$ride_id) == 16, ]
Or another way, using subset, where you can just specify the column name:
all_trips_filtered <- subset(all_trips, nchar(ride_id) == 16)
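Before deleting anything, you may want to look at the rows that would be removed. The sketch below assumes a made-up all_trips with a hypothetical duration column and one missing ID. The two approaches treat missing IDs differently: nchar() returns NA for a missing string, bracket indexing turns that into a row of NAs, while subset() drops it. An explicit is.na() check keeps the two consistent:

```r
# Hypothetical toy data, for illustration only
all_trips <- data.frame(
  ride_id  = c("ABCDEFGH12345678", "SHORT", NA, "1234567890ABCDEF"),
  duration = c(10, 25, 7, 42),
  stringsAsFactors = FALSE
)

# Rows that would be removed: missing IDs or IDs of the wrong length
bad_rows <- subset(all_trips, is.na(ride_id) | nchar(ride_id) != 16)

# Keep only rows with a non-missing, 16-character ID
keep <- !is.na(all_trips$ride_id) & nchar(all_trips$ride_id) == 16
all_trips_filtered <- all_trips[keep, ]
```

Comparing nrow(all_trips) with nrow(all_trips_filtered) is a quick check on how many rows were dropped.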
See ?Extract or ?subset for more help.