I am trying to figure out how to show only the incidents that happened in 2016. The date column is stored as chr and its values look like "2016-12-31". Is there a way to search for values only from 2016?
Current code:
most_fatalities_2016 <- gun_violence[which(gun_violence$date == "2016"), select = c("state", "city_or_county")]
I guess I'm looking for the R function that acts like the LIKE operator in SQL. Any help?
CodePudding user response:
You can simply use substr() or grepl():
gun_violence[substr(gun_violence$date,1,4)=="2016",]
or
gun_violence[grepl("^2016-",gun_violence$date),]
The above returns all columns of the gun_violence data.frame. If you want to return only specific columns, you can specify them like this:
gun_violence[grepl("^2016-",gun_violence$date),c("state", "city_or_county")]
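For a self-contained check, here is a minimal sketch on a toy data frame (the rows are made up for illustration; only the column names come from the question). It also shows why the original comparison returns nothing: date == "2016" compares the whole string, and no value equals "2016" exactly. startsWith() is a further base R option, assuming R 3.3.0 or later.

```r
# Toy stand-in for gun_violence; the rows are invented for illustration.
gun_violence <- data.frame(
  date = c("2015-12-31", "2016-01-01", "2016-12-31", "2017-01-01"),
  state = c("Ohio", "Texas", "Iowa", "Utah"),
  city_or_county = c("Columbus", "Austin", "Des Moines", "Provo"),
  stringsAsFactors = FALSE
)

# Whole-string comparison: no date is exactly "2016", so nothing matches.
exact_match <- gun_violence$date == "2016"

# Three equivalent ways to express SQL's date LIKE '2016-%'.
by_substr <- substr(gun_violence$date, 1, 4) == "2016"
by_grepl  <- grepl("^2016-", gun_violence$date)
by_prefix <- startsWith(gun_violence$date, "2016-")

# Keep the matching rows and only the two columns of interest.
incidents_2016 <- gun_violence[by_grepl, c("state", "city_or_county")]
print(incidents_2016)
```

All three logical vectors select the same rows here, so the choice is mostly a matter of taste; grepl() is the most flexible if the pattern ever gets more complex.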
CodePudding user response:
I may be going further than what is asked, but I want to give some advice regarding the way the data is being stored and manipulated.
Advice #1:
It may be much easier in downstream analyses if we transform this character variable into a proper date format beforehand.
Advice #2:
The dplyr package provides very clear syntax for manipulating data frames, which may be a nice introduction for those coming from an SQL background.
Advice #3:
Understanding and "reverse-engineering" the dbplyr package (https://dbplyr.tidyverse.org/) may be enlightening for SQL-experienced users.
To convert the date column as suggested above:
gun_violence$date <- readr::parse_date(gun_violence$date)
After that, we can use many date-related functions, such as:
library(dplyr)
library(lubridate)
# incidents dated before today
gun_violence %>% filter(date < today())
# the desired operation in the question
gun_violence %>% filter(year(date) == 2016)
and so on.
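Putting the pieces together, here is a minimal end-to-end sketch on a toy data frame (the rows are invented; only the column names come from the question). It assumes dplyr and lubridate are installed, and it uses base R's as.Date() in place of readr::parse_date() simply to avoid one more dependency. The pipeline is roughly the dplyr counterpart of SELECT state, city_or_county FROM gun_violence WHERE the year of date is 2016.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for gun_violence; the rows are invented for illustration.
gun_violence <- data.frame(
  date = c("2015-12-31", "2016-01-01", "2016-12-31", "2017-01-01"),
  state = c("Ohio", "Texas", "Iowa", "Utah"),
  city_or_county = c("Columbus", "Austin", "Des Moines", "Provo"),
  stringsAsFactors = FALSE
)

# Convert the character column to Date (base R alternative to readr::parse_date).
gun_violence$date <- as.Date(gun_violence$date)

# filter() plays the role of WHERE, select() the role of the SELECT column list.
incidents_2016 <- gun_violence %>%
  filter(year(date) == 2016) %>%
  select(state, city_or_county)

print(incidents_2016)
```

Once the column is a real Date, range filters such as date >= as.Date("2016-01-01") also work, which string matching cannot express as directly.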