This is the data I was given:
import re
import pandas as pd
reviews = ["2017-09-02T07:00:09Z It's really smooth but the taste isn't so good. Terrible absolutely terrible. More on the cough syrup side than black cherry and vanilla. It was a waste of money. The green apple and blood orange are the best ones. Slightly disappointed in the taste. "]
reviews = pd.DataFrame(reviews)
I need a regular expression for the date and, separately, one for the time.
This is my attempt:
pattern=r'(\d{4}[-/]\d{2}[-/]\d{2})'
sol=re.findall(pattern,reviews)
print(sol)
Answer:
An easy way is to slice the string, convert it to a datetime, and then extract the time.
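from datetime import datetime
text = reviews.iloc[0, 0]  # the raw review string from the DataFrame in the question
stamp = text[:10] + ' ' + text[11:19]  # '2017-09-02 07:00:09' (drops the T and Z)
dt = datetime.fromisoformat(stamp)  # requires Python 3.7+
print(dt.date())  # 2017-09-02
print(dt.time())  # 07:00:09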
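An alternative to slicing and re-joining the string is to pass the whole timestamp token to `datetime.strptime` with a format that matches the literal `T` and `Z`. This is a minimal sketch that assumes the timestamp is always the first whitespace-separated token of the review. The shortened `review` string is just sample input.

```python
from datetime import datetime

# Sample input: the start of the review from the question
review = ("2017-09-02T07:00:09Z It's really smooth but the taste isn't so good. "
          "Terrible absolutely terrible.")

# Assumption: the timestamp is the first whitespace-separated token
token = review.split(maxsplit=1)[0]  # '2017-09-02T07:00:09Z'

# The format matches the date, a literal T, the time, and a literal Z
dt = datetime.strptime(token, '%Y-%m-%dT%H:%M:%SZ')

print(dt.date())  # 2017-09-02
print(dt.time())  # 07:00:09
```

Because the `Z` is matched as a literal here, `dt` is a naive datetime. Time zone information is not attached to it.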
Answer:
If you want to use regular expressions rather than simple string slicing, you could isolate the date and time from this string as follows:
import re
reviews = "2017-09-02T07:00:09Z It's really smooth but the taste isn't so good. Terrible absolutely terrible. More on the cough syrup side than black cherry and vanilla. It was a waste of money. The green apple and blood orange are the best ones. Slightly disappointed in the taste. "
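rx = r'(?P<Date>\d{4}-\d\d-\d\d)T(?P<Time>\d\d:\d\d:\d\d)Z'  # literal T and Z kept outside the groups
if (m := re.search(rx, reviews)):  # search the whole string; reviews[0] is only its first character
    print(m.group('Date'))  # 2017-09-02
    print(m.group('Time'))  # 07:00:09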
As written, this only reports the first occurrence of the date/time pattern, because re.search stops at the first match.
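If a string can contain several timestamps, `re.finditer` returns one match object per non-overlapping occurrence. This is a sketch under that assumption. The two-timestamp `text` below is hypothetical input, not data from the question.

```python
import re

# Hypothetical input containing two ISO-8601 UTC timestamps
text = ("2017-09-02T07:00:09Z first review text. "
        "2018-01-15T23:45:00Z second review text.")

# Date and time captured in named groups; the literal T and Z are outside them
rx = re.compile(r'(?P<Date>\d{4}-\d\d-\d\d)T(?P<Time>\d\d:\d\d:\d\d)Z')

# finditer yields a match for every occurrence, scanning left to right
stamps = [(m.group('Date'), m.group('Time')) for m in rx.finditer(text)]

for date, time in stamps:
    print(date, time)
```

Because the pattern has two groups, `rx.findall(text)` returns the same list of `(date, time)` tuples directly.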