I am working on a Python programming task with a movie industry dataset from Kaggle: https://www.kaggle.com/datasets/danielgrijalvas/movies .
I imported the dataset into a Pandas DataFrame df. It has a column "released" of type str (string), whose date strings look like this:
>>> df.released[:10]
index | Date Strings look like below |
---|---|
0 | June 13, 1980 (United States) |
1 | July 2, 1980 (United States) |
2 | June 20, 1980 (United States) |
3 | July 2, 1980 (United States) |
4 | July 25, 1980 (United States) |
5 | May 9, 1980 (United States) |
6 | June 20, 1980 (United States) |
7 | December 19, 1980 (United States) |
8 | June 19, 1981 (United States) |
9 | May 16, 1980 (United States) |
Name: released, dtype: object
>>> type(df.released[0])
str
I would like to extract the date from each string in this "released" column, discarding the country name in parentheses, and store it in a new DataFrame column as either pandas.Timestamp or Python datetime values.
I have searched the Internet extensively but could not find a good Python solution for this extraction and conversion.
Could anyone help?
Best Regards Alex Chu
CodePudding user response:
You will want to split on " (" so that the date and the country are separated, then call pd.to_datetime on the dates:
pd.to_datetime(df['released'].str.split(r' \(').str[0])
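As a rough, self-contained sketch of this split-then-parse approach: the small DataFrame below is only a stand-in built from a few values shown in the question, and the new column name released_date is a hypothetical choice, not something from the dataset.

```python
import pandas as pd

# Stand-in for the real dataset, using a few values quoted in the question.
df = pd.DataFrame({
    "released": [
        "June 13, 1980 (United States)",
        "July 2, 1980 (United States)",
        "December 19, 1980 (United States)",
    ]
})

# Split on " (" (escaped, since the pattern is treated as a regex) and keep
# the part before it, i.e. the date text without the country.
date_text = df["released"].str.split(r" \(").str[0]

# Parse the date text and store it in a new column (name is an assumption).
df["released_date"] = pd.to_datetime(date_text)

print(df.dtypes)
```

The new column should then hold pandas.Timestamp values, so accessors such as df["released_date"].dt.year become available.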
CodePudding user response:
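Using pd.to_datetime() after splitting the column on "(" should work. Try:
pd.to_datetime(df['released'].str.split('(').str[0].str.strip(), infer_datetime_format=True)
Note that infer_datetime_format is deprecated as of pandas 2.0, where format inference is the default, so the argument can be omitted there.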
As an example:
pd.to_datetime('June 13, 1980',infer_datetime_format=True)
Returns:
Timestamp('1980-06-13 00:00:00')
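If some rows in the real column turn out to be missing or not to follow the "Month day, year (Country)" pattern, the conversion may raise an error. The sketch below assumes that possibility and uses errors="coerce" so that such rows become NaT instead. The malformed and missing values in the sample are made up for illustration, and released_date is a hypothetical column name.

```python
import pandas as pd

# Stand-in data: one value from the question plus two invented problem rows.
df = pd.DataFrame({
    "released": [
        "June 13, 1980 (United States)",
        "not a date (Nowhere)",  # made-up malformed value
        None,                    # made-up missing value
    ]
})

# Keep the text before "(" and trim the trailing space.
date_text = df["released"].str.split("(").str[0].str.strip()

# Parse with an explicit format; anything that does not match becomes NaT.
df["released_date"] = pd.to_datetime(
    date_text, format="%B %d, %Y", errors="coerce"
)

# Count how many rows could not be parsed.
print(df["released_date"].isna().sum())
```

Checking the NaT count afterwards shows how many rows need a closer look before relying on the new column.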