I have a dataframe containing a column that has a list of dates stored as strings:
# sample dataframe
data = [[1, ["2019-08-02 08:30:56"]], [2, ["2020-08-02 08:30:56"]]]
df = pd.DataFrame(data, columns=["items", "dates"])
df["dates"] = df["dates"].astype(str)
df
items dates
0 1 ['2019-08-02 08:30:56']
1 2 ['2020-08-02 08:30:56']
I would like to do several things:
- Convert from a list to a string.
- Convert the string to a date.
- Eliminate the time stamp.
So that the final dataset would look like this:
items dates
0 1 2019-08-02
1 2 2020-08-02
I am able to remove the list brackets by doing this:
df["dates_2"] = df["dates"].apply(lambda x: x[1:-1])
But I am wondering if there is a better way to do all of these things in one step?
CodePudding user response:
Another solution is to apply ast.literal_eval
to convert the column to list
. Then .explode()
it, convert to datetime and get a date part:
from ast import literal_eval
df["dates"] = df["dates"].apply(literal_eval)
df = df.explode("dates")
df["dates"] = pd.to_datetime(df["dates"]).dt.date
print(df)
Prints:
items dates
0 1 2019-08-02
1 2 2020-08-02
CodePudding user response:
Example
data = [[1, ["2019-08-02 08:30:56"]], [2, ["2020-08-02 08:30:56"]]]
df = pd.DataFrame(data, columns=["items", "dates"])
df["dates"] = df["dates"].astype(str)
Code
pd.to_datetime(df['dates'].str[2:-2]).dt.normalize()
output
0 2019-08-02
1 2020-08-02
Name: dates, dtype: datetime64[ns] <-- chk dtype
if you want dtype is object
not datetime, use following code:
df['dates'].str[2:12]
0 2019-08-02
1 2020-08-02
Name: dates, dtype: object <-- chk dtype
CodePudding user response:
You can either use pandas.Series.strip
with pandas.Series.str
:
df["dates"] = df["dates"].astype(str).str.strip("['']").str[:10]
Or use pandas.to_datetime
with pandas.Series.dt.strftime
:
df["dates"] = pd.to_datetime(df["dates"].astype(str).str.strip("['']")).dt.strftime("%Y-%m-%d")
# Output
print(df)
items dates
0 1 2019-08-02
1 2 2020-08-02
CodePudding user response:
This is how I would do it:
df.dates = pd.to_datetime(df.dates.str.slice(1,-1)).dt.date
Output:
items dates
0 1 2019-08-02
1 2 2020-08-02
It's important to note that dates
contains values of type str, each of which looks like a list when displayed but is not a list (each is a str) after running the following code as shown in the original question:
# sample dataframe
data = [[1, ["2019-08-02 08:30:56"]], [2, ["2020-08-02 08:30:56"]]]
df = pd.DataFrame(data, columns=["items", "dates"])
df["dates"] = df["dates"].astype(str)
Therefore, all we need to do is:
- remove the
[
and]
characters usingdf.dates.str.slice(1,-1)
to get the substring of type str that looks like a datetime - use
pd.to_datetime()
to convert the Series to Timestamps - use
.dt
to get at the datetime-like properties of the values in the Series - use the
.date
accessor to convert to a numpy array of objects of type datetime.date