I converted a string date column from my dask dataframe to pandas datetimes, which produced a DatetimeIndex. When I try assigning it back to the source dask dataframe with
ddf.assign(date=date_parsed)
I get:
ValueError: Length of values (1000000) does not match length of index (2)
I initially thought the created DatetimeIndex had the correct length while the source had only 2 index entries. I tried converting the DatetimeIndex into a pd.DataFrame, which worked, but I cannot add that pandas DataFrame to the dask dataframe. I also tried converting it back to a Series, but I still could not append/assign it.
What I would like to do is assign the DatetimeIndex back to the source dask dataframe.
Sample dask dataframe converted from pandas; all values are strings:
import dask.dataframe as dd
import numpy as np
import pandas as pd

df = pd.DataFrame({'fname': ['dwayne', 'peter', 'dead', 'wonder'],
                   'lname': ['rock', 'pan', 'pool', 'boy'],
                   'entrydate': ['31DEC2021', '22JAN2022', np.nan, '15DEC2025']})
ddf = dd.from_pandas(df, npartitions=2)
What I did: (1) parsed the entrydate values and converted them to datetimes, which gave me the following:
DatetimeIndex(['2021-12-31', '2022-01-22', 'NaT', '2025-12-15'], dtype='datetime64[ns]', length=4, freq=None)
(2) I dropped the 'entrydate' column using the drop function.
(3) When I tried the assign function, I get the ValueError...
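For reference, step (1) can be reproduced like this; the format='%d%b%Y' string is an assumption matching values like '31DEC2021':

```python
import pandas as pd

# Reproduce step (1): parse the string dates into a DatetimeIndex;
# errors='coerce' turns the missing value into NaT instead of raising
date_parsed = pd.to_datetime(
    ['31DEC2021', '22JAN2022', None, '15DEC2025'],
    format='%d%b%Y', errors='coerce')
print(date_parsed)  # matches the DatetimeIndex shown above
```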
CodePudding user response:
Hi, this ValueError occurred because of the NaN value; pass errors='coerce' so unparseable values become NaT instead of raising an error. Pandas' built-in datetime conversion mechanism is enough for this issue:
df['entrydate'] = pd.to_datetime(df['entrydate'], errors='coerce').dt.strftime('%Y-%m-%d')
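One caveat with the line above: .dt.strftime formats the parsed dates back into plain strings, so drop it if you want a real datetime column. A small sketch of the difference (the explicit format='%d%b%Y' is an assumption matching the sample data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'entrydate': ['31DEC2021', '22JAN2022', np.nan, '15DEC2025']})

# errors='coerce' turns unparseable/missing values into NaT
parsed = pd.to_datetime(df['entrydate'], format='%d%b%Y', errors='coerce')
print(parsed.dtype)     # datetime64[ns] -- a true datetime column

# .dt.strftime converts back to strings, giving an object-dtype column
formatted = parsed.dt.strftime('%Y-%m-%d')
print(formatted.dtype)  # object
```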
CodePudding user response:
There is no need to create a new column using assign. Dask dataframes support the pandas API, so the following works:
import dask.dataframe as dd
import numpy as np
import pandas as pd

df = pd.DataFrame({'fname': ['dwayne', 'peter', 'dead', 'wonder'],
                   'lname': ['rock', 'pan', 'pool', 'boy'],
                   'entrydate': ['31DEC2021', '22JAN2022', np.nan, '15DEC2025']})
ddf = dd.from_pandas(df, npartitions=2)
# roughly same as ddf.assign(date=date_parsed)
ddf["date"] = dd.to_datetime(ddf["entrydate"])
See also this answer.