selecting a df row by month formatted with (lambda x: datetime.datetime.strptime(x, '%Y-%m-%dT%

Time:11-04

I'm having some issues with geopandas and pandas datetime objects; I keep getting the error

pandas Invalid field type <class 'pandas._libs.tslibs.timedeltas.Timedelta'>

when I try to save it using gpd.to_file(). Apparently this is a known issue between pandas and geopandas date types, so I used

df.DATE = df.DATE.apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S%z'))

to get a datetime object I could manipulate without getting the aforementioned error when I save the results. Due to that change, my selection by month

months = [4]
for month in months:
    df = df[[(pd.DatetimeIndex(df.DATE).month == month)]]

no longer works and raises a ValueError:

ValueError: Item wrong length 1 instead of 108700.

I tried dropping pd.DatetimeIndex, but that raises an AttributeError on the Series:

AttributeError: 'Series' object has no attribute 'month'

and

df = df[(df.DATE.month == month)]

gives me the same error. I know the column was converted to a datetime type because print(df.dtypes) shows DATE datetime64[ns, UTC] and

for index, row in df.iterrows():
    print(row.DATE.month)

prints the month as an integer to the terminal.

Without going back to pd.Datetime, how can I fix my select statement for the month?

CodePudding user response:

The expression df.DATE returns a Series, which has no .month attribute. The individual timestamps inside the Series do, which is why row.DATE.month works. Try something like:

# Boolean mask, one entry per row (avoids shadowing the built-in filter)
mask = [x.month == month for x in df.DATE]
df_filtered = df[mask]

Beyond that, I'm not sure what you're trying to accomplish with pd.DatetimeIndex(df.DATE).month == month, but a similar fix should take care of it.
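
For what it's worth, here is a minimal sketch of a vectorized version using the Series .dt accessor, which exposes datetime fields such as month element-wise. The sample frame and the "value" column are made up for illustration and are not from the question. The ValueError in the question looks consistent with the extra pair of square brackets wrapping the mask in a one-element list, so the pd.DatetimeIndex form is shown here with single brackets, though I haven't verified that against the original data.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame({
    "DATE": pd.to_datetime(
        [
            "2021-03-31T23:00:00+00:00",
            "2021-04-01T08:30:00+00:00",
            "2021-04-15T12:00:00+00:00",
            "2021-05-02T00:00:00+00:00",
        ],
        utc=True,
    ),
    "value": [1, 2, 3, 4],  # illustrative column, not in the original
})

months = [4]

# .dt.month yields an integer Series; isin() tests every wanted month at once,
# so no loop that repeatedly reassigns df is needed.
mask = df["DATE"].dt.month.isin(months)
df_filtered = df[mask]

# The pd.DatetimeIndex form, with a single pair of brackets around the mask.
df_filtered_idx = df[pd.DatetimeIndex(df["DATE"]).month == months[0]]

print(df_filtered)
```

Using isin() also sidesteps a problem with the original loop: filtering df in place for one month and then for another would leave nothing if months held more than one value.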
