I have dictionaries like this:
tr1 = {'label': 'name1', 'date': '2021-09-29'}
tr2 = {'label': 'name1', 'date': '2021-08-30'}
tr3 = {'label': 'name1', 'date': '2021-09-30'}
tr4 = {'label': 'name2', 'date': '2021-06-30'}
tr5 = {'label': 'name2', 'date': '2021-05-30'}
tr6 = {'label': 'name3', 'date': '2021-06-30'}
And I want to get a list like this:
[1, 1, 1, 31, 31, 0]
Each entry is the minimum gap in days between the dates of the dictionaries sharing that label, or 0 if no other dictionary has the same label. I tried a DataFrame with groupby and .transform, but that doesn't work:
df_day = pd.DataFrame(sample_transaction)
df_day.date = df_day.date.apply(lambda x:
    int(datetime.datetime.timestamp(
        datetime.datetime.strptime(x, "%Y-%m-%d"))))
group_day = df_day[['label', 'date']].groupby(['label'])
group_day.transform(
    lambda x: min([abs(a - b) if a != b else 0.0 for a in x for b in x]))
sample_transaction is just the list containing the dictionaries. I converted the dates to seconds with timestamp and tried to compute the gap with transform and a lambda, but I just get a list of 0.0.
CodePudding user response:
IIUC, you can sort the dates per group and take the minimum difference:
import pandas as pd

l = [tr1, tr2, tr3, tr4, tr5, tr6]
(pd.DataFrame(l)
   .assign(date=lambda d: pd.to_datetime(d['date']))
   .groupby('label')['date']
   # sort each group's dates, then take the smallest consecutive gap
   .transform(lambda s: s.sort_values().diff().min())
)
Output:
0     1 days 00:00:00
1     1 days 00:00:00
2     1 days 00:00:00
3    31 days 00:00:00
4    31 days 00:00:00
5                 NaT
Name: date, dtype: object
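To see what the transform does inside one group, here is a minimal sketch that uses only the three name1 dates from the question. The variable names (s, gaps, min_gap, single_gap) are illustrative and not part of the code above.

```python
import pandas as pd

# Dates of the three 'name1' dictionaries from the question
s = pd.Series(pd.to_datetime(['2021-09-29', '2021-08-30', '2021-09-30']))

# After sorting, the closest pair of dates is always adjacent, so the
# smallest consecutive difference is the smallest gap in the group
gaps = s.sort_values().diff()   # first entry is NaT (no predecessor)
min_gap = gaps.min()            # min() skips the NaT

print(min_gap.days)  # 1

# A group with a single date has nothing to diff against,
# so its minimum gap is NaT (this is the 'name3' row)
single_gap = pd.Series(pd.to_datetime(['2021-06-30'])).diff().min()
print(pd.isna(single_gap))  # True
```

This is why the name3 row shows NaT and needs a fill value when converting to integers.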
For the exact provided format:
(pd.DataFrame(l)
   .assign(date=lambda d: pd.to_datetime(d['date']))
   .groupby('label')['date']
   # .days is NaN for groups with a single date
   .transform(lambda s: s.sort_values().diff().min().days)
   .fillna(0)
   .astype(int)
   .to_list()
)
Output:
[1, 1, 1, 31, 31, 0]
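As for why the attempt in the question returns only 0.0: the comprehension loops over every (a, b) pair, including each value paired with itself, and those pairs contribute 0.0, so min is always 0.0. Below is a minimal sketch of the pairwise idea without pandas, assuming ISO-formatted date strings as in the question. It uses itertools.combinations so that no value is paired with itself; the names transactions, min_gap_days and by_label are hypothetical.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

transactions = [
    {'label': 'name1', 'date': '2021-09-29'},
    {'label': 'name1', 'date': '2021-08-30'},
    {'label': 'name1', 'date': '2021-09-30'},
    {'label': 'name2', 'date': '2021-06-30'},
    {'label': 'name2', 'date': '2021-05-30'},
    {'label': 'name3', 'date': '2021-06-30'},
]

def min_gap_days(dates):
    """Smallest gap in days between any two dates; 0 if fewer than two."""
    return min((abs((a - b).days) for a, b in combinations(dates, 2)),
               default=0)

# Collect the parsed dates of each label
by_label = defaultdict(list)
for tr in transactions:
    by_label[tr['label']].append(date.fromisoformat(tr['date']))

# One value per dictionary, in the original order
result = [min_gap_days(by_label[tr['label']]) for tr in transactions]
print(result)  # [1, 1, 1, 31, 31, 0]
```

Comparing all pairs is quadratic in the group size, so for large groups the sort-and-diff approach above is likely the better choice.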