i have two dataframe df1 and df2
df1=
id start end
a 1/12/2022 18/12/2022
a 19/12/2022 25/12/2022
a 26/12/2022 31/12/2022
b 01/12/2022 20/12/2022
b 21/12/2022 31/12/2022
c 01/12/2022 31/12/2022
d 01/12/2022 15/12/2022
d 16/12/2022 31/12/2022
and second data frame as
df2
id start_2 end_2 number
a 15/12/2022 15/12/2022 1
b 17/12/2022 19/12/2022 3
b 25/12/2022 27/12/2022 2
c 12/12/2022 12/12/2022 1
d 03/12/2022 04/12/2022 2
d 25/12/2022 25/12/2022 1
I want to merge both dataframe left join (df1 and df2) by id . And adjust column 'number' in same date range(start and end date) coming in df1 . Like if in df2 id 'a' has number 1 it should come in first row of 'a'(1/12/2022 to 18/12/2022) not in other slot. Other slot should be zero. Like below
Resulted df
id start end number
a 1/12/2022 18/12/2022 1
a 19/12/2022 25/12/2022 0
a 26/12/2022 31/12/2022 0
b 01/12/2022 20/12/2022 3
b 21/12/2022 31/12/2022 2
c 01/12/2022 31/12/2022 1
d 01/12/2022 15/12/2022 2
d 16/12/2022 31/12/2022 1
Note if two number lies in same slot of df1,there should be groupby sum.
CodePudding user response:
This is a work around. After merging you set start
and end
condition then make a good use of .loc
and groupby
df1["start"] = pd.to_datetime(df1["start"], dayfirst=True)
df1["end"] = pd.to_datetime(df1["end"], dayfirst=True)
df2["start_2"] = pd.to_datetime(df2["start_2"], dayfirst=True)
df2["end_2"] = pd.to_datetime(df2["end_2"], dayfirst=True)
merged_df = pd.merge(df1, df2, on="id", how="left")
merged_df["number_adj"] = 0
start_condition = (merged_df["start_2"] >= merged_df["start"]) & (merged_df["start_2"] <= merged_df["end"])
end_condition = (merged_df["end_2"] >= merged_df["start"]) & (merged_df["end_2"] <= merged_df["end"])
merged_df.loc[start_condition | end_condition, "number_adj"] = merged_df["number"]
merged_df = merged_df.groupby(["id", "start", "end"]).sum().reset_index()
merged_df.drop("number", axis=1, inplace=True)
merged_df.rename(columns={"number_adj": "number"}, inplace=True)
print(merged_df)
Output:
id start end number
0 a 2022-12-01 2022-12-18 1
1 a 2022-12-19 2022-12-25 0
2 a 2022-12-26 2022-12-31 0
3 b 2022-12-01 2022-12-20 3
4 b 2022-12-21 2022-12-31 2
5 c 2022-12-01 2022-12-31 1
6 d 2022-12-01 2022-12-15 2
7 d 2022-12-16 2022-12-31 1
CodePudding user response:
You can use concat and the groupby with the size() method.
df = pd.concat([df1, df2])
df.groupby(["start", "end"]).size()
CodePudding user response:
You can merge on id
then filter out your list:
# Convert to DatetimeIndex if necessary
df1['start'] = pd.to_datetime(df1['start'], dayfirst=True)
df1['end'] = pd.to_datetime(df1['end'], dayfirst=True)
df2['start_2'] = pd.to_datetime(df2['start_2'], dayfirst=True)
df2['end_2'] = pd.to_datetime(df2['end_2'], dayfirst=True)
# Merge on id, reset_index to preserve original index on merge
out = df1.reset_index().merge(df2, on='id', how='left')
# Check intervals
out['indicator'] = (out['start'] < out['start_2']) & (out['end_2'] < out['end'])
# Filter the list and set to 0 other slots
out = out.loc[out.groupby('index')['indicator'].idxmax()]
out.loc[~out['indicator'], 'number'] = 0
# Get the final dataframe
out = out[df1.columns.tolist() ['number']].set_index(df1.index)
Output:
>>> out
id start end number
0 a 2022-12-01 2022-12-18 1
1 a 2022-12-19 2022-12-25 0
2 a 2022-12-26 2022-12-31 0
3 b 2022-12-01 2022-12-20 3
4 b 2022-12-21 2022-12-31 2
5 c 2022-12-01 2022-12-31 1
6 d 2022-12-01 2022-12-15 0
7 d 2022-12-16 2022-12-31 1