Below is a dataframes for example :
DF = pd.DataFrame(
{"Date": ["2021-10-01", "2021-10-02", "2021-10-03"], "List": ["A0", "A1", "A2"],
"Status": ["B0", "B1", "B2"], "Type": ["E", "S", "S"]}
)
List = pd.DataFrame(
{ "List": ["A0", "A1", "A2"],"Type_List": ["E", "S", "S"]}
)
Status = pd.DataFrame(
{ "Status": ["B0", "B1", "B2"],"Type_Status": ["E", "S", "S"]}
)
I have a DF dataframe that I try to merge with List and Status according to conditions :
If DF Date < "2021-10-01" do:
final_df = DF.set_index('Status').join(Status[Status['Type_Status'] == 'E'].set_index('Status'),
how='left')
final_df = final_df.reset_index().set_index('List').join(List.drop_duplicates().set_index('List'),
how='left', rsuffix='df2')
final_df = final_df.reset_index()
final_df['Type'] = final_df['Type_Status'].fillna(final_df.Type_List)
If DF Date > "2021-10-01" do:
final_df2 = DF.merge(Status, on="Status")
final_df2 ['Type'] = final_df2['Type_Status']
And finally concatenates final_df & final_df2 to return a final dataframe
How to make a condition according to the date column of my dataframe DF and make my code more correct ?
CodePudding user response:
Given you are working with 2 dataframes, one for the dates before 2021-10-01 and another one for the dates following it, I suggest using pd.to_datetime()
and then slicing your dataframe instead of using an if/else.
DF['Date'] = pd.to_datetime(DF['Date'],infer_datetime_format=True)
final_df = DF[DF['Date'] <' 2021-10-01'].set_index('Status').join(Status[Status['Type_Status'] == 'E'].set_index('Status'),
how='left')
final_df = final_df.reset_index().set_index('List').join(List.drop_duplicates().set_index('List'),how='left', rsuffix='df2').reset_index()
final_df['Type'] = final_df['Type_Status'].fillna(final_df.Type_List)
And as for the the second dataframe:
final_df2 = DF[DF['Date'] >= '2021-10-01'].merge(Status,on="Status")
final_df2['Type'] = final_df2['Type_Status']
CodePudding user response:
You can use map
as you work on only one column. Create a mask for the Date and map
the columns Status and List from DF with the corresponding dataframe. Then use np.where depending on your conditions
# new input with all possible cases and change Type values to see the difference
DF = pd.DataFrame( # 3 dates less than the threshold, one meeting the Stauts==E criteria
{"Date": ["2021-10-01","2021-10-01","2021-10-01", "2021-10-02", "2021-10-03"],
"List": ["A0", "A1", "A2","A1", "A2"], "Status": ["B0", "B1", "B2","B1", "B2"],
"Type": ["B"]*5}
)
print(DF)
# Date List Status Type
# 0 2021-10-01 A0 B0 B this one will meet the Status==E criteria
# 1 2021-10-01 A1 B1 B this one will be List['A1']
# 2 2021-10-01 A2 B2 B ...
# 3 2021-10-02 A1 B1 B this one with Date greater, then Status mapping
# 4 2021-10-03 A2 B2 B
# Note I changed all type values to see where the result comes from
List = pd.DataFrame(
{ "List": ["A0", "A1", "A2"],"Type_List": ["I", "A", "S"]}
)
Status = pd.DataFrame(
{ "Status": ["B0", "B1", "B2"],"Type_Status": ["E", "J", "K"]}
)
Here is the code to get the new Types
# mask and mapping List and Status
maskDate = pd.to_datetime(DF['Date']) <= '2021-10-01'
mapStatus = DF['Status'].map(Status.set_index('Status')['Type_Status'])
mapList = DF['List'].map(List.drop_duplicates().set_index('List')['Type_List'])
DF['newType'] = np.where( # you can reassign to Type directly
maskDate, #condition on the Date
# when the Date is less equal 2021-10-01
# then use mapStatus and keep only the Status==E
# fill the other with the mapping from List
mapStatus.where(mapStatus.eq('E'), other=mapList),
# if date greater then 2021-10-01
mapStatus
)
print(DF)
# Date List Status Type newType
# 0 2021-10-01 A0 B0 B E
# 1 2021-10-01 A1 B1 B A
# 2 2021-10-01 A2 B2 B S
# 3 2021-10-02 A1 B1 B J
# 4 2021-10-03 A2 B2 B K
Ultimately, you can do even more simple, same resut
DF['newTypeV2'] = mapStatus.mask(maskDate & mapStatus.ne('E'), other=mapList)