How to returns a dataframe according to a date column condition-CodePudding

Below is a dataframes for example :

DF = pd.DataFrame(
    {"Date": ["2021-10-01", "2021-10-02", "2021-10-03"], "List": ["A0", "A1", "A2"],
     "Status": ["B0", "B1", "B2"], "Type": ["E", "S", "S"]}
)

List = pd.DataFrame(
    { "List": ["A0", "A1", "A2"],"Type_List": ["E", "S", "S"]}
)

Status = pd.DataFrame(
    { "Status": ["B0", "B1", "B2"],"Type_Status": ["E", "S", "S"]}
)

I have a DF dataframe that I try to merge with List and Status according to conditions :

If DF Date < "2021-10-01" do:

final_df = DF.set_index('Status').join(Status[Status['Type_Status'] == 'E'].set_index('Status'), 
                                         how='left')
final_df = final_df.reset_index().set_index('List').join(List.drop_duplicates().set_index('List'),
                                                          how='left', rsuffix='df2')
final_df = final_df.reset_index()
final_df['Type'] = final_df['Type_Status'].fillna(final_df.Type_List)

If DF Date > "2021-10-01" do:

final_df2 = DF.merge(Status, on="Status")
final_df2 ['Type'] = final_df2['Type_Status']

And finally concatenates final_df & final_df2 to return a final dataframe

How to make a condition according to the date column of my dataframe DF and make my code more correct ?

CodePudding user response：

Given you are working with 2 dataframes, one for the dates before 2021-10-01 and another one for the dates following it, I suggest using pd.to_datetime() and then slicing your dataframe instead of using an if/else.

DF['Date'] = pd.to_datetime(DF['Date'],infer_datetime_format=True)

final_df = DF[DF['Date'] <' 2021-10-01'].set_index('Status').join(Status[Status['Type_Status'] == 'E'].set_index('Status'), 
                                         how='left')
final_df = final_df.reset_index().set_index('List').join(List.drop_duplicates().set_index('List'),how='left', rsuffix='df2').reset_index()
final_df['Type'] = final_df['Type_Status'].fillna(final_df.Type_List)

And as for the the second dataframe:

final_df2 = DF[DF['Date'] >= '2021-10-01'].merge(Status,on="Status")
final_df2['Type'] = final_df2['Type_Status']

CodePudding user response：

You can use map as you work on only one column. Create a mask for the Date and map the columns Status and List from DF with the corresponding dataframe. Then use np.where depending on your conditions

# new input with all possible cases and change Type values to see the difference
DF = pd.DataFrame( # 3 dates less than the threshold, one meeting the Stauts==E criteria
    {"Date": ["2021-10-01","2021-10-01","2021-10-01", "2021-10-02", "2021-10-03"],
     "List": ["A0", "A1", "A2","A1", "A2"], "Status": ["B0", "B1", "B2","B1", "B2"], 
     "Type": ["B"]*5}
)
print(DF)
#          Date List Status Type
# 0  2021-10-01   A0     B0    B this one will meet the Status==E criteria
# 1  2021-10-01   A1     B1    B this one will be List['A1']
# 2  2021-10-01   A2     B2    B ...
# 3  2021-10-02   A1     B1    B this one with Date greater, then Status mapping
# 4  2021-10-03   A2     B2    B

# Note I changed all type values to see where the result comes from
List = pd.DataFrame( 
    { "List": ["A0", "A1", "A2"],"Type_List": ["I", "A", "S"]}
)
Status = pd.DataFrame(
    { "Status": ["B0", "B1", "B2"],"Type_Status": ["E", "J", "K"]}
)

Here is the code to get the new Types

# mask and mapping List and Status
maskDate = pd.to_datetime(DF['Date']) <= '2021-10-01'
mapStatus = DF['Status'].map(Status.set_index('Status')['Type_Status'])
mapList = DF['List'].map(List.drop_duplicates().set_index('List')['Type_List'])

DF['newType'] = np.where( # you can reassign to Type directly
    maskDate, #condition on the Date
    # when the Date is less equal 2021-10-01
    # then use mapStatus and keep only the Status==E
    # fill the other with the mapping from List
    mapStatus.where(mapStatus.eq('E'), other=mapList),
    # if date greater then 2021-10-01
    mapStatus
)
print(DF)
#          Date List Status Type newType
# 0  2021-10-01   A0     B0    B       E
# 1  2021-10-01   A1     B1    B       A
# 2  2021-10-01   A2     B2    B       S
# 3  2021-10-02   A1     B1    B       J
# 4  2021-10-03   A2     B2    B       K

Ultimately, you can do even more simple, same resut

DF['newTypeV2'] = mapStatus.mask(maskDate & mapStatus.ne('E'), other=mapList)