Home > Software engineering >  conditions based on columns values pandas
conditions based on columns values pandas

Time:05-02

I have a table like in the picture below. Based on the values of columns date and date2 I want to add new column date max. The new value should be equal to whichever is bigger (value can be both empty or string and date). Also, I need to add an agent column which, should be equal to the value whose length is greater than 2.

purchase={'date':['11/03/2019','12/05/2021','14/03/2021','11/03/2021'],
      'date2':['11/04/2019','12/03/2021','14/06/2021','X'],
      'price':[300, 400,200, 200],
      'currency':['eur', 'usd','usd','usd'],
      'qty':[200, 300, 400, 500],
      'agent':['AC', 'BC', "CC", 'DC'],
      'agent2':['AC', 'BC', "DDCC", 'DCC']}

df=pd.DataFrame(purchase)

enter image description here

CodePudding user response:

If you just need bigger values in both date, do this:

df["max_date"] = df[["date", "date2"]].apply(max, axis=1)

Assuming your columns contain proper dates and not just strings. for converting to dates first, see Omar's answer.

Bigger length agents names can be gotten same way:

df[["agent", "agent2"]].apply(lambda x: x[0] if len(x[0])>len(x[1]) else x[1], axis=1

CodePudding user response:

First you have to convert the two dates to a datetime type, then compare them, there are many ways to do that, the easiest is using numpy function .where():

df["date"] = pd.to_datetime(df["date"], infer_datetime_format=True)
df["date2"] = pd.to_datetime(df["date2"], infer_datetime_format=True, errors='coerce')


df["max"] = np.where(df["date"]>df["date2"], df["date"], df["date2"])

df["agent_x"] = np.where(df["agent"].str.len()> df["agent2"].str.len(), df["agent"], df["agent2"])

Result:

        date      date2  price currency  qty agent agent2        max agent_x
0 2019-11-03 2019-11-04    300      eur  200    AC     AC 2019-11-04      AC
1 2021-12-05 2021-12-03    400      usd  300    BC     BC 2021-12-05      BC
2 2021-03-14        NaT    200      usd  400    CC   DDCC        NaT    DDCC
3 2021-11-03        NaT    200      usd  500    DC    DCC        NaT     DCC

In case you wanted to fill the missing dates with the available date, add this:

for i, r in df.iterrows():
      if pd.isnull(r["max"]):
            if pd.isnull(r["date"]):
                  df.at[i, "max"] = r["date2"]
            else:
                  df.at[i, "max"] = r["date"]

Result:

        date      date2  price currency  qty agent agent2        max agent_x
0 2019-11-03 2019-11-04    300      eur  200    AC     AC 2019-11-04      AC
1 2021-12-05 2021-12-03    400      usd  300    BC     BC 2021-12-05      BC
2 2021-03-14        NaT    200      usd  400    CC   DDCC 2021-03-14    DDCC
3 2021-11-03        NaT    200      usd  500    DC    DCC 2021-11-03     DCC
  • Related