Home > Enterprise >  Compare two columns in pandas to update date column in a dataframe
Compare two columns in pandas to update date column in a dataframe

Time:07-02

I need to compare a certain range of value of a column with a specific string in another column to update the date column with 1,7, 30 days

Col A    ColB           colc
5          Internal     1-1-2022
7          external     1-1-2022
4          external     1-1-2022
3          external     1-1-2022

so if df[ColA]<8 and df[ColB]=External add 7 days to df[colc]. if if df[ColA]>=8 and df[ColB]=External add 30 days to df[colc] and if df[ColB]=Internal add 120 days to df[Colc]

CodePudding user response:

You can use numpy.select to handle all your conditions:

I simplified them to:

  • if ColB is "Internal" -> add 120 days
  • else, if ColA is ≤ 8 -> add 7 days
  • else -> add 30 days
factor = np.select([df['ColB'].eq('Internal'), df['Col A'].le(8)], [120, 7], 30)
df['new'] = (pd.to_datetime(df['colc'], dayfirst=True)
               .add(pd.DateOffset(days=1)*a)
               .dt.strftime('%-d-%-m-%-Y')
             )

output:

   Col A      ColB      colc       new
0      5  Internal  1-1-2022  3-1-2022
1      7  external  1-1-2022  6-1-2022
2      4  external  1-1-2022  1-1-2022
3      3  external  1-1-2022  3-1-2022

CodePudding user response:

Using the same np.select as @mozway, I would suggest using pd.Timedelta.

df = pd.DataFrame(
    {
        'Col A': [5, 7, 4, 3], 
        'ColB': ['Internal', 'external', 'external', 'external'], 
        'colc': pd.to_datetime("1/1/2022", format="%m/%d/%Y")
    }
)

df['colc'] = [
    (a   pd.Timedelta(b, 'd')) for (a, b) in
    zip(df.colc, np.select([df['ColB'].eq('Internal'), df['Col A'].le(8)], [120, 7], 30))
]

print(df)

Result:

   Col A      ColB       colc
0      5  Internal 2022-05-01
1      7  external 2022-01-08
2      4  external 2022-01-08
3      3  external 2022-01-08
  • Related