Home > Mobile >  Extracting data from one column with condition on other column - pandas
Extracting data from one column with condition on other column - pandas

Time:11-17

I have a data as below

Cust_ID  Age    Open_Balance
1          3    $50,000.00
1        -15    $51,000.00
1         59    $48,000.00
2         -1    $150,216.00
2         -1    $247,523.00

3 New columns need to be created in the same data frame with the help of data in [Age] and [Open_Balance] as below.

**df['0 to 30']=(df['Open Balance'] if (df['Age ']>0 and df['Age ']<=30) else 0)**
**df['30 to 60']=(df['Open Balance'] if (df['Age ']>30 and df['Age ']<=60) else 0)**
**df['Negative_amount']=(df['Open Balance'] if (df['Age ']<0) else 0)**

Result:

Cust_ID     Age Open Balance    0 to 30     30 to 60        Negative amount
1            3  $50,000.00     $50,000.00    0              0
1           -15 $51,000.00     0             0              $51,000.00
1           59  $48,000.00     0             $48,000.00     0
2           -1  $150,216.00    0             0              $150,216.00
2           -1  $247,523.00    0             0              $247,523.00

I'm new to python and tried a lot to build logic with these conditions but couldn't able to make it. kindly help.

CodePudding user response:

A bunch of numpy.where statements would do here

df['0 to 30'] = np.where( (df['Age']>0) & (df['Age']<=30) , df['Open_Balance'],0)
df['30 to 60'] = np.where( (df['Age']>30) & (df['Age']<=60) , df['Open_Balance'],0)
df['Negative_amount'] = np.where( df['Age']<0, df['Open_Balance'],0)


    Cust_ID Age Open_Balance    0 to 30     30 to 60    Negative_amount
0   1       3   $50,000.00      $50,000.00  0           0
1   1       -15 $51,000.00      0           0           $51,000.00
2   1       59  $48,000.00      0           $48,000.00  0
3   2       -1  $150,216.00     0           0          $150,216.00
4   2       -1  $247,523.00     0           0          $247,523.00
    

CodePudding user response:

A pure Pandas solution would be this:

df['0 to 30'] = df['Open Balance'][df['Age '] > 0 & df['Age '] <= 30]
df['30 to 60'] = df['Open Balance'][df['Age '] > 30 & df['Age '] <= 60]
df['Negative_amount'] = df['Open Balance'][df['Age '] < 0]

This would leave your dataframe with a bunch of NaN values. You can get rid of these by running df.fillna() after those lines.

Note that pandas uses a slightly different syntax for conditions than pure Python code. Where we use and and orin a regular python if-statement, these are replaced by & and | respectively in Pandas and Numpy.

You can have a closer look at Pandas indexing here: Pandas indexing documentation

  • Related