Home > Blockchain >  Very simple pandas column/row transform that I cannot figure out
Very simple pandas column/row transform that I cannot figure out

Time:09-01

I need to do a simple calculation on values in a dataframe, but I need some column transposed first. Once they are transposed I want to take the most recent amount / 2nd most recent amount and then the binary result if it less than or equal to .5

By most recent I mean most recent to the date in the Date 2 column

Have This

| Name | Amount |  Date 1                | Date 2     |
| -----| ----   |------------------------|------------|
| Jim  | 100    |   2021-06-10           | 2021-06-15 |
| Jim  | 200    |   2021-05-11           | 2021-06-15 |
| Jim  | 150    |   2021-03-5            | 2021-06-15 |
| Bob  | 350    |   2022-06-10           | 2022-08-30 |
| Bob  | 300    |   2022-08-12           | 2022-08-30 |
| Bob  | 400    |   2021-07-6            | 2022-08-30 |

I Want this
| Name | Amount | Date 2| Most Recent Amount(MRA) | 2nd Most Recent Amount(2MRA) | MRA / 2MRA| Less than or equal to .5 |
| -----| -------|------------------------|----------------|--------------------|-------------|--------------------------|
| Jim  | 100    |   2021-06-15           |       100      |       200          | .5          | 1                        |
| Bob  | 300    |   2022-08-30           |       300      |       350          | .85         | 0                        |

CodePudding user response:

One way to do what you've asked is:

df = ( df[df['Date 1'] <= df['Date 2']]
        .groupby('Name', sort=False)['Date 1'].nlargest(2)
        .reset_index(level=0)
        .assign(**{
            'Amount': df.Amount, 
            'Date 2': df['Date 2'],
            'recency': ['MRA','MRA2']*len(set(df.Name.tolist()))
        })
        .pivot(index=['Name','Date 2'], columns='recency', values='Amount')
        .reset_index().rename_axis(columns=None) )
df = df.assign(**{'Amount':df.MRA, 'MRA / MRA2': df.MRA/df.MRA2})
df = df.assign(**{'Less than or equal to .5': (df['MRA / MRA2'] <= 0.5).astype(int)})
df = pd.concat([df[['Name', 'Amount']], df.drop(columns=['Name', 'Amount'])], axis=1)

Input:

  Name  Amount     Date 1     Date 2
0  Jim     100 2021-06-10 2021-06-15
1  Jim     200 2021-05-11 2021-06-15
2  Jim     150 2021-03-05 2021-06-15
3  Bob     350 2022-06-10 2022-08-30
4  Bob     300 2022-08-12 2022-08-30
5  Bob     400 2021-07-06 2022-08-30

Output:

  Name  Amount     Date 2  MRA  MRA2  MRA / MRA2  Less than or equal to .5
0  Bob     300 2022-08-30  300   350    0.857143                         0
1  Jim     100 2021-06-15  100   200    0.500000                         1

Explanation:

  • Filter only for rows where Date 1 <= Date 2
  • Use groupby() and nlargest() to get the 2 most recent Date 1 values per Name
  • Use assign() to add back the Amount and Date 2 columns and create a recency column containing MRA and MRA2 for the pair of rows corresponding to each Name value
  • Use pivot() to turn the recency values MRA and MRA2 into column labels
  • Use reset_index() to restore Name and Date 2 to columns, and use rename_axis() to make the columns index anonymous
  • Use assign() once to restore Amount and add column MRA / MRA2, and again to add column named Less than or equal to .5
  • Use concat(), [] and drop() to rearrange the columns to match the output sequence shown in the question.

CodePudding user response:

This is the original dataframe.

df = pd.DataFrame({'Name':['Jim','Jim','Jim','Bob','Bob','Bob'],
               'Amount':[100,200,150,350,300,400],
               'Date 1':['2021-06-10','2021-05-11','2021-03-05','2022-06-10','2022-08-12','2021-07-06'],
               'Date 2':['2021-06-15','2021-06-15','2021-06-15','2022-08-30','2022-08-30','2022-08-30']
              })

And this is the results.

g = df.sort_values('Date 1', ascending=False).groupby(['Name'])
first = g.agg({'Date 2':'first','Amount':'first'}).rename(columns={'Amount':'MRA'}).reset_index()
second = g.agg({'Date 2':'first','Amount':lambda t: t.iloc[1]}).rename(columns={'Amount':'2MRA'}).reset_index()
df_T = pd.merge(first, second, on=['Name','Date 2'], how='left')

def operator(x):
    return x['MRA']/x['2MRA'], [1 if x['MRA']/x['2MRA']<=.5 else 0][0]

df_T['MRA/2MRA'], df_T['Less than or equal to .5'] = zip(*df_T.apply(operator, axis=1))

Hope this helps. :)

CodePudding user response:

Here's the rough procedure you want:

  1. sort_values by Name and Date 1 to get the data in order.
  2. shift to get the previous date and 2nd most recent amount fields
  3. Filter the dataframe for Date 1 <= Date 2.
  4. group_by by Name and use head to get only the first row.

Now, your Amount column is your Most Recent Amount and your Shifted Amount column is the 2nd Most Recent amount. From there, you can do a simple division to get the ratio.

  • Related