Home > Blockchain >  Dataframe column sum by id value, returning one value per id
Dataframe column sum by id value, returning one value per id

Time:03-02

I have a pandas dataframe containing numerous rows of data per id value like this.

account_id    final_premium    days_since_first_sale   ...
 0003eDas         42.50                1               
 0003eDas         24.80                5
 0003eDas         80.80                7
 0215fweD         34.25                4
 4sdfsw34         21.00                5
 4sdfsw34         10.00                8

I have added a lifetime value column on which is a grouped by sum of each account_ids final_premium but it prints the LTV value in each of the id rows like this

account_id    final_premium    days_since_first_sale   LTV
 0003eDas         42.50                1              148.10
 0003eDas         24.80                5              148.10
 0003eDas         80.80                7              148.10
 0215fweD         34.25                4               34.25
 4sdfsw34         21.00                5               31.00
 4sdfsw34         10.00                8               31.00

What I'm looking for is this format:

 account_id    final_premium    days_since_first_sale   LTV
 0003eDas         42.50                1              
 0003eDas         24.80                5              
 0003eDas         80.80                7              148.10
 0215fweD         34.25                4               34.25
 4sdfsw34         21.00                5               
 4sdfsw34         10.00                8               31.00

Where the LTV value only appears next to the final entry for that id.

So far I've tried using the pandas .reindex() and .merge() function to no avail. Is there another method I could try?

I only want one value for LTV per id as there are numerous other columns of data that I want to analyse against this value. I've found that having the same LTV value for each ID row has lead to inaccuracy in my work so I'd like only one value per ID.

Thank you for any help.

CodePudding user response:

You can use the following. The code computes the groups by account_id. Then you can compute the sum for each group and store it in the last value of each group:

groups = df.groupby('account_id')
last_index = pd.DataFrame.last_valid_index
df.loc[groups.apply(last_index), 'LTV'] = groups['final_premium'].sum().values

Output:

    account_id  final_premium   days_since_first_sale   LTV
0   0003eDas    42.50   1   NaN
1   0003eDas    24.80   5   NaN
2   0003eDas    80.80   7   148.10
3   0215fweD    34.25   4   34.25
4   4sdfsw34    21.00   5   NaN
5   4sdfsw34    10.00   8   31.00
  • Related