I have a pandas dataframe containing numerous rows of data per id value like this.
account_id final_premium days_since_first_sale ...
0003eDas 42.50 1
0003eDas 24.80 5
0003eDas 80.80 7
0215fweD 34.25 4
4sdfsw34 21.00 5
4sdfsw34 10.00 8
I have added a lifetime value column on which is a grouped by sum of each account_ids final_premium but it prints the LTV value in each of the id rows like this
account_id final_premium days_since_first_sale LTV
0003eDas 42.50 1 148.10
0003eDas 24.80 5 148.10
0003eDas 80.80 7 148.10
0215fweD 34.25 4 34.25
4sdfsw34 21.00 5 31.00
4sdfsw34 10.00 8 31.00
What I'm looking for is this format:
account_id final_premium days_since_first_sale LTV
0003eDas 42.50 1
0003eDas 24.80 5
0003eDas 80.80 7 148.10
0215fweD 34.25 4 34.25
4sdfsw34 21.00 5
4sdfsw34 10.00 8 31.00
Where the LTV value only appears next to the final entry for that id.
So far I've tried using the pandas .reindex() and .merge() function to no avail. Is there another method I could try?
I only want one value for LTV per id as there are numerous other columns of data that I want to analyse against this value. I've found that having the same LTV value for each ID row has lead to inaccuracy in my work so I'd like only one value per ID.
Thank you for any help.
CodePudding user response:
You can use the following. The code computes the groups by account_id
. Then you can compute the sum for each group and store it in the last value of each group:
groups = df.groupby('account_id')
last_index = pd.DataFrame.last_valid_index
df.loc[groups.apply(last_index), 'LTV'] = groups['final_premium'].sum().values
Output:
account_id final_premium days_since_first_sale LTV
0 0003eDas 42.50 1 NaN
1 0003eDas 24.80 5 NaN
2 0003eDas 80.80 7 148.10
3 0215fweD 34.25 4 34.25
4 4sdfsw34 21.00 5 NaN
5 4sdfsw34 10.00 8 31.00