I have 4 columns in dataset which are cid(customer level), month, spending and transaction (max.cid=10000). As seen below, df.head().
cid month spending transaction
0 1 3 61.94 28
1 1 4 73.02 23
2 1 7 59.34 25
3 1 8 48.69 24
4 1 9 121.79 26
I use the following function to calculate the trend (slope)in the outflow spending per customer. However, I get the identical number as a result for the whole dataset. Expected to calculate trend of spendings on customer level. (trend value for each customer).
Is there a way to iterate over each customer level in the dataset and obtain individual trends per customer? Thanks in advance!
df = pd.read_csv("/content/case_data.csv")
import numpy as np
def trendline(df, order=1):
coeffs = np.polyfit(df.index.values, list(df), order)
slope = coeffs[-2]
return float(slope)
outflow = df['spending']
cid = df['cid']
df_ = pd.DataFrame({'cid': cid, 'outflow': outflow})
slope_outflow = trendline(df_['cid'])
slope_outflow
Output : 0.13377820413729283
Expected Output: (Trend1), (Trend2), (Trend3), ......, (Trend10000)
CodePudding user response:
def trendline(x, y, order=1):
return np.polyfit(x, y, order)[-2]
df.groupby('cid').apply(lambda subdf: trendline(subdf['month'].values, subdf['spending'].values))
You can use groupby
to calculate the trend by each cid
value. In the above example it is for the trend of spending.