See the initial question at the end.
I have a dataframe like so
import pandas as pd
df = pd.DataFrame({'Persons':[10,20,30], 'Bill':[110,240,365], 'Guests':[12,25,29],'Visitors':[15,23,27]})
df
Persons Bill Guests Visitors
10 110 12 15
20 240 25 23
30 365 29 27
I want a data frame like below
Persons Bill Guests Visitors Charge VisitorsCharge
10 110 12 15 136 175
20 240 25 23 302.5 277.5
30 365 29 27 352.5 327.5
Here Charge is the interpolated value corresponding to Guests, with the columns Persons & Bill as reference.
If we take the first row, we say 10 Persons will rack up a Bill of 110 & 20 Persons will rack up a Bill of 240. So, how much Charge will 12 Guests create?
The formula for this is as follows:
Row1
import scipy.stats as stats
result = stats.linregress([10,20],[110,240])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue = slope*12 + intercept #interpolate the value
interpolatedValue
Row2
import scipy.stats as stats
result = stats.linregress([20,30],[240,365])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue = slope*25 + intercept #interpolate the value
interpolatedValue
Row3
import scipy.stats as stats
result = stats.linregress([20,30],[240,365])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue = slope*29 + intercept #interpolate the value
interpolatedValue
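With only two points, linregress fits the straight line through both of them exactly, so each row's value can also be computed with the plain two-point line formula. A minimal sketch; interp_line is a hypothetical helper name, not part of scipy:

```python
def interp_line(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1).

    Same result as stats.linregress([x0, x1], [y0, y1]) followed by
    slope*x + intercept, because a least-squares fit through two
    distinct points passes through both points.
    """
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope * x + intercept

print(interp_line(10, 110, 20, 240, 12))  # Row1 -> 136.0
print(interp_line(20, 240, 30, 365, 25))  # Row2 -> 302.5
print(interp_line(20, 240, 30, 365, 29))  # Row3 -> 352.5
```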
For every row except the last row, we use the current & the next row values to get our result.
However, when we reach the last row, there is no 'next' row, so we use the previous row & current row values instead.
We do the same to calculate VisitorsCharge as well, but here we multiply the slope by the Visitors column value.
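The rule above can also be expressed without a per-row function. The line for the last row (previous & current) is the same line the second-to-last row uses (current & next), so one option is to compute slope and intercept from each row and the next one, then forward-fill the last row's missing values. A minimal sketch, assuming the rows are already in order, there are no missing values and there are at least two rows; pair_interp is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]})

def pair_interp(df, x_col, y_col, target_col):
    # Slope and intercept of the line through each row and the next row;
    # shift(-1) leaves NaN in the last row because it has no next row.
    slope = (df[y_col].shift(-1) - df[y_col]) / (df[x_col].shift(-1) - df[x_col])
    intercept = df[y_col] - slope * df[x_col]
    # The last row's (previous, current) pair is the pair the row before it
    # already used, so carry that row's line forward.
    slope = slope.ffill()
    intercept = intercept.ffill()
    return slope * df[target_col] + intercept

df['Charge'] = pair_interp(df, 'Persons', 'Bill', 'Guests')
df['VisitorsCharge'] = pair_interp(df, 'Persons', 'Bill', 'Visitors')
print(df)
```

Note that ffill would also fill any other NaN in the data, which is why the sketch assumes there are none.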
A function would solve the issue. However, with a lambda function I do not have access to the previous & next rows, and with df.apply I am unable to figure out the index of each row as the function is being applied. How do I do it?
Initial question
I have a dataframe like so
A B
1 100
2 200
3 300
I want a data frame like below
A B C
1 100 '1-2-100-200'
2 200 '2-3-200-300'
3 300 '2-3-200-300'
CodePudding user response:
I think this is what you want:
import scipy.stats as stats

def compute(i, n=2):
    # start of an n-row window; near the end, shift it back so it stays inside df
    j = min(i, df.index[len(df)-n])
    idx = df.index[j:j+n]
    result = stats.linregress(df.loc[idx, 'Persons'], df.loc[idx, 'Bill'])
    slope = result.slope
    intercept = result.intercept
    return slope*df.loc[i, 'Guests'] + intercept

df['Charge'] = [compute(i) for i in df.index]
# or
# df['Charge'] = df.index.to_series().apply(compute)
output:
Persons Bill Guests Charge
0 10 110 12 136.0
1 20 240 25 302.5
2 30 365 29 352.5
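The question also asks for VisitorsCharge. One way to reuse compute for it is to pass the column to multiply by as a parameter. In this sketch target is a hypothetical parameter added for illustration, and the window start is computed positionally as len(df) - n, which is the same as df.index[len(df)-n] only under a default RangeIndex (assumed here):

```python
import pandas as pd
import scipy.stats as stats

df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]})

def compute(i, target='Guests', n=2):
    # n-row window starting at row i, moved back near the end so it never
    # runs past the last row (assumes a default RangeIndex 0..len(df)-1).
    j = min(i, len(df) - n)
    idx = df.index[j:j + n]
    result = stats.linregress(df.loc[idx, 'Persons'], df.loc[idx, 'Bill'])
    # Evaluate the fitted line at the chosen column's value for row i.
    return result.slope * df.loc[i, target] + result.intercept

df['Charge'] = [compute(i) for i in df.index]
df['VisitorsCharge'] = [compute(i, 'Visitors') for i in df.index]
print(df)
```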
CodePudding user response:
NB. This is a solution to the initial question. See here for an answer to the new question.
a = df['A'].astype(str)
b = df['B'].astype(str)
s = a + '-' + a.shift(-1) + '-' + b + '-' + b.shift(-1)
df['C'] = s.ffill()
Generalization for an arbitrary number of columns:
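def cat(s, sep='-'):
    s = s.astype(str)
    # pair each value with the next one; the last row becomes NaN
    return s + sep + s.shift(-1)

# fill the last row from the previous one, then join the columns per row
df['C'] = df.apply(cat).ffill().agg('-'.join, axis=1)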
output:
A B C
0 1 100 1-2-100-200
1 2 200 2-3-200-300
2 3 300 2-3-200-300
CodePudding user response:
Try this:
import numpy as np
import scipy.stats as stats

df['next_persons'] = df.Persons.shift(-1)
df['next_bill'] = df.Bill.shift(-1)

def your_interpolation_func(x, y, z):
    result = stats.linregress(np.array(x), np.array(y))
    return result.slope*z + result.intercept

df['charge'] = df.apply(lambda row: your_interpolation_func(
    [row.Persons, row.next_persons],
    [row.Bill, row.next_bill],
    row.Guests), axis=1)
Output:
df
Persons Bill Guests next_persons next_bill charge
0 10 110 12 20.0 240.0 136.0
1 20 240 25 30.0 365.0 302.5
2 30 365 29 NaN NaN NaN
The NaN in the last row is because there are no next values for the last row. You can apply the function to df.iloc[:-1] to avoid that.
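Alternatively, the last row can be kept by falling back to the previous row's values where there is no next row, which matches the rule in the question (previous & current row for the last row). A line through two points does not depend on their order, so fitting [30, 20] gives the same line as [20, 30]. A sketch on the question's df; the fillna(... shift(1)) fallback is an addition to the answer above, not part of it:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]})

# Use the next row where it exists; for the last row, fall back to the
# previous row so it is fitted on (previous, current) instead of giving NaN.
df['next_persons'] = df.Persons.shift(-1).fillna(df.Persons.shift(1))
df['next_bill'] = df.Bill.shift(-1).fillna(df.Bill.shift(1))

def your_interpolation_func(x, y, z):
    result = stats.linregress(np.array(x), np.array(y))
    return result.slope * z + result.intercept

df['charge'] = df.apply(lambda row: your_interpolation_func(
    [row.Persons, row.next_persons],
    [row.Bill, row.next_bill],
    row.Guests), axis=1)
print(df)
```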