For the group where variable 1 is True. I want to sum the column Number in the row where variable 2 is false to the other two rows where variable 2 is True.
Variable 1 Variable 2 Number
0 True True 10
1 True True 20
2 True False 30
3 False False 5
Expected output:
Variable 1 Variable 2 Number
0 True True 40
1 True True 50
2 True False 30
3 False False 5
If it's possible I would like the code to work for multiple groups where variable 1 is true.
CodePudding user response:
No need for a loop or anything complex. Use simple boolean indexing for in place modification:
m1 = df['Variable 1']
m2 = df['Variable 2']
# add to the rows where m1 AND m2
# the sum of rows where m1 AND NOT m2
df.loc[m1&m2, 'Number'] = df.loc[m1&~m2, 'Number'].sum()
Output:
Variable 1 Variable 2 Number
0 True True 40
1 True True 50
2 True False 30
3 False False 5
CodePudding user response:
Hope this what you are expecting, if you run the below code you will get the expected output.
import pandas as pd
data={'Variable 1':[True,True,True,False],'Variable 2':[True,True,False,False], 'Number': [10,20,30,5]}
stack = []
for i in range(len(data['Variable 1'])):
if data['Variable 1'][i] and data['Variable 2'][i]:
stack.append((i, data['Number'][i]))
elif data['Variable 1'][i] and not data['Variable 2'][i]:
while len(stack) > 0:
a,b = stack.pop()
data['Number'][a] = data['Number'][a] data['Number'][i]
df=pd.DataFrame(data)
print(df)
Please feel free to comment if this is not your case. Thanks and hope this helps.
CodePudding user response:
It was a little bit hard to understand what you want...
my solution is very explicity...
I Tried to guide you step by step...
My code
import pandas as pd
# your sample data as a dict
data={'Variable 1':[True,True,True,False],'Variable 2':[True,True,False,False], 'Number': [10,20,30,5]}
# create a dataframe from created dict
df = pd.DataFrame.from_dict(data)
# replace empty spaces from name columns so you can use DOT NOTATION
df.columns = df.columns.str.replace(' ', '_').str.lower()
# What to do in case at least one is false
df['both_false'] = 0 df.number[(df.variable_1 == False) | (df.variable_2 == False)]
df['both_false'] = df['both_false'].fillna(0)
# What to do in case Both True
df['both_true'] = 0 df.number[(df.variable_1 == True) & (df.variable_2 == True)].sum()
# Calc 1
df['result_1'] = df.loc[(df.variable_1 == True) & (df.variable_2 == True), 'number'] df.loc[(df.variable_1 == True) & (df.variable_2 == True), 'both_true']
df['result_1'] = df['result_1'].fillna(0)
# Calc 2
df['result_2'] = df.loc[(df.variable_1 == False) | (df.variable_2 == False), 'number']
df['result_2'] = df['result_2'].fillna(0)
# Final result in both cases
df['result'] = df.result_1 df.result_2
# just drop unecessary series
df.drop(['both_false', 'both_true', 'result_1', 'result_2'], axis=1, inplace=True)
Output
- | variable_1 | variable_2 | number | result |
---|---|---|---|---|
0 | True | True | 10 | 40.0 |
1 | True | True | 20 | 50.0 |
2 | True | False | 30 | 30.0 |
3 | False | False | 5 | 5.0 |
Hope that helps you