I am working with a pandas DataFrame.
The DataFrame (named data_table) is:
rec age income student credit_rating buys_computer
0 r1 less_thirty high no fair no
1 r2 less_thirty high no excellent no
2 r3 thirtyone_fourty high no fair yes
3 r4 greater_fourty medium no fair yes
4 r5 greater_fourty low yes fair yes
5 r6 greater_fourty low yes excellent no
6 r7 thirtyone_fourty low yes excellent yes
7 r8 less_thirty medium no fair no
8 r9 less_thirty low yes fair yes
9 r10 greater_fourty medium yes fair yes
10 r11 less_thirty medium yes excellent yes
11 r12 thirtyone_fourty medium no excellent yes
12 r13 thirtyone_fourty high no fair yes
13 r14 greater_fourty medium no excellent no
Now I need to determine the probability of picking buys_computer='yes' with age='less_thirty'.
My code:
import pandas as pd

data_table = pd.read_csv('DT.csv')
total_elements = len(data_table.index)
count = 0
for row in data_table.iterrows():
    if row['age'] == 'less_thirty' and row['buys_computer'] == 'yes':
        count += 1
probability = count/total_elements
print(probability)
But it raises this error:
---> 42 if row['age'] == 'less_thirty' and row['buys_computer'] == 'yes':
TypeError: tuple indices must be integers or slices, not str
How can I fix this?
CodePudding user response:
iterrows yields a tuple of (index, row) for each row, so you need to unpack it:
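count = 0
# Unpack each (index, row) tuple; the index is not needed here.
for _, row in data_table.iterrows():
    if row['age'] == 'less_thirty' and row['buys_computer'] == 'yes':
        count += 1  # increment, rather than resetting to 1
probability = count/len(data_table)
print(probability)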
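To see why the original loop failed, it can help to look at a single item produced by iterrows. The snippet below is a minimal sketch on a two-row frame copied from the question's table; the names df, first, index and row are only illustrative.

```python
import pandas as pd

# Two rows taken from the question's table, just for illustration.
df = pd.DataFrame({
    "age": ["less_thirty", "greater_fourty"],
    "buys_computer": ["no", "yes"],
})

first = next(df.iterrows())   # a single item produced by iterrows()
print(type(first).__name__)   # tuple

index, row = first            # unpack into the index label and the row Series
print(index, row["age"])      # 0 less_thirty
```

Indexing first['age'] directly would raise the same TypeError as in the question, because first is the tuple rather than the row.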
But it is more efficient to use a vectorized comparison:
count = (data_table['age'].eq('less_thirty')
         & data_table['buys_computer'].eq('yes')).sum()
print(count/len(data_table))
output: 0.14285714285714285
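As a side note, the same boolean masks can give the fraction directly through mean(). The sketch below rebuilds only the age and buys_computer columns from the table in the question so that it runs without DT.csv. It also shows the conditional probability P(buys_computer='yes' | age='less_thirty'), on the assumption that this might be what is actually wanted; the question's wording is ambiguous on that point.

```python
import pandas as pd

# The age and buys_computer columns, copied from the table in the question (r1..r14).
data_table = pd.DataFrame({
    "age": [
        "less_thirty", "less_thirty", "thirtyone_fourty", "greater_fourty",
        "greater_fourty", "greater_fourty", "thirtyone_fourty", "less_thirty",
        "less_thirty", "greater_fourty", "less_thirty", "thirtyone_fourty",
        "thirtyone_fourty", "greater_fourty",
    ],
    "buys_computer": [
        "no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no",
    ],
})

is_young = data_table["age"].eq("less_thirty")
buys_yes = data_table["buys_computer"].eq("yes")

# Joint probability: rows matching both conditions over all rows (2 of 14).
joint = (is_young & buys_yes).mean()

# Conditional probability: 'yes' rows among the less_thirty rows only (2 of 5).
conditional = buys_yes[is_young].mean()

print(joint, conditional)
```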
CodePudding user response:
iterrows returns a tuple for each row, so you cannot index it by column name without unpacking it first. That said, why loop when you can make the comparison directly:
condition = df[(df['age'] == 'less_thirty') & (df['buys_computer'] == 'yes')]
print(len(condition) / len(df))
where df is your original dataframe.
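If the probabilities for several age/buys_computer combinations are needed at once, one option is pd.crosstab with normalize="all". This is a sketch of an alternative, not something from the answers above; df is rebuilt here from the question's table so the snippet is self-contained.

```python
import pandas as pd

# The age and buys_computer columns, copied from the table in the question (r1..r14).
df = pd.DataFrame({
    "age": [
        "less_thirty", "less_thirty", "thirtyone_fourty", "greater_fourty",
        "greater_fourty", "greater_fourty", "thirtyone_fourty", "less_thirty",
        "less_thirty", "greater_fourty", "less_thirty", "thirtyone_fourty",
        "thirtyone_fourty", "greater_fourty",
    ],
    "buys_computer": [
        "no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no",
    ],
})

# Each cell is the share of all rows with that (age, buys_computer) pair.
joint_table = pd.crosstab(df["age"], df["buys_computer"], normalize="all")
print(joint_table)

# The cell asked about in the question: 2 matching rows out of 14.
p = joint_table.loc["less_thirty", "yes"]
print(p)
```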
CodePudding user response:
You can calculate the probability and print it once, outside the for loop.