Tuple indices must be integers or slices, not str in pandas dataframe

I am working with a pandas dataframe.

The dataframe (named data_table) is:

    rec               age  income student credit_rating buys_computer
0    r1       less_thirty    high      no          fair            no
1    r2       less_thirty    high      no     excellent            no
2    r3  thirtyone_fourty    high      no          fair           yes
3    r4    greater_fourty  medium      no          fair           yes
4    r5    greater_fourty     low     yes          fair           yes
5    r6    greater_fourty     low     yes     excellent            no
6    r7  thirtyone_fourty     low     yes     excellent           yes
7    r8       less_thirty  medium      no          fair            no
8    r9       less_thirty     low     yes          fair           yes
9   r10    greater_fourty  medium     yes          fair           yes
10  r11       less_thirty  medium     yes     excellent           yes
11  r12  thirtyone_fourty  medium      no     excellent           yes
12  r13  thirtyone_fourty    high      no          fair           yes
13  r14    greater_fourty  medium      no     excellent            no

Now I need to determine the probability of picking buys_computer='yes' with age='less_thirty'.

My code:

data_table = pd.read_csv('DT.csv')
total_elements = len(data_table.index)
count = 0
for row in data_table.iterrows():
    if row['age'] == 'less_thirty' and row['buys_computer'] == 'yes':
        count += 1
    probability = count/total_elements
    print(probability)

But it gives this error:

---> 42         if row['age'] == 'less_thirty' and row['buys_computer'] == 'yes':

TypeError: tuple indices must be integers or slices, not str

How can I fix this?

CodePudding user response：

iterrows yields a tuple of (index, row) on each iteration, so you need to unpack it:

count = 0
for _, row in data_table.iterrows():
    if row['age'] == 'less_thirty' and row['buys_computer'] == 'yes':
        count += 1
probability = count/len(data_table)
print(probability)
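
To see why the unpacking matters, here is a minimal sketch of what a single item from iterrows looks like. The two-row df is a made-up stand-in for data_table, not the real file:

```python
import pandas as pd

# Hypothetical two-row stand-in for data_table (values borrowed from the question)
df = pd.DataFrame({
    "age": ["less_thirty", "greater_fourty"],
    "buys_computer": ["yes", "no"],
})

first = next(df.iterrows())      # one item as produced by iterrows()
print(type(first).__name__)      # tuple
idx, row = first                 # unpack into the index label and the row Series
print(idx, row["age"])           # 0 less_thirty

try:
    first["age"]                 # indexing the tuple itself with a string
except TypeError as err:
    message = str(err)
    print(message)               # same TypeError text as in the question
```

Once the tuple is unpacked, row is a Series and can be indexed by column name.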

But it's more efficient to use a vectorized approach:

count = (data_table['age'].eq('less_thirty')
       & data_table['buys_computer'].eq('yes')).sum()
print(count/len(data_table))

output: 0.14285714285714285
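
As a variation on the vectorized version, the mean of the boolean mask gives the same fraction directly, since True counts as 1 and False as 0. This is a sketch that rebuilds the two relevant columns inline instead of reading DT.csv, so the hard-coded lists are an assumption about the file's contents based on the table in the question:

```python
import pandas as pd

# The 14 rows from the question, limited to the two columns used here
age = ["less_thirty", "less_thirty", "thirtyone_fourty", "greater_fourty",
       "greater_fourty", "greater_fourty", "thirtyone_fourty", "less_thirty",
       "less_thirty", "greater_fourty", "less_thirty", "thirtyone_fourty",
       "thirtyone_fourty", "greater_fourty"]
buys = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
data_table = pd.DataFrame({"age": age, "buys_computer": buys})

# True where both conditions hold for a row
mask = (data_table["age"] == "less_thirty") & (data_table["buys_computer"] == "yes")

# Fraction of rows where the mask is True (2 of 14 here)
probability = mask.mean()
print(probability)
```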

CodePudding user response：

iterrows returns a tuple, so you cannot index it by column name without unpacking it first. That said, why loop when you can make the comparison directly:

condition = df[(df['age'] == 'less_thirty') & (df['buys_computer'] == 'yes')]
print(len(condition)/ len(df))

where df is your original dataframe.

CodePudding user response：

You can calculate the probability and print it outside the for loop, once the count is complete.