When I run this:
import pandas as pd
data = {'id': ['earn', 'earn','lose', 'earn'],
'game': ['darts', 'balloons', 'balloons', 'darts']
}
df = pd.DataFrame(data)
print(df)
print(df.loc[[1],['id']] == 'earn')
The output is:
     id      game
0  earn     darts
1  earn  balloons
2  lose  balloons
3  earn     darts
     id
1  True
But when I try to run this loop:
for i in range(len(df)):
    if (df.loc[[i],['id']] == 'earn'):
        print('yes')
    else:
        print('no')
I get the error 'ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' I am not sure what the problem is. Any help or advice is appreciated -- I am just starting.
I expected the output to be 'yes' from the loop. But I just got the 'ValueError' message. But, when I run the condition by itself, the output is 'True' so I'm not sure what is wrong.
CodePudding user response:
for i, row in df.iterrows():
    if row.id == "earn":
        print("yes")
CodePudding user response:
It's complicated. pandas is geared towards operating on entire groups of data, not individual cells. df.loc may return a new DataFrame, a Series or a single value, depending on how it's indexed. The == comparison then produces a DataFrame, a Series or a scalar result to match.
If the indexers are both lists, you get a new DataFrame, and the comparison result is also a DataFrame
>>> foo = df.loc[[1], ['id']]
>>> type(foo)
<class 'pandas.core.frame.DataFrame'>
>>> foo
id
1 earn
>>> foo == "earn"
id
1 True
If one indexer is scalar, you get a new Series
>>> foo = df.loc[[1], 'id']
>>> type(foo)
<class 'pandas.core.series.Series'>
>>> foo
1 earn
Name: id, dtype: object
>>> foo == 'earn'
1 True
Name: id, dtype: bool
If both indexers are scalar, you get a single cell's value
>>> foo = df.loc[1, 'id']
>>> type(foo)
<class 'str'>
>>> foo
'earn'
>>> foo == 'earn'
True
That last one is the one you want. The first two produce containers whose truth value is ambiguous (you need to decide whether any or all of the values need to be True).
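To make the "any or all" choice concrete, here is a small sketch using the DataFrame from the question. The variable names (cmp_frame, cmp_series and so on) are only for illustration. It shows that the list-indexed comparison raises the ValueError inside an if, and how it could be reduced to a single bool if you really wanted to keep the list indexers.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['earn', 'earn', 'lose', 'earn'],
    'game': ['darts', 'balloons', 'balloons', 'darts'],
})

# Both indexers are lists, so the comparison is a 1x1 boolean DataFrame.
cmp_frame = df.loc[[1], ['id']] == 'earn'

# Using that DataFrame directly in an `if` raises the error from the question.
try:
    if cmp_frame:
        pass
    raised = False
except ValueError:
    raised = True

# DataFrame.any() / DataFrame.all() reduce column by column and return a
# Series, so a second reduction is needed to end up with one bool.
any_true = bool(cmp_frame.any().any())
all_true = bool(cmp_frame.all().all())

# With one scalar indexer the comparison is a Series; one reduction is enough.
cmp_series = df.loc[[1], 'id'] == 'earn'
series_all = bool(cmp_series.all())

print(raised, any_true, all_true, series_all)
```

With scalar indexers, as in df.loc[i, 'id'], none of this is needed because the comparison is already a plain bool, so the loop can be written as: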
for i in range(len(df)):
    if (df.loc[i,'id'] == 'earn'):
        print('yes')
    else:
        print('no')
Or maybe not. Depending on what you intend to do next, you could instead create a Series of boolean values for all of the rows at once
>>> earn = df['id'] == 'earn'
>>> earn
0 True
1 True
2 False
3 True
Name: id, dtype: bool
Now you can continue to make calculations on the DataFrame as a whole.
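As a rough sketch of what "calculations on the DataFrame as a whole" might look like, the boolean Series can drive the yes/no labelling, a count, and a row filter without any explicit loop. The names labels, n_earn and earn_rows are made up for this example, and numpy.where is just one of several ways to do the labelling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['earn', 'earn', 'lose', 'earn'],
    'game': ['darts', 'balloons', 'balloons', 'darts'],
})

# One boolean per row: True where id is 'earn'.
earn = df['id'] == 'earn'

# Label every row at once instead of looping over the index.
labels = np.where(earn, 'yes', 'no')
print(labels.tolist())

# Count the matching rows (True counts as 1 when summed).
n_earn = int(earn.sum())
print(n_earn)

# Boolean indexing keeps only the rows where the mask is True.
earn_rows = df[earn]
print(earn_rows['game'].tolist())
```

Whether this is preferable to the loop depends on what you do with the result. For simply printing a line per row the loop is fine, but filtering or counting is usually shorter this way.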