Indexing With Pandas

I am new to pandas so I am having trouble with the indexing when writing this loop for my assignment:

quality = wine_data_all['quality']
for i in range(1,len(quality.index)):
    if quality[i] == 6 | quality[i] ==5:
        quality[i] = 1;
wine_data_all.replace['quality',quality]

My intention is to replace every 5 and 6 in the quality column of wine_data_all with 1, and then swap the modified column back in for quality. Editing wine_data_all directly, without creating a separate quality variable, would also work for me, but I ran into even more problems when I tried to index straight into the data frame.

The error I am getting is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [150], in <cell line: 2>()
      1 quality = wine_data_all['quality']
      2 for i in range(1,len(quality.index)):
----> 5     if quality[i] == 6 | quality[i] ==5:
      7         quality[i] = 0;
     11 print(quality)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
   1525 @final
   1526 def __nonzero__(self):
-> 1527     raise ValueError(
   1528         f"The truth value of a {type(self).__name__} is ambiguous. "
   1529         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1530     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any help is appreciated.

CodePudding user response：

No need to iterate over the values. Pandas has methods that can do this type of work for you.

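Since this is a simple assignment, just select the matching rows and assign the value 1. Using .loc does the selection and the assignment in one step, which avoids the chained indexing that can trigger a SettingWithCopyWarning.

wine_data_all.loc[wine_data_all['quality'].isin([5, 6]), 'quality'] = 1

Here's an alternative that would be better suited to a more complicated transformation.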

wine_data_all['quality'] = wine_data_all['quality'].apply(lambda x: 1 if x in (5, 6) else x)
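As a rough, self-contained sketch, here are both approaches side by side. The small frame is made-up stand-in data, not the real wine_data_all, and the names by_mask and by_apply are only for illustration. The mask-based assignment is written with .loc here.

```python
import pandas as pd

# Made-up stand-in for wine_data_all; only the 'quality' column matters here.
wine_data_all = pd.DataFrame({"quality": [5, 6, 7, 3, 6]})

# Approach 1: build a boolean mask with isin and assign through .loc.
by_mask = wine_data_all.copy()
by_mask.loc[by_mask["quality"].isin([5, 6]), "quality"] = 1

# Approach 2: transform every value with apply and a small function.
by_apply = wine_data_all.copy()
by_apply["quality"] = by_apply["quality"].apply(lambda x: 1 if x in (5, 6) else x)

print(by_mask["quality"].tolist())
print(by_apply["quality"].tolist())
```

Both should leave the 7 and the 3 untouched and turn every 5 and 6 into 1.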

CodePudding user response：

Since you have not provided any data, here is my test data (with pandas imported as pd and NumPy as np):

df=pd.DataFrame({'A':[1,2,3,4,np.nan], 'B':[9,8,7,6,5]})

Let's replace the values where A==3 or A==4. Since you are already using pandas, it is better to avoid loops. The following line selects all rows where the condition is met, restricted to the column named A, and sets those values to 0:

df.loc[(df['A']==3) | (df['A']==4), 'A']=0

Gives:

    A       B
0   1.0     9
1   2.0     8
2   0.0     7
3   0.0     6
4   NaN     5

I think you could eliminate your error by wrapping each condition in its own parentheses, because | binds more tightly than ==.
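To make the precedence point concrete, here is a minimal sketch. It assumes the value being compared is a whole Series (as the error message suggests) rather than a single number; the Series s and its values are made up.

```python
import pandas as pd

s = pd.Series([5, 6, 7])  # made-up values

# Without parentheses Python parses this as  s == (6 | s) == 5,
# a chained comparison that needs the truth value of a whole Series.
try:
    s == 6 | s == 5
    raised = False
except ValueError:
    raised = True

# With parentheses each comparison yields a boolean Series,
# and | combines the two element-wise.
mask = (s == 6) | (s == 5)

print(raised)
print(mask.tolist())
```

The unparenthesized form should raise the same "truth value of a Series is ambiguous" error, while the parenthesized form gives a mask that can be passed to .loc.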

CodePudding user response：

For a pandas DataFrame, you can iterate over the rows with the .iterrows() method.

So you could use:

for i, element in wine_data_all.iterrows():
    # i is the row's index label; element is the row as a Series
    print(i, element["quality"])

If you want to iterate over a single column, i.e. a pandas Series, you iterate only over its items:

quality = wine_data_all["quality"]
for i, item in quality.items():
    # i is the index label; item is already the plain value stored in that row
    print(i, item)

However, because the operation is applied to each row independently of the others, I would suggest taking a look at .apply(). Your code then fits on a single line and is more efficient and Pythonic than an explicit loop:

wine_data_all["quality"] = wine_data_all["quality"].apply(lambda x: 0 if (x == 5 or x == 6) else x)

P.S. For further reading on indexing in pandas, take a look at the .loc indexer (label-based) and the .iloc indexer (position-based).
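As a small illustration of the difference, here is a sketch on made-up data. The frame df and its index values are hypothetical, chosen so that labels and positions do not coincide.

```python
import pandas as pd

# Made-up data with a non-default index so labels and positions differ.
df = pd.DataFrame({"quality": [5, 6, 7]}, index=[10, 20, 30])

by_label = df.loc[20, "quality"]   # row whose index label is 20
by_position = df.iloc[0, 0]        # first row, first column

# .loc also accepts a boolean mask, which is how values are replaced in bulk.
df.loc[df["quality"] == 7, "quality"] = 0

print(by_label, by_position)
print(df["quality"].tolist())
```

When the index is not simply 0, 1, 2, ..., a loop over range(len(...)) that indexes with plain brackets can pick up labels rather than positions, which is one reason to prefer .loc or .iloc explicitly.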