Python for-loop to change row value based on a condition works correctly but does not change the values

I am just getting into Python, and I am trying to write a for-loop that iterates over every row, randomly selects two columns on each iteration based on a given condition, and changes their values. The for-loop runs without any problems; however, the results are not reflected in the dataframe.

A reproducible example:

import pandas as pd

df = pd.DataFrame({'A': [10,40,10,20,10],
                   'B': [10,10,50,40,50],
                   'C': [10,20,10,10,10],
                   'D': [10,30,10,10,50],
                   'E': [10,10,40,10,10],
                   'F': [2,3,2,2,3]})

df:


    A   B   C   D   E   F
0   10  10  10  10  10  2
1   40  10  20  30  10  3
2   10  50  10  10  40  2
3   20  40  10  10  10  2
4   10  50  10  50  10  3

This is my for-loop. It iterates over all rows and checks whether the value in column F equals 2; if so, it randomly selects two columns with value 10 and adds 100 to them.

for index, i in df.iterrows():
  if i['F'] == 2:
    i[i==10].sample(2, axis=0) + 100
    print(i[i==10].sample(2, axis=0) + 100)

This is the output of the loop:

E    110
C    110
Name: 0, dtype: int64
C    110
D    110
Name: 2, dtype: int64
C    110
D    110
Name: 3, dtype: int64

This is what the dataframe is expected to look like:

df:


    A   B   C   D   E   F
0   10  10  110 10  110 2
1   40  10  20  30  10  3
2   10  50  110 110 40  2
3   20  40  110 110 10  2
4   10  50  10  50  10  3

However, the columns in the dataframe are not changed. Any idea what's going wrong?

CodePudding user response：

This line:

i[i==10].sample(2, axis=0) + 100

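Here i is a row (a Series), and .sample returns a new Series. Adding 100 to it produces yet another new object that is never assigned to anything, so the original dataframe (df) is not updated at all.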
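A minimal sketch of that behaviour, assuming pandas is imported as pd. The name row is hypothetical and stands in for one row yielded by iterrows(); the sampling step is left out so the result is deterministic.

```python
import pandas as pd

# Hypothetical stand-in for one row yielded by df.iterrows()
row = pd.Series({'A': 10, 'B': 10, 'C': 10, 'F': 2})

# Arithmetic builds a new Series; the result is discarded here,
# so row keeps its old values
row[row == 10] + 100
unchanged = row.tolist()
print(unchanged)  # [10, 10, 10, 2]

# Assigning the result back is what actually changes row
row[row == 10] += 100
updated = row.tolist()
print(updated)  # [110, 110, 110, 2]
```

Note that even the second form only changes row. Whether that change reaches the dataframe the row came from is a separate question.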

Try this:

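for index, i in df.iterrows():
    if i['F'] == 2:
        cond = (i == 10)

        # You can only sample 2 columns if at least
        # 2 columns in this row meet the condition
        if cond.sum() >= 2:
            idx = i[cond].sample(2).index
            # iterrows() may yield a copy of the row, so write
            # the change back to df itself rather than to i
            df.loc[index, idx] += 100
            print(df.loc[index, idx])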

CodePudding user response：

You should not modify the dataframe you are iterating over. Make a copy, iterate over the original, and write the changes to the copy:

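df2 = df.copy()
for index, i in df.iterrows():
    if i['F'] == 2:
        # Pick two random columns equal to 10 and add 100 to them
        s = i[i==10].sample(2, axis=0) + 100
        # Write the new values into the copy at the sampled columns
        df2.loc[index, i.index.isin(s.index)] = s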
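For reference, here is one way the two answers could be combined into a single runnable script: it iterates over the original, writes into a copy, and keeps the guard against rows with fewer than two candidate columns. This is a sketch rather than a definitive implementation; the names result, row, tens and picked are my own and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 40, 10, 20, 10],
                   'B': [10, 10, 50, 40, 50],
                   'C': [10, 20, 10, 10, 10],
                   'D': [10, 30, 10, 10, 50],
                   'E': [10, 10, 40, 10, 10],
                   'F': [2, 3, 2, 2, 3]})

# Work on a copy so the frame being iterated over is never modified
result = df.copy()

for index, row in df.iterrows():
    # Only rows where F equals 2 are candidates
    if row['F'] != 2:
        continue

    # Columns in this row whose value is exactly 10
    tens = row[row == 10]

    # sample(2) raises an error if fewer than 2 items are available
    if len(tens) < 2:
        continue

    # Labels of two randomly chosen columns
    picked = tens.sample(2).index

    # Write through .loc so the change lands in the dataframe itself
    result.loc[index, picked] += 100

print(result)
```

The chosen columns differ from run to run. Passing random_state to sample makes the selection reproducible if that is needed.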