Trying to iterate through rows of pandas dataframe and edit row if it satisfies a condition-CodePudding

I have attached an image of my dataframe, and the code of the methods I tried. My goal is to switch the first half of the values in a row with the second half of the values in that row if the row satisfies a condition.

The first method checks if the condition is true (values need to be switched), and then assigns the new values directly to the original dataframe.

The second methods checks if the condition is true (values need to be switched) and adds the values to two separate dataframes. If the condition is not true I add the original values to df1 and df2. At the end of the block, I was planning on combining the dataframes together again.

However, both of these methods take super long to run, and it seems there has to be something more efficient. I had trouble finding the most efficient way online, and I would appreciate any help. Thank you!

datframe

METHOD 1:

finalGameID = list(final.loc[:,'GameID'])
for i,v in enumerate(finalGameID):
  if final['HomeAway'][i] == 0:
    print(v)
    values = final.loc[i].values
    value1 = list(values[4:124])
    value2 = list(values[124:])
    final.iloc[i,4:124] = value2
    final.iloc[i,124:] = value1

METHOD 2:

df1 = final[final.columns[4:124]]
df2 = final[final.columns[124:]]
df3 = final[final.columns[0:4]]
df1 = df1[0:0]
df2 = df2[0:0]
finalGameID = list(final.loc[:,'GameID'])

for i,v in enumerate(finalGameID):
    values = final.loc[i].values
    value1 = list(values[4:124])
    value2 = list(values[124:])
    if final['HomeAway'][i] == 0:
        print(v)
        df1.loc[len(df1.index)] = value2
        df2.loc[len(df2.index)] = value1
    else:
        df1.loc[len(df1.index)] = value1
        df2.loc[len(df2.index)] = value2

CodePudding user response：

Your approach is show because you loop over the rows and use intermediate copies.

You should be able to use boolean indexing for direct swapping:

mask = final['HomeAway'].eq(0)

final.loc[mask, 4:124], final.loc[mask, 124:] = final.loc[mask, 124:], final.loc[mask, 4:124]

CodePudding user response：

The Data on which you are working is unknown and I have tried to replicate your problem with duplicate data. Change the variables and the indexing values while using it in your project

CODE

import pandas as pd
import numpy as np

data = pd.DataFrame({"HomeAway": [1, 1, 0, 0, 1],
                     "Value1": [14, 16, 29, 22, 21],
                     "Value2": [8, 14, 24, 14, 19],
                     "Value3": [6, 2, 5, 8, 2],
                     "Value4": [3, 3, 2, 2, 0]})

print("BEFORE")
print(data)

left = np.asanyarray(data[data["HomeAway"] == 0].iloc[:, 1:3])
right = np.asanyarray(data[data["HomeAway"] == 0].iloc[:, 3:5])

data.iloc[data["HomeAway"] == 0, 1:3] = right
data.iloc[data["HomeAway"] == 0, 3:5] = left

print("AFTER")
print(data)

OUTPUT

BEFORE

   HomeAway  Value1  Value2  Value3  Value4
0         1      14       8       6       3
1         1      16      14       2       3
2         0      29      24       5       2
3         0      22      14       8       2
4         1      21      19       2       0

AFTER

   HomeAway  Value1  Value2  Value3  Value4
0         1      14       8       6       3
1         1      16      14       2       3
2         0       5       2      29      24
3         0       8       2      22      14
4         1      21      19       2       0