I have attached an image of my dataframe, and the code of the methods I tried. My goal is to switch the first half of the values in a row with the second half of the values in that row if the row satisfies a condition.
The first method checks if the condition is true (values need to be switched), and then assigns the new values directly to the original dataframe.
The second method checks if the condition is true (values need to be switched) and, if so, appends the swapped values to two separate dataframes, df1 and df2. If the condition is not true, I append the original values to df1 and df2 instead. At the end of the block, I was planning on combining the dataframes again.
However, both of these methods take super long to run, and it seems there has to be something more efficient. I had trouble finding the most efficient way online, and I would appreciate any help. Thank you!
METHOD 1:
finalGameID = list(final.loc[:,'GameID'])
for i,v in enumerate(finalGameID):
    if final['HomeAway'][i] == 0:
        print(v)
        values = final.loc[i].values
        value1 = list(values[4:124])
        value2 = list(values[124:])
        final.iloc[i,4:124] = value2
        final.iloc[i,124:] = value1
METHOD 2:
df1 = final[final.columns[4:124]]
df2 = final[final.columns[124:]]
df3 = final[final.columns[0:4]]
df1 = df1[0:0]
df2 = df2[0:0]
finalGameID = list(final.loc[:,'GameID'])
for i,v in enumerate(finalGameID):
    values = final.loc[i].values
    value1 = list(values[4:124])
    value2 = list(values[124:])
    if final['HomeAway'][i] == 0:
        print(v)
        df1.loc[len(df1.index)] = value2
        df2.loc[len(df2.index)] = value1
    else:
        df1.loc[len(df1.index)] = value1
        df2.loc[len(df2.index)] = value2
CodePudding user response:
Your approach is slow because you loop over the rows and build intermediate copies for each one.
You should be able to use boolean indexing to swap both halves directly:
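# Plain boolean array so the mask can be used with positional (iloc) indexing
mask = final['HomeAway'].eq(0).to_numpy()
# .to_numpy() on the right-hand side keeps pandas from realigning values by column label
final.iloc[mask, 4:124], final.iloc[mask, 124:] = final.iloc[mask, 124:].to_numpy(), final.iloc[mask, 4:124].to_numpy()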
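To try the masked swap without the real data, here is a minimal sketch on a toy frame. The helper name swap_halves, its parameters flag_col and n_lead, and the toy column names are made up for illustration; it assumes the columns after the first n_lead split into two equal halves, as in the question's 4:124 / 124: layout.

```python
import pandas as pd


def swap_halves(df, flag_col="HomeAway", n_lead=4):
    """Return a copy of df in which rows with df[flag_col] == 0 have the
    two halves of the columns after the first n_lead swapped.

    Hypothetical helper: the name and parameters are not from the question.
    """
    out = df.copy()
    n_rest = out.shape[1] - n_lead
    if n_rest % 2:
        raise ValueError("columns after n_lead must split into two equal halves")
    mid = n_lead + n_rest // 2

    # Plain boolean array, usable with positional (iloc) indexing
    mask = out[flag_col].eq(0).to_numpy()

    # Pull both halves out as arrays before overwriting either one
    first = out.iloc[mask, n_lead:mid].to_numpy()
    second = out.iloc[mask, mid:].to_numpy()
    out.iloc[mask, n_lead:mid] = second
    out.iloc[mask, mid:] = first
    return out


# Toy data: 2 leading columns, then two halves of 2 columns each
toy = pd.DataFrame({
    "GameID": [10, 11, 12],
    "HomeAway": [1, 0, 0],
    "a1": [1, 2, 3],
    "a2": [4, 5, 6],
    "b1": [7, 8, 9],
    "b2": [10, 11, 12],
})
swapped = swap_halves(toy, n_lead=2)
print(swapped)
```

For the frame in the question, the call would presumably be swap_halves(final), since the default n_lead=4 matches the four leading columns.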
CodePudding user response:
Your actual data isn't available, so I have reproduced the problem with dummy data. Adjust the variable names and the index ranges when using this in your project.
CODE
import pandas as pd
import numpy as np
data = pd.DataFrame({"HomeAway": [1, 1, 0, 0, 1],
                     "Value1": [14, 16, 29, 22, 21],
                     "Value2": [8, 14, 24, 14, 19],
                     "Value3": [6, 2, 5, 8, 2],
                     "Value4": [3, 3, 2, 2, 0]})
print("BEFORE")
print(data)
# Boolean mask as a plain NumPy array; iloc may reject a boolean Series
mask = (data["HomeAway"] == 0).to_numpy()
# Copy both halves out as arrays before overwriting either one
left = data.iloc[mask, 1:3].to_numpy()
right = data.iloc[mask, 3:5].to_numpy()
data.iloc[mask, 1:3] = right
data.iloc[mask, 3:5] = left
print("AFTER")
print(data)
OUTPUT
BEFORE
   HomeAway  Value1  Value2  Value3  Value4
0         1      14       8       6       3
1         1      16      14       2       3
2         0      29      24       5       2
3         0      22      14       8       2
4         1      21      19       2       0
AFTER
   HomeAway  Value1  Value2  Value3  Value4
0         1      14       8       6       3
1         1      16      14       2       3
2         0       5       2      29      24
3         0       8       2      22      14
4         1      21      19       2       0
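If you want some reassurance before replacing the loop in your project, one option is to run both versions on random data and compare the results. This is only a sketch: the frame below is randomly generated with the same column layout as the dummy data above, and the names looped, vectorized and same are illustrative.

```python
import numpy as np
import pandas as pd

# Random stand-in data: a 0/1 flag followed by two halves of two columns each
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.integers(0, 50, size=(n, 4)),
                  columns=["Value1", "Value2", "Value3", "Value4"])
df.insert(0, "HomeAway", rng.integers(0, 2, size=n))

# Row-by-row reference, in the spirit of Method 1 from the question
looped = df.copy()
for i in range(len(looped)):
    if looped.iloc[i, 0] == 0:
        # copy=True so the row values do not change while we overwrite the frame
        row = looped.iloc[i].to_numpy(copy=True)
        looped.iloc[i, 1:3] = row[3:5]
        looped.iloc[i, 3:5] = row[1:3]

# Vectorized swap using a boolean mask
vectorized = df.copy()
mask = vectorized["HomeAway"].eq(0).to_numpy()
left = vectorized.iloc[mask, 1:3].to_numpy()
right = vectorized.iloc[mask, 3:5].to_numpy()
vectorized.iloc[mask, 1:3] = right
vectorized.iloc[mask, 3:5] = left

# True when both approaches produce identical frames
same = vectorized.equals(looped)
print(same)
```

The loop touches the frame once per row, while the masked version does the whole swap in two assignments, which is where the speed difference comes from.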