Lot of answers on merging and full col, but can't figure out a more effective method. for my situation.
current version of python, pandas, numpy, and file format is parquet
Simply put if col1 ==x the col 10 = 1, col11 = 2, col... etc.
look1 = 'EMPLOYEE'
look2 = 'CHESTER'
look3 = "TONY'S"
look4 = "VICTOR'S"
tgt1 = 'inv_group'
tgt2 = 'acc_num'
for x in range(len(df['ph_name'])):
df[tgt1][x] = 'MEMORIAL'
df[tgt2][x] = 12345
elif df['ph_name'][x] == look2:
df[tgt1][x] = 'WALMART'
df[tgt2][x] = 45678
elif df['ph_name'][x] == look3:
df[tgt1][x] = 'TONYS'
df[tgt2][x] = 27359
elif df['ph_name'][x] == look4:
df[tgt1][x] = 'VICTOR'
df[tgt2][x] = 45378
basic sample:
unit_name tgt1 tgt2
0 EMPLOYEE Nan Nan
1 EMPLOYEE Nan Nan
2 TONY'S Nan Nan
3 CHESTER Nan Nan
4 VICTOR'S Nan Nan
5 EMPLOYEE Nan Nan
GOAL:
unit_name tgt1 tgt2
0 EMPLOYEE MEMORIAL 12345
1 EMPLOYEE MEMORIAL 12345
2 TONY'S TONYS 27359
3 CHESTER WALMART 45678
4 VICTOR'S VICTOR 45378
5 EMPLOYEE MEMORIAL 12345
So this works... I get the custom columns values added, It's not the fastest under the sun, but it works.
It takes 6.2429744 on 28896 rows. I'm concerned when I put it to the grind, It's going to start dragging me down.
The other downside is I get this annoyance... Yes I can silence, but I feel like this might be due to a bad practice that I should know how to curtail.
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
Basically...
- Is there a way to optimize this?
- Is this warning due to a bad habit, my ignorance, or do I just need to silence it?
CodePudding user response:
Flying blind here since I don't see your data:
cond_list = [df["ph_name"] == look for look in [look1, look2, look3, look4]]
# Rows ph_name outside of the list will keep their original values
df[tgt1] = np.select(cond_list, ["MEMORIAL", "WALMART", "TONY'S", "VICTOR"])
df[tgt2] = np.select(cond_list, [12345, 45678, 27359, 45378])
CodePudding user response:
Given: (It's silly to have all NaN
columns)
unit_name
0 EMPLOYEE
1 EMPLOYEE
2 TONY'S
3 CHESTER
4 VICTOR'S
5 EMPLOYEE
Doing: (Let's use np.select
and create a dictionary for easier future modification)
looks = ['EMPLOYEE', 'CHESTER', "TONY'S", "VICTOR'S"]
new_cols = {
'inv_group': ["MEMORIAL", "WALMART", "TONYS", "VICTOR"],
'acc_num': [12345, 45678, 27359, 45378]
}
condlist = [df['unit_name'].eq(x) for x in looks]
for col, choicelist in new_cols.items():
df[col] = np.select(condlist, choicelist)
print(df)
Output: (I assumed you'd typed the column names wrong)
unit_name inv_group acc_num
0 EMPLOYEE MEMORIAL 12345
1 EMPLOYEE MEMORIAL 12345
2 TONY'S TONYS 27359
3 CHESTER WALMART 45678
4 VICTOR'S VICTOR 45378
5 EMPLOYEE MEMORIAL 12345