Home > Enterprise >  More efictive method of test large dataframe and add value based on another value different size/not
More efictive method of test large dataframe and add value based on another value different size/not

Time:10-22

Lot of answers on merging and full col, but can't figure out a more effective method. for my situation.

current version of python, pandas, numpy, and file format is parquet

Simply put if col1 ==x the col 10 = 1, col11 = 2, col... etc.

look1 = 'EMPLOYEE'
look2 = 'CHESTER'
look3 = "TONY'S"
look4 = "VICTOR'S"

tgt1 = 'inv_group'
tgt2 = 'acc_num'

for x in range(len(df['ph_name'])):
    df[tgt1][x] = 'MEMORIAL'
    df[tgt2][x] = 12345
elif df['ph_name'][x] == look2:
    df[tgt1][x] = 'WALMART'
    df[tgt2][x] = 45678
elif df['ph_name'][x] == look3:
    df[tgt1][x] = 'TONYS'
    df[tgt2][x] = 27359
elif df['ph_name'][x] == look4:
    df[tgt1][x] = 'VICTOR'
    df[tgt2][x] = 45378



basic sample:
  unit_name        tgt1        tgt2
0 EMPLOYEE         Nan         Nan
1 EMPLOYEE         Nan         Nan
2 TONY'S           Nan         Nan
3 CHESTER          Nan         Nan
4 VICTOR'S         Nan         Nan
5 EMPLOYEE         Nan         Nan


GOAL:
  unit_name        tgt1        tgt2
0 EMPLOYEE         MEMORIAL    12345
1 EMPLOYEE         MEMORIAL    12345
2 TONY'S           TONYS       27359
3 CHESTER          WALMART     45678
4 VICTOR'S         VICTOR      45378
5 EMPLOYEE         MEMORIAL    12345

So this works... I get the custom columns values added, It's not the fastest under the sun, but it works.

It takes 6.2429744 on 28896 rows. I'm concerned when I put it to the grind, It's going to start dragging me down.

The other downside is I get this annoyance... Yes I can silence, but I feel like this might be due to a bad practice that I should know how to curtail.

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

Basically...

  1. Is there a way to optimize this?
  2. Is this warning due to a bad habit, my ignorance, or do I just need to silence it?

CodePudding user response:

Flying blind here since I don't see your data:

cond_list = [df["ph_name"] == look for look in [look1, look2, look3, look4]]

# Rows ph_name outside of the list will keep their original values
df[tgt1] = np.select(cond_list, ["MEMORIAL", "WALMART", "TONY'S", "VICTOR"])
df[tgt2] = np.select(cond_list, [12345, 45678, 27359, 45378])

CodePudding user response:

Given: (It's silly to have all NaN columns)

  unit_name
0  EMPLOYEE
1  EMPLOYEE
2    TONY'S
3   CHESTER
4  VICTOR'S
5  EMPLOYEE

Doing: (Let's use np.select and create a dictionary for easier future modification)

looks = ['EMPLOYEE', 'CHESTER', "TONY'S", "VICTOR'S"]

new_cols = {
   'inv_group': ["MEMORIAL", "WALMART", "TONYS", "VICTOR"],
   'acc_num': [12345, 45678, 27359, 45378]
}

condlist = [df['unit_name'].eq(x) for x in looks]

for col, choicelist in new_cols.items():
    df[col] = np.select(condlist, choicelist)

print(df)

Output: (I assumed you'd typed the column names wrong)

  unit_name inv_group  acc_num
0  EMPLOYEE  MEMORIAL    12345
1  EMPLOYEE  MEMORIAL    12345
2    TONY'S     TONYS    27359
3   CHESTER   WALMART    45678
4  VICTOR'S    VICTOR    45378
5  EMPLOYEE  MEMORIAL    12345
  • Related