Pandas: Replace values in column based on a match in a different column

Time:09-22

I'm rewriting the question to make it clearer which DataFrame I'm having trouble with. The DataFrame contains over 100,000 rows and more than 10 columns. The main problem is filling in any "NIL" value in the id column when another row with the same type value has a known id. The values in the id column are not unique. Sometimes both the id and type values are NIL.

Note: The order of the rows must not change, as the records are dated by other columns. There are also no empty cells; every empty cell is filled with "NIL".

GOAL: To replace as many "NIL" values as possible in the id column only. I do not care about any "NIL" values in the type column. Everything else must remain the same.

I am assuming this can be done without loops. Thank you for your help! Really appreciate it, strangers!

Current DataFrame

COL1 id type COL4 COL5
NIL 123 moving NIL NIL
... 123 lend ... ...
NIL 456 penny ... NIL
... NIL appendix NIL ...
... 251 appendix ... ...
NIL NIL lend ... NIL
... 665 NIL NIL ...
... NIL NIL ... ...
... 251 retailer NIL NIL
NIL 251 appendix ... NIL
NIL 456 NIL ... NIL

Expected DataFrame

COL1 id type COL4 COL5
NIL 123 moving NIL NIL
... 123 lend ... ...
NIL 456 penny ... NIL
... 251 appendix NIL ...
... 251 appendix ... ...
NIL 123 lend ... NIL
... 665 NIL NIL ...
... NIL NIL ... ...
... 251 retailer NIL NIL
NIL 251 appendix ... NIL
NIL 456 NIL ... NIL

CodePudding user response:

You can use a merge

# assemble the dataframe
In [4]: a
Out[4]: 
[26235,
 'fruit',
 26235,
 'veggie',
 32256,
 'NIL',
 'NIL',
 'pawn',
 12415,
 'NIL',
 'NIL',
 'fruit',
 'NIL',
 'veggie',
 24145,
 'pawn',
 'NIL',
 'NIL']

In [5]: df = pandas.DataFrame({'id':a[::2], 'type':a[1::2]})
 
#      id    type
#0  26235   fruit
#1  26235  veggie
#2  32256     NIL
#3    NIL    pawn
#4  12415     NIL
#5    NIL   fruit
#6    NIL  veggie
#7  24145    pawn
#8    NIL     NIL


In [6]: id_is_nill = df.id=="NIL"

In [7]: left = df.loc[id_is_nill].reset_index()

In [8]: right = df.loc[~id_is_nill].reset_index()

In [9]: mrg = pandas.merge(left, right, on='type').query("type != 'NIL'")

In [10]: mrg
Out[10]: 
   index_x id_x    type  index_y   id_y
0        3  NIL    pawn        7  24145
1        5  NIL   fruit        0  26235
2        6  NIL  veggie        1  26235

In [11]: df.loc[mrg.index_x, 'id'] = mrg.id_y.values

In [12]: df
Out[12]: 
      id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
8    NIL     NIL
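
The IPython session above can be condensed into a standalone script. The following is a minimal sketch, not the answerer's exact code: it assumes that when a type has several known ids the first one should win, so it adds a `drop_duplicates` step (not in the original answer) to keep the merge from producing more than one match per row. The name `id_is_nil` and the `_missing`/`_known` suffixes are chosen here for illustration.

```python
import pandas as pd

# Sample data from the answer above; "NIL" marks a missing value.
df = pd.DataFrame({
    "id": [26235, 26235, 32256, "NIL", 12415, "NIL", "NIL", 24145, "NIL"],
    "type": ["fruit", "veggie", "NIL", "pawn", "NIL",
             "fruit", "veggie", "pawn", "NIL"],
})

id_is_nil = df["id"] == "NIL"

# Rows that need an id; reset_index keeps the original row label in "index".
left = df.loc[id_is_nil].reset_index()

# Rows that can donate an id: known id, known type, one row per type
# (assumption: the first id seen for a type wins).
right = df.loc[~id_is_nil & (df["type"] != "NIL")].drop_duplicates(subset="type")

# Inner merge pairs each missing-id row with the donor row of the same type.
mrg = left.merge(right, on="type", suffixes=("_missing", "_known"))

# Write the donated ids back at the original row labels.
df.loc[mrg["index"], "id"] = mrg["id_known"].to_numpy()

print(df)
```

Because the ids are written back by row label, the row order of `df` is left as it was, which the question requires.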

CodePudding user response:

Here is one way to do it:

Build a dictionary that maps each type to an id, then use loc to fill in the missing id values.

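known = df['id'].ne('NIL') & df['type'].ne('NIL')
d = df[known].drop_duplicates(subset=['type']).set_index('type')['id'].to_dict()

fill = df['id'].eq('NIL') & df['type'].isin(list(d))
df.loc[fill, 'id'] = df.loc[fill, 'type'].map(d)
df
      id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
8    NIL     NIL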
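
To check the dictionary approach against the data in the question rather than the answer's sample, here is a small sketch that uses only the id and type columns from the question's "Current DataFrame". It assumes the first known id for a type is the one to use; the names `known`, `lookup`, and `fill` are illustrative.

```python
import pandas as pd

# id and type columns from the question's sample; other columns omitted.
df = pd.DataFrame({
    "id": [123, 123, 456, "NIL", 251, "NIL", 665, "NIL", 251, 251, 456],
    "type": ["moving", "lend", "penny", "appendix", "appendix", "lend",
             "NIL", "NIL", "retailer", "appendix", "NIL"],
})

# Rows where both the id and the type are known.
known = df["id"].ne("NIL") & df["type"].ne("NIL")

# type -> id lookup; assumption: the first known id for a type wins.
lookup = (df[known]
          .drop_duplicates(subset="type")
          .set_index("type")["id"]
          .to_dict())

# Only touch rows whose id is missing and whose type has a known id,
# so rows without a match keep their "NIL" marker.
fill = df["id"].eq("NIL") & df["type"].isin(list(lookup))
df.loc[fill, "id"] = df.loc[fill, "type"].map(lookup)

print(df["id"].tolist())
```

The resulting id column matches the "Expected DataFrame" in the question: the two rows with a known type are filled, and the row where both id and type are "NIL" is left alone.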

CodePudding user response:

You can do:

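# Keep only rows where both id and type are known, one id per type
#d = df.loc[(df.id != 'NIL') & (df.type != 'NIL')].drop_duplicates(subset='type').set_index('type')['id']
d = df[df != 'NIL'].dropna().drop_duplicates(subset='type').set_index('type')['id']

# Fill NIL ids from the lookup; types with no known id stay 'NIL'
df.loc[df.id == 'NIL', 'id'] = df.loc[df.id == 'NIL', 'type'].map(d).fillna('NIL')

print(df)

      id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
8    NIL     NIL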
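
The question's real DataFrame has more than 10 columns, and only the id column may change. The sketch below wraps the lookup idea in a helper and runs it on a small frame with extra columns. The function name `fill_ids_from_type`, its parameters, and the values in COL1 and COL4 are hypothetical and made up for illustration. It builds the lookup from the id and type columns only, so "NIL" values in other columns do not remove rows from the lookup.

```python
import pandas as pd


def fill_ids_from_type(df, id_col="id", type_col="type", missing="NIL"):
    """Return a copy of df with missing ids filled from rows sharing a type."""
    out = df.copy()
    known = out[id_col].ne(missing) & out[type_col].ne(missing)
    # One id per type (first occurrence wins) so the lookup index is unique.
    lookup = (out.loc[known]
                 .drop_duplicates(subset=type_col)
                 .set_index(type_col)[id_col])
    # Only rows with a missing id and a type present in the lookup are changed.
    fill = out[id_col].eq(missing) & out[type_col].isin(lookup.index)
    out.loc[fill, id_col] = out.loc[fill, type_col].map(lookup)
    return out


# Hypothetical frame: COL1 and COL4 values are placeholders.
df = pd.DataFrame({
    "COL1": ["NIL", "a", "NIL", "b"],
    "id": [251, "NIL", "NIL", 665],
    "type": ["appendix", "appendix", "NIL", "NIL"],
    "COL4": ["x", "NIL", "y", "NIL"],
})

result = fill_ids_from_type(df)
print(result)
```

The helper returns a copy, so the original frame is untouched, and the row order and all columns other than id are identical in the result.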