Pandas: Replace values in column based on a match in a different column

I'm restating the question to make the DataFrame I'm having trouble with clearer. The DataFrame has over 100,000 rows and more than 10 columns. The problem is filling in "NIL" values in the id column using the id from another row that has the same value in the type column. The values in the id column are not unique, and sometimes both id and type are "NIL".

Note: The row order must not change, because the records are dated by other columns. There are also no empty cells; every empty cell is filled with "NIL".

GOAL: Replace as many "NIL" values as possible in the id column only. I do not care about "NIL" values in the type column, but everything else must remain the same.

I am assuming this can be done without loops. Thank you for your help! Really appreciate it, strangers!

Current DataFrame

COL1	id	type	COL4	COL5
NIL	123	moving	NIL	NIL
...	123	lend	...	...
NIL	456	penny	...	NIL
...	NIL	appendix	NIL	...
...	251	appendix	...	...
NIL	NIL	lend	...	NIL
...	665	NIL	NIL	...
...	NIL	NIL	...	...
...	251	retailer	NIL	NIL
NIL	251	appendix	...	NIL
NIL	456	NIL	...	NIL

Expected DataFrame

COL1	id	type	COL4	COL5
NIL	123	moving	NIL	NIL
...	123	lend	...	...
NIL	456	penny	...	NIL
...	251	appendix	NIL	...
...	251	appendix	...	...
NIL	123	lend	...	NIL
...	665	NIL	NIL	...
...	NIL	NIL	...	...
...	251	retailer	NIL	NIL
NIL	251	appendix	...	NIL
NIL	456	NIL	...	NIL
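As a reference point, here is a minimal sketch that reproduces the change from Current to Expected on the id and type columns alone, without loops. It assumes the values are stored as strings and that, when a type has more than one known id, the first one is used; the example never has conflicting ids for a type, so that policy is an assumption rather than part of the question.

```python
import pandas as pd

# id and type columns from the question's example, stored as strings
# (assumption: the real data is read in as text). COL1, COL4 and COL5
# are left out because they are never modified.
df = pd.DataFrame({
    'id':   ['123', '123', '456', 'NIL', '251', 'NIL',
             '665', 'NIL', '251', '251', '456'],
    'type': ['moving', 'lend', 'penny', 'appendix', 'appendix', 'lend',
             'NIL', 'NIL', 'retailer', 'appendix', 'NIL'],
})

# Lookup table: the first known id for each known type.
known = df[df['id'].ne('NIL') & df['type'].ne('NIL')]
lookup = known.drop_duplicates(subset='type').set_index('type')['id']

# Fill only NIL ids whose type has a known id; row order and every other
# value are left untouched.
mask = df['id'].eq('NIL') & df['type'].isin(lookup.index)
df.loc[mask, 'id'] = df.loc[mask, 'type'].map(lookup)

print(df)
```

Row 7 (id and type both "NIL") is excluded by the mask, so it stays "NIL" as in the expected output.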

Answer:

You can use a merge between the rows that have a "NIL" id and the rows that have a known id, joined on type:

# assemble the dataframe (assumes `import pandas` was run in an earlier cell)
In [4]: a
Out[4]: 
[26235,
 'fruit',
 26235,
 'veggie',
 32256,
 'NIL',
 'NIL',
 'pawn',
 12415,
 'NIL',
 'NIL',
 'fruit',
 'NIL',
 'veggie',
 24145,
 'pawn',
 'NIL',
 'NIL']

In [5]: df = pandas.DataFrame({'id':a[::2], 'type':a[1::2]})
 
#      id    type
#0  26235   fruit
#1  26235  veggie
#2  32256     NIL
#3    NIL    pawn
#4  12415     NIL
#5    NIL   fruit
#6    NIL  veggie
#7  24145    pawn
#8    NIL     NIL


In [6]: id_is_nill = df.id=="NIL"

In [7]: left = df.loc[id_is_nill].reset_index()

In [8]: right = df.loc[~id_is_nill].reset_index()

# keep one known id per type so each NIL row matches at most one known row
In [9]: mrg = pandas.merge(left, right.drop_duplicates('type'), on='type').query("type != 'NIL'")

In [10]: mrg
Out[10]: 
   index_x id_x    type  index_y   id_y
0        3  NIL    pawn        7  24145
1        5  NIL   fruit        0  26235
2        6  NIL  veggie        1  26235

In [11]: df.loc[mrg.index_x, 'id'] = mrg.id_y.values

In [12]: df
Out[12]: 
      id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
8    NIL     NIL
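The same merge idea can be written as a plain script and run on the question's own example. This is a sketch under two assumptions: ids are strings, and when a type has several known ids the first one is used. Deduplicating the right-hand side by type also keeps the merge many-to-one, so it does not produce one row per matching known row, which matters on a 100,000-row frame.

```python
import pandas as pd

# The question's example id/type columns, stored as strings (assumption).
df = pd.DataFrame({
    'id':   ['123', '123', '456', 'NIL', '251', 'NIL',
             '665', 'NIL', '251', '251', '456'],
    'type': ['moving', 'lend', 'penny', 'appendix', 'appendix', 'lend',
             'NIL', 'NIL', 'retailer', 'appendix', 'NIL'],
})

is_nil = df['id'].eq('NIL')
has_type = df['type'].ne('NIL')

# Rows that need an id; reset_index() keeps their original labels in 'index'.
left = df.loc[is_nil & has_type].reset_index()
# One known id per type, so each NIL row matches at most one known row.
right = df.loc[~is_nil & has_type].drop_duplicates(subset='type')

# validate='many_to_one' raises MergeError if right still had repeated types.
mrg = left.merge(right, on='type', validate='many_to_one')

# The overlapping 'id' column is suffixed: id_x (NIL) and id_y (known id).
df.loc[mrg['index'], 'id'] = mrg['id_y'].to_numpy()

print(df)
```

Filtering out "NIL" types before the merge replaces the `.query("type != 'NIL'")` step from the session.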

Answer:

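Here is one way to do it: create a dictionary that maps each type to a known id, then use loc to fill in the missing values.

# first known id for each known type (rows with a NIL id or a NIL type are skipped)
d = df[df['id'].ne('NIL') & df['type'].ne('NIL')].drop_duplicates(subset=['type']).set_index('type')['id'].to_dict()

# fill only the NIL ids whose type has a known id; everything else is untouched
mask = df['id'].eq('NIL') & df['type'].isin(list(d))
df.loc[mask, 'id'] = df.loc[mask, 'type'].map(d)
df

      id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
8    NIL     NIL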
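Why `.set_index('type')['id']` is needed before `.to_dict()`: called on a whole DataFrame, `to_dict()` defaults to `orient='dict'` and returns a nested dictionary keyed by column name, so `map` finds none of the type values among its keys. A small sketch with hypothetical values taken from this thread's example:

```python
import pandas as pd

# Hypothetical frame of known (id, type) pairs.
known = pd.DataFrame({'id': [26235, 24145], 'type': ['fruit', 'pawn']})

# DataFrame.to_dict() with the default orient: {column -> {row label -> value}}.
nested = known.to_dict()

# Indexing by type first gives the flat {type -> id} mapping that map() needs.
flat = known.set_index('type')['id'].to_dict()

types = pd.Series(['pawn', 'fruit'])
print(types.map(nested).tolist())  # keys are 'id' and 'type', so nothing matches
print(types.map(flat).tolist())
```

With the nested dictionary every lookup comes back as NaN, which would overwrite "NIL" with NaN instead of an id.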

Answer:

You can do:

# known (type, id) pairs: both columns non-NIL, one id per type so the lookup index is unique
d = df.loc[(df.id != 'NIL') & (df.type != 'NIL')].drop_duplicates('type').set_index('type')['id']

# look up each NIL id by its type; types without a known id stay 'NIL'
df.loc[df.id == 'NIL', 'id'] = df.loc[df.id == 'NIL', 'type'].map(d).fillna('NIL')

print(df)

     id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
8    NIL     NIL
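Why restrict the lookup to the id and type columns instead of masking every NIL with `df[df != 'NIL'].dropna()`? On the question's real frame, other columns such as COL1 also contain "NIL", and `dropna()` removes any row with a missing value in any column, so usable (type, id) pairs get lost. A short sketch with hypothetical rows ('x' stands in for any real value) shows the difference:

```python
import pandas as pd

# Hypothetical rows in the shape of the question's frame: COL1 can hold NIL
# even when id and type are both known.
df = pd.DataFrame({
    'COL1': ['NIL', 'x', 'x'],
    'id':   ['123', '251', 'NIL'],
    'type': ['moving', 'appendix', 'lend'],
})

# Masking every NIL and dropping incomplete rows also throws away row 0,
# whose only NIL is in COL1.
via_dropna = df[df != 'NIL'].dropna()

# Filtering on the two relevant columns keeps every usable (type, id) pair.
via_columns = df.loc[(df['id'] != 'NIL') & (df['type'] != 'NIL')]

print(via_dropna.index.tolist())
print(via_columns.index.tolist())
```

On the two-column example in this thread both filters give the same rows; the difference only shows up once other columns carry "NIL".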