Home > Net >  Map values of a column to another column if condition
Map values of a column to another column if condition

Time:05-24

This is my dataframe:

    text         from_id    to_id   position
0   NaN           NaN        NaN    [nan, nan]
1   albumin       NaN        NaN    [0.0, 23.0]
2   hepatoma      NaN        NaN    [93.0, 101.0]
3   NaN          hepatoma  albumin  [nan, nan]

I want to create two new columns named from_position and to_position and I want them to be like this:

        text    from_id     to_id   position      from_position  to_position
    0   NaN      NaN         NaN    [nan, nan]      [nan, nan]   [nan, nan]
    1   albumin  NaN         NaN    [0.0, 23.0]     [nan, nan]   [nan, nan]  
    2   hepatoma NaN         NaN    [93.0, 101.0]   [nan, nan]   [nan, nan]
    3   NaN     hepatoma  albumin   [nan, nan]      [93.0, 101.0]  [0.0, 23.0]

Note that it doesn't have to be [nan,nan] it can also be NaN. Moreover, the position values aren't lists of floats, they are strings (I'm ok with converting them).

I tried this

s = full_df_normalized.dropna(subset=['text', 'position']).set_index('text')['position'].to_dict()

cols = ['from_position','to_position']
full_df_normalized[cols] = full_df_normalized[cols].apply(lambda x: x.map(d))

But I got this error: KeyError: "None of [Index(['from_position', 'to_position'], dtype='object')] are in the [columns]"

So, I created 2 empty (NaN) columns named 'from_position','to_position' and ran it again, but they are still NaN.

I also tried this :

for i,row in full_df_normalized.iterrows():
    for key,value in sd.items():
        if key == row['from_id']:
            row['from_position'] == value
            

But once again, the 'from_position' column remains NaN. Can someone help?

CodePudding user response:

In your solution apply mapping for id columns from list cols1, is necessary same ordering like col, last replace missing values by list [np.nan, np.nan]:

nan = np.nan
full_df_normalized = pd.DataFrame({'text': [nan, 'albumin', 'hepatoma', nan], 'from_id': [nan, nan, nan, 'hepatoma'], 'to_id': [nan, nan, nan, 'albumin'], 'position': [[nan, nan], [0.0, 23.0], [93.0, 101.0], [nan, nan]]})

print (full_df_normalized)
       text   from_id    to_id       position
0       NaN       NaN      NaN     [nan, nan]
1   albumin       NaN      NaN    [0.0, 23.0]
2  hepatoma       NaN      NaN  [93.0, 101.0]
3       NaN  hepatoma  albumin     [nan, nan]

d = full_df_normalized.dropna(subset=['text', 'position']).set_index('text')['position'].to_dict()

cols = ['from_position','to_position']
cols1 = ['from_id','to_id']

f = lambda x:x.map(d).dropna().reindex(full_df_normalized.index,fill_value=[np.nan,np.nan])
full_df_normalized[cols] = full_df_normalized[cols1].apply(f)
print (full_df_normalized)
       text   from_id    to_id      position from_position to_position
0       NaN       NaN      NaN     [nan,nan]    [nan, nan]  [nan, nan]
1   albumin       NaN      NaN    [0.0,23.0]    [nan, nan]  [nan, nan]
2  hepatoma       NaN      NaN  [93.0,101.0]    [nan, nan]  [nan, nan]
3       NaN  hepatoma  albumin     [nan,nan]  [93.0,101.0]  [0.0,23.0]

Or you can processing each coumn separately:

for c in cols:
    c1 = c.replace('_position', '_id')
    full_df_normalized[c] = full_df_normalized[c1].map(d).dropna().reindex(full_df_normalized.index, fill_value=[np.nan, np.nan])


print (full_df_normalized)
       text   from_id    to_id      position from_position to_position
0       NaN       NaN      NaN     [nan,nan]    [nan, nan]  [nan, nan]
1   albumin       NaN      NaN    [0.0,23.0]    [nan, nan]  [nan, nan]
2  hepatoma       NaN      NaN  [93.0,101.0]    [nan, nan]  [nan, nan]
3       NaN  hepatoma  albumin     [nan,nan]  [93.0,101.0]  [0.0,23.0]

CodePudding user response:

Using a reshaped DataFrame as intermediate:

cols = ['from_id', 'to_id']
df.join(df
 .reset_index()
 .drop(columns='position')
 .melt(id_vars='index', value_vars=cols, value_name='text')
 .merge(df.dropna(subset='text'), on='text')
 .pivot('index', 'variable', 'position')
 .rename(columns=lambda s: s.replace('_id', '_position'))
)

output:

     text   from_id    to_id       position  from_position  to_position
0       NaN       NaN      NaN     [nan, nan]            NaN          NaN
1   albumin       NaN      NaN    [0.0, 23.0]            NaN          NaN
2  hepatoma       NaN      NaN  [93.0, 101.0]            NaN          NaN
3       NaN  hepatoma  albumin     [nan, nan]  [93.0, 101.0]  [0.0, 23.0]
  • Related