Map values of a column to another column if condition-CodePudding

This is my dataframe:

    text         from_id    to_id   position
0   NaN           NaN        NaN    [nan, nan]
1   albumin       NaN        NaN    [0.0, 23.0]
2   hepatoma      NaN        NaN    [93.0, 101.0]
3   NaN          hepatoma  albumin  [nan, nan]

I want to create two new columns named from_position and to_position and I want them to be like this:

        text    from_id     to_id   position      from_position  to_position
    0   NaN      NaN         NaN    [nan, nan]      [nan, nan]   [nan, nan]
    1   albumin  NaN         NaN    [0.0, 23.0]     [nan, nan]   [nan, nan]  
    2   hepatoma NaN         NaN    [93.0, 101.0]   [nan, nan]   [nan, nan]
    3   NaN     hepatoma  albumin   [nan, nan]      [93.0, 101.0]  [0.0, 23.0]

Note that it doesn't have to be [nan,nan] it can also be NaN. Moreover, the position values aren't lists of floats, they are strings (I'm ok with converting them).

I tried this

s = full_df_normalized.dropna(subset=['text', 'position']).set_index('text')['position'].to_dict()

cols = ['from_position','to_position']
full_df_normalized[cols] = full_df_normalized[cols].apply(lambda x: x.map(d))

But I got this error: KeyError: "None of [Index(['from_position', 'to_position'], dtype='object')] are in the [columns]"

So, I created 2 empty (NaN) columns named 'from_position','to_position' and ran it again, but they are still NaN.

I also tried this :

for i,row in full_df_normalized.iterrows():
    for key,value in sd.items():
        if key == row['from_id']:
            row['from_position'] == value

But once again, the 'from_position' column remains NaN. Can someone help?

CodePudding user response：

In your solution apply mapping for id columns from list cols1, is necessary same ordering like col, last replace missing values by list [np.nan, np.nan]:

nan = np.nan
full_df_normalized = pd.DataFrame({'text': [nan, 'albumin', 'hepatoma', nan], 'from_id': [nan, nan, nan, 'hepatoma'], 'to_id': [nan, nan, nan, 'albumin'], 'position': [[nan, nan], [0.0, 23.0], [93.0, 101.0], [nan, nan]]})

print (full_df_normalized)
       text   from_id    to_id       position
0       NaN       NaN      NaN     [nan, nan]
1   albumin       NaN      NaN    [0.0, 23.0]
2  hepatoma       NaN      NaN  [93.0, 101.0]
3       NaN  hepatoma  albumin     [nan, nan]

d = full_df_normalized.dropna(subset=['text', 'position']).set_index('text')['position'].to_dict()

cols = ['from_position','to_position']
cols1 = ['from_id','to_id']

f = lambda x:x.map(d).dropna().reindex(full_df_normalized.index,fill_value=[np.nan,np.nan])
full_df_normalized[cols] = full_df_normalized[cols1].apply(f)
print (full_df_normalized)
       text   from_id    to_id      position from_position to_position
0       NaN       NaN      NaN     [nan,nan]    [nan, nan]  [nan, nan]
1   albumin       NaN      NaN    [0.0,23.0]    [nan, nan]  [nan, nan]
2  hepatoma       NaN      NaN  [93.0,101.0]    [nan, nan]  [nan, nan]
3       NaN  hepatoma  albumin     [nan,nan]  [93.0,101.0]  [0.0,23.0]

Or you can processing each coumn separately:

for c in cols:
    c1 = c.replace('_position', '_id')
    full_df_normalized[c] = full_df_normalized[c1].map(d).dropna().reindex(full_df_normalized.index, fill_value=[np.nan, np.nan])


print (full_df_normalized)
       text   from_id    to_id      position from_position to_position
0       NaN       NaN      NaN     [nan,nan]    [nan, nan]  [nan, nan]
1   albumin       NaN      NaN    [0.0,23.0]    [nan, nan]  [nan, nan]
2  hepatoma       NaN      NaN  [93.0,101.0]    [nan, nan]  [nan, nan]
3       NaN  hepatoma  albumin     [nan,nan]  [93.0,101.0]  [0.0,23.0]

CodePudding user response：

Using a reshaped DataFrame as intermediate:

cols = ['from_id', 'to_id']
df.join(df
 .reset_index()
 .drop(columns='position')
 .melt(id_vars='index', value_vars=cols, value_name='text')
 .merge(df.dropna(subset='text'), on='text')
 .pivot('index', 'variable', 'position')
 .rename(columns=lambda s: s.replace('_id', '_position'))
)

output:

     text   from_id    to_id       position  from_position  to_position
0       NaN       NaN      NaN     [nan, nan]            NaN          NaN
1   albumin       NaN      NaN    [0.0, 23.0]            NaN          NaN
2  hepatoma       NaN      NaN  [93.0, 101.0]            NaN          NaN
3       NaN  hepatoma  albumin     [nan, nan]  [93.0, 101.0]  [0.0, 23.0]