Pandas: Dynamically deciede which column to update based on another column's value-CodePudding

Let's say we have the following dataframe

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         None        None        None
1        b       img2                2         None        None        None
2        c       img3                3         None        None        None

How can we convert this datafame into this?

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         img1        None        None
1        b       img2                2         None        img2        None
2        c       img3                3         None        None        img3

Basically I want to fill the image_src_{image_position} column based on the value of image_position of each row.

I come up with a solution which involves manually iterating over the whole dataframe, like this

data = [
    {"handle": "a","image_src": "img1","image_position": 1,"image_src_1": None,"image_src_2": None,"image_src_3": None},
    {"handle": "b","image_src": "img2","image_position": 2,"image_src_1": None,"image_src_2": None,"image_src_3": None,
    {"handle": "c","image_src": "img3","image_position": 3,"image_src_1": None,"image_src_2": None,"image_src_3": None}
]

df = pd.DataFrame(data)

for index in range(0, len(df)):
    row = df.iloc[index]
    
    position = row["image_position"].astype("int64")
    
    df.loc[index, f"image_src_{position}"] = row["image_src"]

But as iterating over the whole dataframe is a bad thing, how can I imporove this?

CodePudding user response：

First that came to my mind was .pivot() function. Though this might not be the exact thing you wanted, — the original image_src column would be dropped, the column naming would slightly differ and nan might need handling, — it's rather a short solution.

df.pivot(index=['handle', 'image_position'],  # to keep these columns intact
         columns='image_src',
         values='image_src'
 ).add_prefix('image_src_'
 ).reset_index()

What I got:

	handle	image_position	image_src_img1	image_src_img2	image_src_img3
0	a	1	img1	nan	nan
1	b	2	nan	img2	nan
2	c	3	nan	nan	img3

Another take is to use image_position column as an index for the new columns, thus it would be dropped too. Also may replace nan with None:

import numpy as np

df.pivot(index='handle',
         columns='image_position',
         values='image_src'
  ).add_prefix('image_src_'
  ).replace({np.nan: None}
  ).reset_index()

Goes like this:

	handle	image_src_1	image_src_2	image_src_3
0	a	img1	None	None
1	b	None	img2	None
2	c	None	None	img3

CodePudding user response：

here is one way to do it, assuming that the suffix to the image name is not associated to the column name suffix. Instead its the image position to decide the column

#melt the dataframe
df2=df.melt(id_vars=['handle','image_src','image_position'])

# identify the rows where the image_position matches the column suffix
df2['idx']=(df2['variable'].str.rsplit('_',1, expand=True)[1]).astype(int).eq(df2['image_position'])

# updates the value column with the images name
df2.loc[df2['idx']==True, 'value'] = df2['image_src']

#pivot to reverse the melt
df2.pivot(index=['handle','image_src','image_position'], columns='variable', values='value').reset_index()

variable    handle  image_src   image_position  image_src_1     image_src_2     image_src_3
0                a  img1                     1         img1             None    None
1                b  img2                     2         None             img2    None
2                c  img3                     3         None             None    img3