Home > Blockchain >  Pandas: Dynamically deciede which column to update based on another column's value
Pandas: Dynamically deciede which column to update based on another column's value

Time:09-19

Let's say we have the following dataframe

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         None        None        None
1        b       img2                2         None        None        None
2        c       img3                3         None        None        None

How can we convert this datafame into this?

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         img1        None        None
1        b       img2                2         None        img2        None
2        c       img3                3         None        None        img3

Basically I want to fill the image_src_{image_position} column based on the value of image_position of each row.

I come up with a solution which involves manually iterating over the whole dataframe, like this

data = [
    {"handle": "a","image_src": "img1","image_position": 1,"image_src_1": None,"image_src_2": None,"image_src_3": None},
    {"handle": "b","image_src": "img2","image_position": 2,"image_src_1": None,"image_src_2": None,"image_src_3": None,
    {"handle": "c","image_src": "img3","image_position": 3,"image_src_1": None,"image_src_2": None,"image_src_3": None}
]

df = pd.DataFrame(data)

for index in range(0, len(df)):
    row = df.iloc[index]
    
    position = row["image_position"].astype("int64")
    
    df.loc[index, f"image_src_{position}"] = row["image_src"]

But as iterating over the whole dataframe is a bad thing, how can I imporove this?

CodePudding user response:

First that came to my mind was .pivot() function. Though this might not be the exact thing you wanted, — the original image_src column would be dropped, the column naming would slightly differ and nan might need handling, — it's rather a short solution.

df.pivot(index=['handle', 'image_position'],  # to keep these columns intact
         columns='image_src',
         values='image_src'
 ).add_prefix('image_src_'
 ).reset_index()

What I got:

handle image_position image_src_img1 image_src_img2 image_src_img3
0 a 1 img1 nan nan
1 b 2 nan img2 nan
2 c 3 nan nan img3

Another take is to use image_position column as an index for the new columns, thus it would be dropped too. Also may replace nan with None:

import numpy as np

df.pivot(index='handle',
         columns='image_position',
         values='image_src'
  ).add_prefix('image_src_'
  ).replace({np.nan: None}
  ).reset_index()

Goes like this:

handle image_src_1 image_src_2 image_src_3
0 a img1 None None
1 b None img2 None
2 c None None img3

CodePudding user response:

here is one way to do it, assuming that the suffix to the image name is not associated to the column name suffix. Instead its the image position to decide the column

#melt the dataframe
df2=df.melt(id_vars=['handle','image_src','image_position'])

# identify the rows where the image_position matches the column suffix
df2['idx']=(df2['variable'].str.rsplit('_',1, expand=True)[1]).astype(int).eq(df2['image_position'])

# updates the value column with the images name
df2.loc[df2['idx']==True, 'value'] = df2['image_src']

#pivot to reverse the melt
df2.pivot(index=['handle','image_src','image_position'], columns='variable', values='value').reset_index()

variable    handle  image_src   image_position  image_src_1     image_src_2     image_src_3
0                a  img1                     1         img1             None    None
1                b  img2                     2         None             img2    None
2                c  img3                     3         None             None    img3
  • Related