Let's say we have the following dataframe
handle image_src image_position image_src_1 image_src_2 image_src_3
0 a img1 1 None None None
1 b img2 2 None None None
2 c img3 3 None None None
How can we convert this datafame into this?
handle image_src image_position image_src_1 image_src_2 image_src_3
0 a img1 1 img1 None None
1 b img2 2 None img2 None
2 c img3 3 None None img3
Basically I want to fill the image_src_{image_position}
column based on the value of image_position
of each row.
I come up with a solution which involves manually iterating over the whole dataframe, like this
data = [
{"handle": "a","image_src": "img1","image_position": 1,"image_src_1": None,"image_src_2": None,"image_src_3": None},
{"handle": "b","image_src": "img2","image_position": 2,"image_src_1": None,"image_src_2": None,"image_src_3": None,
{"handle": "c","image_src": "img3","image_position": 3,"image_src_1": None,"image_src_2": None,"image_src_3": None}
]
df = pd.DataFrame(data)
for index in range(0, len(df)):
row = df.iloc[index]
position = row["image_position"].astype("int64")
df.loc[index, f"image_src_{position}"] = row["image_src"]
But as iterating over the whole dataframe is a bad thing, how can I imporove this?
CodePudding user response:
First that came to my mind was .pivot()
function. Though this might not be the exact thing you wanted, — the original image_src
column would be dropped, the column naming would slightly differ and nan
might need handling, — it's rather a short solution.
df.pivot(index=['handle', 'image_position'], # to keep these columns intact
columns='image_src',
values='image_src'
).add_prefix('image_src_'
).reset_index()
What I got:
handle | image_position | image_src_img1 | image_src_img2 | image_src_img3 | |
---|---|---|---|---|---|
0 | a | 1 | img1 | nan | nan |
1 | b | 2 | nan | img2 | nan |
2 | c | 3 | nan | nan | img3 |
Another take is to use image_position
column as an index for the new columns, thus it would be dropped too. Also may replace nan
with None
:
import numpy as np
df.pivot(index='handle',
columns='image_position',
values='image_src'
).add_prefix('image_src_'
).replace({np.nan: None}
).reset_index()
Goes like this:
handle | image_src_1 | image_src_2 | image_src_3 | |
---|---|---|---|---|
0 | a | img1 | None | None |
1 | b | None | img2 | None |
2 | c | None | None | img3 |
CodePudding user response:
here is one way to do it, assuming that the suffix to the image name is not associated to the column name suffix. Instead its the image position to decide the column
#melt the dataframe
df2=df.melt(id_vars=['handle','image_src','image_position'])
# identify the rows where the image_position matches the column suffix
df2['idx']=(df2['variable'].str.rsplit('_',1, expand=True)[1]).astype(int).eq(df2['image_position'])
# updates the value column with the images name
df2.loc[df2['idx']==True, 'value'] = df2['image_src']
#pivot to reverse the melt
df2.pivot(index=['handle','image_src','image_position'], columns='variable', values='value').reset_index()
variable handle image_src image_position image_src_1 image_src_2 image_src_3
0 a img1 1 img1 None None
1 b img2 2 None img2 None
2 c img3 3 None None img3