I have an existing pandas DataFrame that contains an image path, and I need to add a column to it where each cell contains a multidimensional array (representing that image).
Here is an example:
import pandas as pd
import numpy as np
df = pd.DataFrame(data=[["test.png", 3], ["dog.png", 4]], columns=["path", "B"])
# creating a new empty column
df = df.assign(image=np.nan)
image = ...  # placeholder: the array read from the path in row 1
df.iloc[1, df.columns.get_loc("image")] = image
but I keep getting the error: ValueError: Must have equal len keys and value when setting with an ndarray.
How can I fix that? I've already tried to follow this but it didn't work for me.
Just to be clear, in my real DataFrame the image field in the n-th row depends on the value of path in the n-th row.
Expected result:
         path  B                            image
0  "test.png"  3                              NaN
1   "dog.png"  4  [[1,2,...], [255,255,...], ...]
CodePudding user response:
Use the PIL.Image module to open each image as an object that can be converted to an array:
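import numpy as np
import pandas as pd
from PIL import Image

df = pd.DataFrame({'path': ["stackoverflow-icon.png", "../images/wall.jpg"],
                   'B': [3, 4]})
# Open the image at each row's path and store its pixel array in the new column
df['image'] = df.apply(lambda x: np.asarray(Image.open(x['path'])), axis=1)
print(df)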
Sample output:
path ... image
0 stackoverflow-icon.png ... [[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0...
1 ../images/wall.jpg ... [[[81, 127, 213], [87, 132, 213], [83, 127, 20...
[2 rows x 3 columns]
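If you only need to fill a single cell, as in the question, the error most likely comes from two things: df.assign(image=np.nan) creates a float64 column, and the .iloc setter treats an ndarray value as a sequence to spread over the selected cells rather than as one object. A minimal sketch of one way around that, assuming the column is created with object dtype and the cell is set with .at (the 2x2 array is a made-up stand-in for a decoded image, not something read from disk):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"path": ["test.png", "dog.png"], "B": [3, 4]})

# Create the new column with object dtype so a cell can hold an arbitrary object.
df["image"] = pd.Series([np.nan] * len(df), index=df.index, dtype=object)

# Hypothetical stand-in for the array decoded from the path in row 1.
image = np.zeros((2, 2, 3), dtype=np.uint8)

# .at addresses exactly one cell, so the whole array is stored as a single value.
df.at[1, "image"] = image

print(df["image"].dtype)
print(df.at[1, "image"].shape)
```

Row 0 keeps its NaN and row 1 holds the full array. For filling every row from its path, the apply-based approach above is the simpler route.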