I have the following JSON file:
{
"IMG1.tif": {
"0": [
100,
192,
[
129,
42,
32
]
],
"1": [
299,
208,
[
133,
42,
24
]
]
},
"IMG2.tif": {
"0": [
100,
207,
[
128,
41,
34
]
],
"1": [
299,
192,
[
81,
25,
26
]
]
}
}
I'm reading it into a dataframe with df = pd.read_json('img_data.json', orient='columns'). I find this a clear and logical way to store the information, but I want to access each of the values in each column and be able to iterate over and work with them.
For example, in this case, these values are coordinates. I'd like to, in the most convenient and natural way possible, be able to access the x, y or z axis value(s) for every coordinate in each column, i.e. (something like):
>>> df["IMG1.tif"][0,:]
0 100
1 299
or even filter across the whole dataframe:
>>> get_y_values(df)
IMG1.tif IMG2.tif
0 192 207
1 208 192
I also accept suggestions on how to change the way the data is stored (it may be necessary), but I don't think I can store values outside lists because of the way they're obtained - meaning that, as you can see,
"IMG1.tif": { "0": [100, 192, [129, 42, 32]] ...
each 3-set of coordinates in the dataframe is shown inside a list.
In case some of you are curious or confused, the z axis values are just RGB values. At some point I will need to convert them to grayscale inside the dataframe, too:
>>> do_grayscale(df) # example values
IMG1.tif IMG2.tif
0 [100, 192, 61] [100, 207, 87]
1 [299, 208, 122] [299, 192, 94]
Added: one of the alternative ways to have the original data stored, albeit with sacrifices in the original code, would be something like this:
x y z image_name
0 100 192 [129, 42, 32] IMG1.tif
1 299 208 [133, 42, 24] IMG1.tif
2 100 207 [128, 41, 34] IMG2.tif
3 299 192 [81, 25, 26] IMG2.tif
CodePudding user response:
I'd suggest building a dataframe with multiindex columns:
df = df.T  # first transpose: images become rows, point ids (0, 1) become columns
df_out = pd.concat(
    [
        pd.DataFrame(
            df[col].tolist(),  # split each [x, y, z] list into three columns
            index=df.index,
            columns=pd.MultiIndex.from_tuples(zip([col] * 3, ["x", "y", "z"])),
        )
        for col in df.columns
    ],
    axis=1,
)
This will give you the following df:
            0                        1
            x    y              z    x    y              z
IMG1.tif  100  192  [129, 42, 32]  299  208  [133, 42, 24]
IMG2.tif  100  207  [128, 41, 34]  299  192   [81, 25, 26]
You can then access any element of the frame with the loc indexer. For instance:
df_out.loc['IMG1.tif', (0, "y")]  # returns 192
df_out.loc['IMG1.tif', ([0, 1], "x")]  # returns a series with 100 and 299
df_out.loc[:, ([0, 1], "y")]  # returns all y values (assuming the only point ids are 0 and 1; edit accordingly)
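If you don't want to list the point ids by hand, a cross-section on the second column level should do the same job. The following is only a sketch: get_axis_values is a made-up helper name (standing in for the get_y_values from the question), and the data is built inline instead of being read from img_data.json so that the snippet runs on its own.

```python
import pandas as pd

# Inline stand-in for pd.read_json('img_data.json'); same values as the question.
data = {
    "IMG1.tif": {0: [100, 192, [129, 42, 32]], 1: [299, 208, [133, 42, 24]]},
    "IMG2.tif": {0: [100, 207, [128, 41, 34]], 1: [299, 192, [81, 25, 26]]},
}
df = pd.DataFrame(data).T  # images as rows, point ids as columns

# Same multiindex construction as above: (point id, axis) column pairs.
df_out = pd.concat(
    [
        pd.DataFrame(
            df[col].tolist(),
            index=df.index,
            columns=pd.MultiIndex.from_tuples([(col, ax) for ax in ("x", "y", "z")]),
        )
        for col in df.columns
    ],
    axis=1,
)


def get_axis_values(frame, axis):
    """Hypothetical helper: one axis ('x', 'y' or 'z') for every point and image.

    xs() selects the given label on column level 1 and drops that level;
    the transpose puts point ids on the rows and images on the columns.
    """
    return frame.xs(axis, axis=1, level=1).T


y_values = get_axis_values(df_out, "y")
print(y_values)
```

This works for any number of points per image, since nothing refers to 0 and 1 explicitly.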
Edit: if 0 and 1 are not relevant as an index and you want the structure of your last example (starting again from the original, untransposed df):
df = df.stack().reset_index(level=1)  # image names end up in a 'level_1' column, the lists in column 0
df_out = pd.concat([
    pd.DataFrame(sub_df[0].tolist(), columns=["x", "y", "z"]).assign(image_name=img)
    for img, sub_df in df.groupby('level_1')
]).reset_index(drop=True)
Output:
     x    y              z image_name
0  100  192  [129, 42, 32]   IMG1.tif
1  299  208  [133, 42, 24]   IMG1.tif
2  100  207  [128, 41, 34]   IMG2.tif
3  299  192   [81, 25, 26]   IMG2.tif
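With the data in this long format, the grayscale step from the question could look roughly like the sketch below. Note the assumptions: the question doesn't say which grayscale formula is wanted, so I picked the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114); rgb_to_gray, the gray column and the point column are names I made up; and the frame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Inline copy of the long-format frame shown above.
long_df = pd.DataFrame(
    {
        "x": [100, 299, 100, 299],
        "y": [192, 208, 207, 192],
        "z": [[129, 42, 32], [133, 42, 24], [128, 41, 34], [81, 25, 26]],
        "image_name": ["IMG1.tif", "IMG1.tif", "IMG2.tif", "IMG2.tif"],
    }
)

# Assumption: ITU-R BT.601 luma weights; swap in whatever formula you need.
WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_gray(rgb):
    """Hypothetical helper: weighted sum of one [R, G, B] list, rounded to an int."""
    r, g, b = rgb
    return round(WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b)


# Add a grayscale column next to the original RGB lists.
long_df["gray"] = long_df["z"].apply(rgb_to_gray)

# Filtering by image is ordinary boolean indexing.
x_of_img1 = long_df.loc[long_df["image_name"] == "IMG1.tif", "x"].tolist()

# To get back the "one column per image" view of the y values,
# number the points within each image and pivot on that counter.
y_wide = long_df.assign(
    point=long_df.groupby("image_name").cumcount()
).pivot(index="point", columns="image_name", values="y")

print(long_df)
print(y_wide)
```

The long format keeps x and y as plain integer columns, so most operations become regular column operations instead of digging into nested lists.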