In Python, I have a dataframe that has json variables.
For example, if keys and values of a column are like:
{'a': 3, 'b': 2, 'c': 1}
{'d': 287, 'e': 56, 'f': 17, 'g': 9}
{'h': 22, 'i': 15, 'a': 7}
{'g': 17, 'd': 15, 'e': 15, 'f': 7, 'h': 15}
I want to append columns to be like
key_1 value_1 key_2 value_2 key_3 value_3 key_4 value_4 key_5 value_5
a 3 b 2 c 1
d 287 e 56 f 17 g 9
h 22 i 15 a 7
g 17 d 15 e 15 f 7 h 15
i.e.,
One pair generates two columns. (key_, value_)
Each row has different numbers of key-value pairs.
Though keys are the same, the values of the keys can be different.
Though keys are different, the values of the keys can be the same.
I need to make only up to 600 columns (300 pairs * 2).
CodePudding user response:
Try:
# if type of the column is string, convert it to python object first:
#from ast import literal_eval
#df["column"] = df["column"].apply(literal_eval)
x = (
df["column"]
.apply(
lambda x: {
f"{k}{i}": v
for i, t in enumerate(x.items(), 1)
for k, v in zip(["key_", "value_"], t)
},
)
.to_list()
)
print(pd.DataFrame(x).fillna(""))
Prints:
key_1 value_1 key_2 value_2 key_3 value_3 key_4 value_4 key_5 value_5
0 a 3 b 2 c 1
1 d 287 e 56 f 17 g 9.0
2 h 22 i 15 a 7
3 g 17 d 15 e 15 f 7.0 h 15.0
Dataframe used:
column
0 {'a': 3, 'b': 2, 'c': 1}
1 {'d': 287, 'e': 56, 'f': 17, 'g': 9}
2 {'h': 22, 'i': 15, 'a': 7}
3 {'g': 17, 'd': 15, 'e': 15, 'f': 7, 'h': 15}
CodePudding user response:
For fun (and as a "challenge", I would have used something similar to @Andrej's approach in real life that is more concise and efficient), here is a pure pandas solution:
s = df['col'].apply(list).explode().rename('key')
idx = pd.MultiIndex.from_arrays([s.index, s])
(pd
.json_normalize(df['col']).stack()
.to_frame(name='value')
.reindex(idx)
.assign(pos=lambda d: d.groupby(level=0).cumcount().add(1))
.reset_index()
.pivot(index='level_0', columns='pos')
.sort_index(level=1, axis=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(lambda x: f'{x[0]}_{x[1]}'), axis=1))
)
output:
key_1 value_1 key_2 value_2 key_3 value_3 key_4 value_4 key_5 value_5
level_0
0 a 3.0 b 2.0 c 1.0 NaN NaN NaN NaN
1 d 287.0 e 56.0 f 17.0 g 9.0 NaN NaN
2 h 22.0 i 15.0 a 7.0 NaN NaN NaN NaN
3 g 17.0 d 15.0 e 15.0 f 7.0 h 15.0