I have a DataFrame (df) in the following format:
id json_1 json_2 json_3
1 {a:b} {a:c} {c:d}
2 {a:b} {b:c} null
3 {a:c} {c:d} {a:g}
I want to create a new column that concatenates (i.e., takes the union of) the json_1, json_2, and json_3 columns.
Desired output:
id json_1 json_2 json_3 final_json
1 {a:b} {a:c} {c:d} {{a:b}, {a:c}, {c:d}}
2 {a:b} {b:c} null {{a:b}, {b:c}}
3 {a:c} {c:d} {a:g} {{a:c}, {c:d}, {a:g}}
CodePudding user response:
If I understand correctly, you can use:
df['final_json'] = df.filter(like='json').apply(lambda x: [y for y in x if pd.notna(y)], axis=1)
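For reference, here is a minimal self-contained sketch of the same idea. It assumes the cells are plain strings and that the missing entry is a real missing value (None/NaN) rather than the literal text 'null'. The sample frame is reconstructed from the question, and the json_cols name is only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the question.
# Assumption: the "null" cell is a real missing value (None), not a string.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "json_1": ["{a:b}", "{a:b}", "{a:c}"],
    "json_2": ["{a:c}", "{b:c}", "{c:d}"],
    "json_3": ["{c:d}", None, "{a:g}"],
})

# Name the source columns explicitly so that rerunning the line
# does not pick up the new final_json column as an input.
json_cols = ["json_1", "json_2", "json_3"]

# For each row, keep the non-missing values in column order.
df["final_json"] = df[json_cols].apply(
    lambda row: [v for v in row if pd.notna(v)], axis=1
)

print(df)
```

Note that df.filter(like='json') matches every column whose name contains 'json', so it would also match final_json once that column exists. Listing the columns explicitly avoids that.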
CodePudding user response:
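Depending on the data types and any additional requirements, this should work. It assumes the cells are hashable (e.g. strings) and that missing entries are the literal string 'null':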
df['final_json'] = df[['json_1', 'json_2', 'json_3']].apply(lambda x: set(x) - {'null'}, axis=1)
[Out]:
id json_1 json_2 json_3 final_json
0 1 {a:b} {a:c} {c:d} {{c:d}, {a:c}, {a:b}}
1 2 {a:b} {b:c} null {{b:c}, {a:b}}
2 3 {a:c} {c:d} {a:g} {{a:g}, {c:d}, {a:c}}
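As the output above shows, a set does not keep the original column order. If order matters, one possible variation is sketched below. It assumes the cells are hashable strings and treats both real missing values and the literal string 'null' as missing. The helper name union_non_null and the sample data are hypothetical, chosen to show duplicate removal.

```python
import pandas as pd

def union_non_null(row):
    """Return the distinct non-null values of a row, in first-seen order."""
    # Drop real missing values (None/NaN) and the literal string 'null'.
    kept = [v for v in row if pd.notna(v) and v != "null"]
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(kept))

# Hypothetical sample data: row 1 has a duplicate, row 2 has a 'null' string.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "json_1": ["{a:b}", "{a:b}", "{a:c}"],
    "json_2": ["{a:b}", "{b:c}", "{c:d}"],
    "json_3": ["{c:d}", "null", "{a:g}"],
})

df["final_json"] = df[["json_1", "json_2", "json_3"]].apply(union_non_null, axis=1)

print(df)
```

This only works for hashable cell values. If the columns hold actual dict objects rather than strings, both set(x) and dict.fromkeys will raise a TypeError, and the list-based approach is the safer choice.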