I'm looking for a generic way to extract, from a dataframe column, a list of "key": "value" pairs, turning each "key" into a column name and each "value" into that column's value:
Before:
id, severity, user, events, city
1,Low,test1,[{'type': 'AAA', 'timestamp': 1653135398011, 'agent': None,...}], Athens
2,Medium,test2,[{'type': 'BBB', 'timestamp': 1653135398012, 'agent': STIX,...}], Buffalo
3,,test3,[{'type': 'CCC', 'timestamp': 1653135398013, 'agent': ACQ,...}], Carson
4,Low,test4,[{'type': 'DDD', 'timestamp': 1653135398014, 'agent': VTC,...}], Detroit
After:
id, severity, user, type, timestamp, agent,..., city
1,Low,test1,AAA,1653135398011,None, ..., Athens
2,Medium,test2,BBB,1653135398012,STIX, ..., Buffalo
3,,test3,CCC,1653135398013,ACQ,..., Carson
4,Low,test4,DDD,1653135398014,VTC,..., Detroit
Some solutions on Stack Overflow extract 2 or 3 fields by name, but if we don't know the list's content in advance, how can we extract everything? I think a lambda function and/or a regex would do the job, but my skills aren't good enough...
Thanks for your help
CodePudding user response:
You can try something like this:
import pandas as pd

records = []
for row in df["events"]:
    records.append(row[0])  # each events value is a single-element list of dicts
events_df = pd.DataFrame.from_records(records)
pd.concat([df, events_df], axis=1).drop(["events"], axis=1)
I got it working with a DataFrame that looks like this:
id severity user events \
0 1 Low test1 [{'type': 'AAA', 'timestamp': 1653135398011, '...
1 2 Medium test2 [{'type': 'BBB', 'timestamp': 1653135398012, '...
2 3 NaN test3 [{'type': 'CCC', 'timestamp': 1653135398013, '...
3 4 Low test4 [{'type': 'DDD', 'timestamp': 1653135398014, '...
city
0 Athens
1 Buffalo
2 Carson
3 Detroit
Edit:
Thanks to @Thyebri for the suggestion. It's possible to accomplish the same without a loop, though I'm not qualified to say whether it's more or less efficient.
events_df = pd.DataFrame(iter(df["events"].apply(lambda ls: ls[0])))
pd.concat([df, events_df], axis=1).drop(["events"], axis=1)
CodePudding user response:
Here is a way to do what your question asks:
from functools import reduce
df = pd.concat([df.drop(columns=['events', 'city']), pd.DataFrame.from_records(reduce(lambda a, b: a + b, df.events)), df['city']], axis=1)
Explanation:
- Using functools.reduce(), concatenate the per-row lists in the events column into one flat list of dictionary objects.
- Use pd.concat() to glue together (1) the columns preceding events, (2) a dataframe built from those dictionaries using pd.DataFrame.from_records(), and (3) the column(s) to the right of events (in this case, just city).
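To see the flattening step in isolation, here is a minimal sketch with two made-up rows (pure stdlib, no pandas needed): `+` concatenates the single-element lists into one flat list of records.

```python
from functools import reduce

# Each row's events value is a single-element list of dicts;
# reduce with list concatenation yields one flat list of records,
# ready for pd.DataFrame.from_records().
events = [
    [{'type': 'AAA', 'timestamp': 1653135398011}],
    [{'type': 'BBB', 'timestamp': 1653135398012}],
]
records = reduce(lambda a, b: a + b, events)
print(records)
# [{'type': 'AAA', 'timestamp': 1653135398011}, {'type': 'BBB', 'timestamp': 1653135398012}]
```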
Full test code:
import pandas as pd
df = pd.DataFrame(columns=['id','severity','user','events','city'], data=[
[1,'Low','test1',[{'type': 'AAA', 'timestamp': 1653135398011, 'agent': None}],'Athens'],
[2,'Medium','test2',[{'type': 'BBB', 'timestamp': 1653135398012, 'agent': 'STIX'}],'Buffalo'],
[3,None,'test3',[{'type': 'CCC', 'timestamp': 1653135398013, 'agent': 'ACQ'}], 'Carson'],
[4,'Low','test4',[{'type': 'DDD', 'timestamp': 1653135398014, 'agent': 'VTC'}], 'Detroit']])
print('Input dataframe:')
print(df)
from functools import reduce
df = pd.concat([df.drop(columns=['events', 'city']), pd.DataFrame.from_records(reduce(lambda a, b: a + b, df.events)), df['city']], axis=1)
print('\nResult:')
print(df)
Output:
Input dataframe:
id severity user events city
0 1 Low test1 [{'type': 'AAA', 'timestamp': 1653135398011, '... Athens
1 2 Medium test2 [{'type': 'BBB', 'timestamp': 1653135398012, '... Buffalo
2 3 None test3 [{'type': 'CCC', 'timestamp': 1653135398013, '... Carson
3 4 Low test4 [{'type': 'DDD', 'timestamp': 1653135398014, '... Detroit
Result:
id severity user type timestamp agent city
0 1 Low test1 AAA 1653135398011 None Athens
1 2 Medium test2 BBB 1653135398012 STIX Buffalo
2 3 None test3 CCC 1653135398013 ACQ Carson
3 4 Low test4 DDD 1653135398014 VTC Detroit
NOTE: It was necessary to make slight changes to the dataframe shown in the question, such as putting quotes around the 'agent' values (STIX, ACQ, VTC).
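One caveat: the approaches above keep only the first dict of each events list. If a list can hold several events, a sketch using pandas' explode() and json_normalize() (an alternative not from the original answers; assumes pandas >= 1.1, and the toy data here is made up) produces one output row per event:

```python
import pandas as pd

# Toy data where the first row carries two event dicts.
df = pd.DataFrame({
    'id': [1, 2],
    'events': [[{'type': 'AAA', 'timestamp': 1653135398011},
                {'type': 'BBB', 'timestamp': 1653135398012}],
               [{'type': 'CCC', 'timestamp': 1653135398013}]],
    'city': ['Athens', 'Buffalo'],
})

# One row per event dict, then flatten each dict into columns.
exploded = df.explode('events', ignore_index=True)
flat = pd.concat(
    [exploded.drop(columns='events'),
     pd.json_normalize(exploded['events'].tolist())],
    axis=1,
)
print(flat)
```

Because json_normalize() collects whatever keys it finds, this also covers the "we don't know the list content" case: rows missing a key simply get NaN in that column.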