I have a DataFrame in which several columns hold dictionaries as values. These columns also contain some None entries (I'm not sure whether they are the string 'None' or actual missing values).
df = pd.DataFrame([[{0: 300}, {0: 157}, {0: 456}],[{0: 298}, None, {0: 498}],[None, None, {0: 987}]], columns=['col1', 'col2', 'col3'])
All dictionaries have key = 0, values = number from 100 to 10000.
I need to loop through all columns and rows and extract only the value. Preferably, I would overwrite the columns with just the values.
So the end result should look like this:
df = pd.DataFrame([[300, 157, 456],[298, None, 498],[None, None, 987]], columns=['col1', 'col2', 'col3'])
The number is actually an ID which I will use later on for a "vlookup" into another dataframe.
I tried with lambda functions:
df['col1'] = df['col1'].apply(lambda x: x.values() if x is not None else x)
I did manage to extract the values. The issue is that their type is recognized as a dictionary's values (they look like this when I print them: (300)).
I need them as integers. I tried chaining astype(int), but I get an error (something like: you can't do that on a dictionary's values). Any thoughts?
CodePudding user response:
Use DataFrame.applymap to process every cell in all columns, taking the first value of each dict and leaving non-dict cells unchanged:
df = df.applymap(lambda x: list(x.values())[0] if isinstance(x, dict) else x)
print(df)
    col1   col2  col3
0  300.0  157.0   456
1  298.0    NaN   498
2    NaN    NaN   987
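The reason the original attempt could not be cast to int is that dict.values() returns a view object rather than the number itself. The short sketch below illustrates this on a single dictionary; the variable names are only illustrative, and the d.get(0) variant assumes, as the question states, that every dictionary uses the key 0.

```python
d = {0: 300}

view = d.values()          # a dict_values view, not a number
first = list(view)[0]      # materialise the view, then take the first item
alt = next(iter(view))     # same result without building an intermediate list
by_key = d.get(0)          # valid only because every dict here uses key 0

print(type(view).__name__)
print(first, alt, by_key)
```

Any of the three extraction forms can be used inside the lambda; list(x.values())[0] is simply the one used in this answer.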
If you need integers, convert the values to the nullable Int64 dtype, which can hold missing values alongside integers:
df = (df.applymap(lambda x: list(x.values())[0] if isinstance(x, dict) else x)
        .astype('Int64'))
print(df)
   col1  col2  col3
0   300   157   456
1   298  <NA>   498
2  <NA>  <NA>   987
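For reference, here is a self-contained sketch of the whole approach, starting from the sample data in the question. The helper name unwrap and the variable ids are hypothetical. The hasattr check is an assumption about your environment: pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map, so the sketch uses whichever one is available.

```python
import pandas as pd

df = pd.DataFrame(
    [[{0: 300}, {0: 157}, {0: 456}],
     [{0: 298}, None, {0: 498}],
     [None, None, {0: 987}]],
    columns=['col1', 'col2', 'col3'],
)


def unwrap(cell):
    """Return the first value of a dict cell; pass anything else through."""
    return next(iter(cell.values())) if isinstance(cell, dict) else cell


# DataFrame.map replaces applymap from pandas 2.1; fall back on older versions.
elementwise = df.map if hasattr(df, 'map') else df.applymap

# Int64 (capital I) is the nullable integer dtype, so missing cells become <NA>.
ids = elementwise(unwrap).astype('Int64')
print(ids)
print(ids.dtypes)
```

Keeping the IDs as Int64 rather than float avoids values such as 300.0, which matters if they are later used as lookup keys.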