I have a dataframe df with a mix of data types (str, list values, floats):
Property Name 2234.1.1 2234.1.2.1 2234.1.3.1
0 Flash Point (°C) [-5] NaN NaN
1 Flash Point (°C) [Max] NaN NaN
2 Flash Point (°C) [nan] NaN NaN
3 Flash Point (°C) NaN NaN NaN
4 Flash Point (°C) NaN NaN NaN
5 Flash Point (°C) NaN NaN NaN
6 Flash Point (°C) NaN NaN [29]
7 Flash Point (°C) NaN NaN [nan]
8 Flash Point (°C) NaN NaN [nan]
9 Flash Point (°C) NaN [159] NaN
10 Flash Point (°C) NaN [nan] NaN
I want to convert all the list values to regular values.
So desired df:
Property Name 2234.1.1 2234.1.2.1 2234.1.3.1
0 Flash Point (°C) -5 NaN NaN
1 Flash Point (°C) Max NaN NaN
2 Flash Point (°C) NaN NaN NaN
3 Flash Point (°C) NaN NaN NaN
4 Flash Point (°C) NaN NaN NaN
5 Flash Point (°C) NaN NaN NaN
6 Flash Point (°C) NaN NaN 29
7 Flash Point (°C) NaN NaN NaN
8 Flash Point (°C) NaN NaN NaN
9 Flash Point (°C) NaN 159 NaN
10 Flash Point (°C) NaN NaN NaN
I've tried:
for i in df.columns[1:]:
df[i]=[''.join(map(str,l)) for l in df[i]]
But I get the following error:
TypeError: 'float' object is not iterable
I assume I shouldn't try looping over the df like that anyway.
Edit: adding code to reproduce the df:
{'Property Name': {0: 'Flash Point (°C)',
1: 'Flash Point (°C)',
2: 'Flash Point (°C)',
3: 'Flash Point (°C)',
4: 'Flash Point (°C)',
5: 'Flash Point (°C)',
6: 'Flash Point (°C)',
7: 'Flash Point (°C)',
8: 'Flash Point (°C)',
9: 'Flash Point (°C)',
10: 'Flash Point (°C)'},
'2234.1.1': {0: ['-5'],
1: ['Max'],
2: [nan],
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan,
10: nan},
'2234.1.2.1': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: ['159'],
10: [nan]},
'2234.1.3.1': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: ['29'],
7: [nan],
8: [nan],
9: nan,
10: nan}}
CodePudding user response:
The method that deals with it is explode
. But just calling explode won't works since columns don't have matching element counts. One way to circumvent that problem is to stack
the DataFrame, explode
and unstack
back:
out = df.stack().explode().unstack()
If the lists are singleton lists (as in your example), then another option is to call explode
in loop:
for c in df.columns:
df = df.explode(c)
Another option could be to use str
accessor and assign
in a dict comprehension:
out = df.assign(**{f'{c}':df[c].str[0] for c in df.columns.drop('Property Name')})
Output:
Property Name 2234.1.1 2234.1.2.1 2234.1.3.1
0 Flash Point (°C) -5 NaN NaN
1 Flash Point (°C) Max NaN NaN
2 Flash Point (°C) NaN NaN NaN
3 Flash Point (°C) NaN NaN NaN
4 Flash Point (°C) NaN NaN NaN
5 Flash Point (°C) NaN NaN NaN
6 Flash Point (°C) NaN NaN 29
7 Flash Point (°C) NaN NaN NaN
8 Flash Point (°C) NaN NaN NaN
9 Flash Point (°C) NaN 159 NaN
10 Flash Point (°C) NaN NaN NaN