I am working with a large dataframe with multiple columns. Some of the columns hold their data as single values wrapped in nested arrays (arrays within arrays). I need to convert those columns so that each cell holds only the plain value, without the array wrapping. I have tried flatten and squeeze in different ways, but could not get the output I am looking for. The following code reproduces the data format I am currently working with:
import pandas as pd
a = [[[10]],[[20]],[[30]],[[40]]]
b=[[50],[60],[70],[80]]
c=[90,100,110,120]
df = pd.DataFrame(list(zip(a,b,c)),columns=['a','b','c'])
print(df)
The output of the above is:
        a     b    c
0  [[10]]  [50]   90
1  [[20]]  [60]  100
2  [[30]]  [70]  110
3  [[40]]  [80]  120
However, I am looking to get the output as below:
    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120
It would really help if you could suggest how to approach this problem.
CodePudding user response:
Maybe not the best solution, but it works.
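import numpy as np

def ravel_series(s):
    try:
        # Join the nested lists of the column and flatten the result to 1-D
        return np.concatenate(s).ravel()
    except ValueError:
        # Raised for columns that already hold plain scalars
        return s

df.apply(ravel_series)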
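To see why the try/except is needed, here is a minimal sketch using plain Python lists as stand-ins for the columns (the names `nested`, `scalars`, `flat` and `raised` are made up for this illustration). Nested lists can be concatenated and flattened, while a list of plain scalars makes np.concatenate raise a ValueError, which is the case the function above leaves unchanged.

```python
import numpy as np

# Stand-in for column 'a': every cell is a doubly nested list
nested = [[[10]], [[20]], [[30]]]
# Each cell becomes a (1, 1) array; concatenating gives (3, 1), ravel gives (3,)
flat = np.concatenate(nested).ravel()
print(flat.tolist())

# Stand-in for column 'c': plain scalars cannot be concatenated
scalars = [90, 100, 110]
raised = False
try:
    np.concatenate(scalars)
except ValueError as exc:
    raised = True
    print("left unchanged because:", exc)
```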
CodePudding user response:
You can try this,
Code:
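def clean(el):
    # Doubly nested cell such as [[10]]
    if isinstance(el, list) and any(isinstance(i, list) for i in el):
        return el[0][0]
    # Singly nested cell such as [50]
    elif isinstance(el, list):
        return el[0]
    # Already a plain value
    return el

df['a'] = df.a.apply(clean)
df['b'] = df.b.apply(clean)
print(df)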
Output:
    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120
CodePudding user response:
You can unnest the lists with the str accessor, which also indexes into list elements:
df['a'].str[0].str[0]
output:
0    10
1    20
2    30
3    40
Name: a, dtype: int64
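As a rough sketch of what each .str[0] step does, the chain can be split into two intermediate results. The small frame is rebuilt here so the snippet runs on its own, and `step1` and `step2` are names chosen only for this illustration.

```python
import pandas as pd

# Rebuild only column 'a' from the question so the snippet is self-contained
df = pd.DataFrame({"a": [[[10]], [[20]], [[30]], [[40]]]})

# First .str[0]: take item 0 of every cell, [[10]] -> [10]
step1 = df["a"].str[0]
# Second .str[0]: take item 0 again, [10] -> 10
step2 = step1.str[0]

print(step1.tolist())
print(step2.tolist())
```

Each call removes exactly one level of nesting, so the number of chained calls has to match the nesting depth of the column.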
To automate this, you can use a recursive function:
from pandas.api.types import is_numeric_dtype

def unnest(x):
    # A numeric column is already flat
    if is_numeric_dtype(x):
        return x
    else:
        # Strip one level of nesting and check again
        return unnest(x.str[0])

df2 = df.apply(unnest)
Variant using the first item of each Series to determine the nesting level:
def unnest(x):
    # Inspect only the first cell to decide whether the column is still nested
    if len(x) > 0 and isinstance(x.iloc[0], list):
        return unnest(x.str[0])
    else:
        return x

df2 = df.apply(unnest)
output:
    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120
arbitrary nesting
If the nesting depth varies from cell to cell, you can use the same logic per element:
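def unnest(x):
    # Keep taking the first item until the value is no longer a non-empty list
    if isinstance(x, list) and len(x) > 0:
        return unnest(x[0])
    else:
        return x

# Note: applymap is deprecated since pandas 2.1; use df.map(unnest) there
df2 = df.applymap(unnest)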
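A minimal, self-contained sketch of the per-element approach on data whose nesting depth really does differ between cells. The frame `mixed`, the helper `unnest_cell` and the result `flat` are hypothetical names for this illustration, and the helper uses a loop instead of recursion. It goes through Series.map column by column, which is one way to stay independent of whether the installed pandas offers DataFrame.applymap or DataFrame.map.

```python
import pandas as pd

def unnest_cell(x):
    """Take the first item repeatedly until x is no longer a non-empty list."""
    while isinstance(x, list) and len(x) > 0:
        x = x[0]
    return x

# Hypothetical data: the nesting depth changes from cell to cell
mixed = pd.DataFrame({
    "a": [[[10]], [20], 30, [[[40]]]],
    "b": [[50], [[60]], [70], 80],
})

# Apply the helper to every element, one column at a time
flat = mixed.apply(lambda col: col.map(unnest_cell))
print(flat)
```

Like the function above, this keeps only the first item of each list, so it is meant for cells that wrap a single value.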