Good day everyone!
I need an alternative to, or a workaround for, explode() in pandas 0.19.0. I have this CSV file:
         item             CODE
0       apple        REDGRNYLW
1  strawberry           REDWHT
2        corn  YLWREDPRLWHTPNK
I need to get this result:
          item  CODE
 1       apple   RED
 2       apple   GRN
 3       apple   YLW
 4  strawberry   RED
 5  strawberry   WHT
 6        corn   YLW
 7        corn   RED
 8        corn   PRL
 9        corn   WHT
10        corn   PNK
I managed to get this result using pandas 1.3.3. Here is what I did:
import pandas as pd

filename = r'W:\plant_CODE.csv'
df2 = pd.read_csv(filename)

def split_every_3_char(string):
    # Split the string into consecutive 3-character chunks
    return [string[i:i+3] for i in range(0, len(string), 3)]

df2.columns = ['item', 'CODE']
df_splitted = (df2.set_index(df2.columns.drop('CODE', 1).tolist())
                  .CODE.apply(lambda x: split_every_3_char(x))
                  .explode()
                  .to_frame()
                  .reset_index()
              )
print(df_splitted)
Unfortunately, I just realized that I'm limited to pandas 0.19.0, and explode() isn't yet available.
Traceback (most recent call last):
  File "<string>", line 69, in <module>
  File "lib\site-packages\pandas\core\generic.py", line 2744, in __getattr__
AttributeError: 'Series' object has no attribute 'explode'
I would appreciate any solution or workaround. Thank you!
CodePudding user response:
Convert the output of the function to a Series and use DataFrame.stack:
df_splitted = (df2.set_index(df2.columns.drop('CODE', 1).tolist())
                  .CODE.apply(lambda x: pd.Series(split_every_3_char(x)))
                  .stack()
                  .reset_index(-1, drop=True)
                  .reset_index(name='CODE')
              )
print(df_splitted)
         item  CODE
0       apple   RED
1       apple   GRN
2       apple   YLW
3  strawberry   RED
4  strawberry   WHT
5        corn   YLW
6        corn   RED
7        corn   PRL
8        corn   WHT
9        corn   PNK
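For anyone who wants to try this without the CSV file, here is a minimal self-contained sketch of the same stack-based idea. The inline DataFrame simply mirrors the sample data from the question, and the helper name explode_codes is made up for illustration. The extra dropna() is an assumption on my part: it should be harmless where stack() already drops the NaN padding, and it keeps the result the same on pandas versions where stack() keeps it.

```python
import pandas as pd


def split_every_3_char(string):
    """Split a string into consecutive 3-character chunks."""
    return [string[i:i + 3] for i in range(0, len(string), 3)]


def explode_codes(df):
    """Hypothetical helper: one row per 3-character code, without explode()."""
    return (
        df.set_index('item')['CODE']
        .apply(lambda x: pd.Series(split_every_3_char(x)))  # wide frame, NaN-padded
        .stack()                     # long format, index = (item, position)
        .dropna()                    # remove padding if stack() kept it
        .reset_index(-1, drop=True)  # drop the position level
        .reset_index(name='CODE')    # back to the columns item, CODE
    )


# Sample data copied from the question (stands in for the CSV file)
df2 = pd.DataFrame({
    'item': ['apple', 'strawberry', 'corn'],
    'CODE': ['REDGRNYLW', 'REDWHT', 'YLWREDPRLWHTPNK'],
})
df_splitted = explode_codes(df2)
print(df_splitted)
```

Note that this sketch indexes by the item column only, so any other columns you want to carry along would need to be added to set_index().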
CodePudding user response:
What about crafting a Series from a list comprehension and joining?
# df is the DataFrame from the question (df2 there)
# i keeps the original row label, j indexes the 3-character chunk
l = [[i, x[3*j:3*(j+1)]] for i, x in zip(df.index, df['CODE'])
     for j in range(len(x)//3)]
s = pd.DataFrame(l, columns=['index', 'CODE']).set_index('index')['CODE']
df[['item']].join(s)
output:
         item  CODE
0       apple   RED
0       apple   GRN
0       apple   YLW
1  strawberry   RED
1  strawberry   WHT
2        corn   YLW
2        corn   RED
2        corn   PRL
2        corn   WHT
2        corn   PNK
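As a variation on the same list-comprehension idea, the (item, code) rows can also be built directly, which avoids the join and gives a fresh 0..n-1 index. This is only a sketch: the inline DataFrame stands in for the CSV from the question, and the names rows and result are just illustrative.

```python
import pandas as pd

# Sample data copied from the question (stands in for the CSV file)
df = pd.DataFrame({
    'item': ['apple', 'strawberry', 'corn'],
    'CODE': ['REDGRNYLW', 'REDWHT', 'YLWREDPRLWHTPNK'],
})

# One (item, chunk) pair per 3-character slice; j is the slice start position
rows = [(item, code[j:j + 3])
        for item, code in zip(df['item'], df['CODE'])
        for j in range(0, len(code), 3)]

result = pd.DataFrame(rows, columns=['item', 'CODE'])
print(result)
```

Unlike the range(len(x)//3) version, stepping through the start positions would also keep a trailing chunk shorter than three characters, if a code's length is not a multiple of three.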