Pandas 0.19.0 explode() workaround

Time:03-30

Good Day everyone!

I need an alternative or workaround for explode() in pandas 0.19.0. I have this CSV file:

  item        CODE
0 apple       REDGRNYLW
1 strawberry  REDWHT
2 corn        YLWREDPRLWHTPNK

I need to get this result:

  item        CODE
1 apple       RED
2 apple       GRN
3 apple       YLW
4 strawberry  RED
5 strawberry  WHT
6 corn        YLW
7 corn        RED
8 corn        PRL
9 corn        WHT
10 corn       PNK

I managed to get the result using pandas 1.3.3. Here is what I did:

import pandas as pd

filename = r'W:\plant_CODE.csv'

df2 = pd.read_csv(filename)

def split_every_3_char(string):
    return [string[i:i + 3] for i in range(0, len(string), 3)]

df2.columns = ['item', 'CODE']
df_splitted = (df2.set_index(df2.columns.drop('CODE', 1).tolist())
    .CODE.apply(lambda x: split_every_3_char(x))
    .explode()
    .to_frame()
    .reset_index()
)

print(df_splitted)

Unfortunately, I just realized that I'm limited to pandas 0.19.0, where explode() isn't available yet:

Traceback (most recent call last):
   File "<string>", line 69, in <module>
   File "lib\site-packages\pandas\core\generic.py", line 2744, in __getattr__
 AttributeError: 'Series' object has no attribute 'explode'

I would appreciate any solution or workaround. Thank you!


CodePudding user response:

Convert the output of the function to a Series and use DataFrame.stack:

df_splitted = (df2.set_index(df2.columns.drop('CODE', 1).tolist())
    .CODE.apply(lambda x: pd.Series(split_every_3_char(x)))
    .stack()
    .reset_index(-1, drop=True)
    .reset_index(name='CODE')
)

print(df_splitted)
         item CODE
0       apple  RED
1       apple  GRN
2       apple  YLW
3  strawberry  RED
4  strawberry  WHT
5        corn  YLW
6        corn  RED
7        corn  PRL
8        corn  WHT
9        corn  PNK
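
To see why this works, the chain can be broken into its intermediate steps. The following is only a sketch: it builds the sample data inline instead of reading the CSV, and the names `wide`, `long_form` and `result` are illustrative. The explicit `dropna()` is an assumption added as a hedge, because older pandas drops the NaN padding inside `stack()` while newer releases may keep it.

```python
import pandas as pd

# Sample data copied from the question, built inline instead of read from the CSV.
df = pd.DataFrame({
    "item": ["apple", "strawberry", "corn"],
    "CODE": ["REDGRNYLW", "REDWHT", "YLWREDPRLWHTPNK"],
})

def split_every_3_char(string):
    return [string[i:i + 3] for i in range(0, len(string), 3)]

# Step 1: one row per item, one column per chunk.
# Items with fewer chunks are padded with NaN on the right.
wide = df.set_index("item")["CODE"].apply(
    lambda x: pd.Series(split_every_3_char(x))
)

# Step 2: move the chunk columns into a second index level.
# dropna() removes the padding if stack() did not already do so.
long_form = wide.stack().dropna()

# Step 3: discard the chunk-position level and turn "item" back into a column.
result = long_form.reset_index(level=-1, drop=True).reset_index(name="CODE")

print(result)
```

Step 1 is where the per-row lists become columns, which is what lets `stack()` stand in for `explode()`.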

CodePudding user response:

What about crafting a Series from a list comprehension and joining?

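# Use distinct loop variables: i is the row label, j is the chunk position.
# (df is the frame read from the CSV, called df2 in the question.)
l = [[i, x[3*j:3*(j + 1)]] for i, x in zip(df.index, df['CODE'])
      for j in range(len(x)//3)]
s = pd.DataFrame(l, columns=['index', 'CODE']).set_index('index')['CODE']
df[['item']].join(s)

output:

         item CODE
0       apple  RED
0       apple  GRN
0       apple  YLW
1  strawberry  RED
1  strawberry  WHT
2        corn  YLW
2        corn  RED
2        corn  PRL
2        corn  WHT
2        corn  PNK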
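
A variant of the same idea, sketched here on the assumption that a plain Python loop is fast enough for the data size, builds the (item, chunk) pairs directly and skips the join. The sample frame is created inline and the names `rows` and `result` are illustrative. Unlike `len(x)//3`, stepping with `range(0, len(code), 3)` keeps a trailing chunk shorter than three characters if one exists.

```python
import pandas as pd

# Sample data copied from the question, built inline instead of read from the CSV.
df = pd.DataFrame({
    "item": ["apple", "strawberry", "corn"],
    "CODE": ["REDGRNYLW", "REDWHT", "YLWREDPRLWHTPNK"],
})

# One (item, chunk) pair per 3-character slice.
# The outer and inner loops use different variable names on purpose,
# so the inner loop cannot overwrite the outer one.
rows = [
    (item, code[start:start + 3])
    for item, code in zip(df["item"], df["CODE"])
    for start in range(0, len(code), 3)
]

result = pd.DataFrame(rows, columns=["item", "CODE"])
print(result)
```

This yields a fresh 0-based index rather than the repeated row labels that the join produces.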