Home > Software engineering >  Pandas combine two columns with delimiters
Pandas combine two columns with delimiters

Time:04-22

I have two columns which have delimiters:

column1    column2
a,b,c,d    e,f,g,h
h,i        j,k
l,m,n      o,p,q
...         ...

I am trying to group them as ae, bf, cg, dh, hj, ik,etc.

I can split the delimiters but I do not know how to combine these, any help would be appreciated.

saved_column_1 = df["column1"].str.split(',', expand=True)
saved_column_2 = df["column2"].str.split(',', expand=True)

CodePudding user response:

IIUC, you could use:

(df.apply(lambda c: c.str.split('[,.]')) # or split(',') if the '.' is a typo
   .explode(list(df.columns))
   .agg(''.join, axis=1)
)

output:

0    ae
0    bf
0    cg
0    dh
1    hj
1    ik
2    lo
2    mp
2    nq
dtype: object

For a second level of grouping:

(df.apply(lambda c: c.str.split('[,.]'))
   .explode(list(df.columns))
   .agg(''.join, axis=1)
   .groupby(level=0).agg(','.join)  # this line is added
)

output:

0    ae,bf,cg,dh
1          hj,ik
2       lo,mp,nq
dtype: object

CodePudding user response:

Use nested list comprehension:

df['new'] = [','.join(''.join(x) 
             for x in list(zip(a.split(','), b.split(','))))
            for a, b in zip(df['column1'], df['column2'])]
print (df)
   column1  column2          new
0  a,b,c,d  e,f,g,h  ae,bf,cg,dh
1      h,i      j,k        hj,ik
2    l,m,n    o,p.q      lo,mp,nq

If need Series:

L = [''.join(x)  for a, b in zip(df['column1'], df['column2'])
     for x in list(zip(a.split(','),b.split(',')))]

s = pd.Series(L)
print (s)
0    ae
1    bf
2    cg
3    dh
4    hj
5    ik
6    lo
7    mp
8    nq
dtype: object
  • Related