I have two columns which have delimiters:
column1 column2
a,b,c,d e,f,g,h
h,i j,k
l,m,n o,p,q
... ...
I am trying to group them as ae, bf, cg, dh, hj, ik,etc.
I can split the delimiters but I do not know how to combine these, any help would be appreciated.
saved_column_1 = df["column1"].str.split(',', expand=True)
saved_column_2 = df["column2"].str.split(',', expand=True)
CodePudding user response:
IIUC, you could use:
(df.apply(lambda c: c.str.split('[,.]')) # or split(',') if the '.' is a typo
.explode(list(df.columns))
.agg(''.join, axis=1)
)
output:
0 ae
0 bf
0 cg
0 dh
1 hj
1 ik
2 lo
2 mp
2 nq
dtype: object
For a second level of grouping:
(df.apply(lambda c: c.str.split('[,.]'))
.explode(list(df.columns))
.agg(''.join, axis=1)
.groupby(level=0).agg(','.join) # this line is added
)
output:
0 ae,bf,cg,dh
1 hj,ik
2 lo,mp,nq
dtype: object
CodePudding user response:
Use nested list comprehension:
df['new'] = [','.join(''.join(x)
for x in list(zip(a.split(','), b.split(','))))
for a, b in zip(df['column1'], df['column2'])]
print (df)
column1 column2 new
0 a,b,c,d e,f,g,h ae,bf,cg,dh
1 h,i j,k hj,ik
2 l,m,n o,p.q lo,mp,nq
If need Series:
L = [''.join(x) for a, b in zip(df['column1'], df['column2'])
for x in list(zip(a.split(','),b.split(',')))]
s = pd.Series(L)
print (s)
0 ae
1 bf
2 cg
3 dh
4 hj
5 ik
6 lo
7 mp
8 nq
dtype: object