I'm using python 3.7 I have a pandas data frame with three text columns,name,email & section. The sample data looks like
Name Email Section
abc [email protected] purchase
cde - drawing
lmn-pqr None -
Hyphen are there in between two words in all of the three columns. I would like to join three columns with "_" as separator and create a new column group ignoring None or -. My combined outcome will look like
Name Email Section Group
abc [email protected] purchase [email protected]_purchase
cde - drawing cde_drawing
lmn-pqr None - lmn-pqr
I'm not sure about the python code. Can you please help me?
CodePudding user response:
You can use str.cat
that gets rid of null values:
df.mask(df.isin(['-', None])).apply(lambda r: r.str.cat(sep='_'), axis=1)
or, manually:
df['Group'] = df.apply(lambda r: '_'.join([x for x in r.replace('-', pd.NA).dropna()]),
axis=1)
output:
Name Email Section Group
0 abc [email protected] purchase [email protected]_purchase
1 cde - drawing cde_drawing
2 lmn-pqr None - lmn-pqr
CodePudding user response:
You can try replace -
with None
then filter it out when join
df['Group'] = df.replace({'-': None}).apply(lambda row: '_'.join(filter(None, row)), axis=1)
print(df)
Name Email Section Group
0 abc [email protected] purchase [email protected]_purchase
1 cde - drawing cde_drawing
2 lmn-pqr None - lmn-pqr
CodePudding user response:
df['Group'] = df.apply(lambda x: '-'.join([x['Name'], x['Email'], x['Section']))
x is a Series.