I want to produce the following table in pandas. So far I only have the description
column; I want to split each value on the comma and put the part before the comma into a
commondescrip column.
description | commondescrip |
---|---|
00001 | 00001 |
00002 | 00002 |
00003,Area01 | 00003 |
00004 | 00004 |
00005,Area02 | 00005 |
I tried
splitword = df2["description"].str.split(",", n=1, expand = True)
df2["commondescrip"] = splitword[0]
but it gives me NaN for the rows that contain Area.
How can I fix this so that I get the table above, with only the part before the comma in commondescrip?
CodePudding user response:
Here is one way to do it:
df['description'].apply(lambda x: x.strip().split(',')[0])
0 00001
1 00002
2 00003
3 00004
4 00005
Name: description, dtype: object
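For comparison, here is a minimal, self-contained sketch of the same idea using the vectorised `.str` accessor instead of `apply`. The sample frame is an assumption: it mirrors the table in the question and stores every value as a string. Unlike the lambda, which raises an error on a non-string value such as a float NaN, the `.str` methods return NaN for such entries.

```python
import pandas as pd

# Sample data mirroring the question's table (assumed to be stored as strings).
df = pd.DataFrame(
    {"description": ["00001", "00002", "00003,Area01", "00004", "00005,Area02"]}
)

# Split at most once on the first comma, then keep the first piece of each list.
df["commondescrip"] = df["description"].str.split(",", n=1).str[0]

print(df)
```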
CodePudding user response:
Don't split: that forces you to handle several parts when you're only interested in one. Remove or extract instead.
Removing everything after the first comma:
df['commondescrip'] = df['description'].str.replace(',.*', '', regex=True)
Or extracting everything before the first comma:
df['commondescrip'] = df['description'].str.extract('([^,]+)')
Output:
    description commondescrip
0         00001         00001
1         00002         00002
2  00003,Area01         00003
3         00004         00004
4  00005,Area02         00005
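The two approaches can be checked side by side with a small runnable sketch. The sample frame is an assumption built from the question's table, and `expand=False` is added here so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data mirroring the question's table (assumed to be stored as strings).
df = pd.DataFrame(
    {"description": ["00001", "00002", "00003,Area01", "00004", "00005,Area02"]}
)

# Approach 1: delete the first comma and everything after it.
removed = df["description"].str.replace(",.*", "", regex=True)

# Approach 2: capture the first run of non-comma characters.
extracted = df["description"].str.extract(r"([^,]+)", expand=False)

df["commondescrip"] = extracted
print(df)
print(removed.equals(extracted))
```

The two are not equivalent on every input: for a value that starts with a comma, the replace version yields an empty string, while the extract version returns the text after the comma, because the pattern is not anchored to the start of the string.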