I have a column in a DF as below
| Column A |
| ab, bce, bc |
| bc, abcd, ab |
| ab, cd, abc |
and I want to create a new column that takes only the first sequence, as shown below
| Column A | Column B |
| ab, bce, bc | ab |
| bc, abcd, ab | bc |
| ab, cd, abc | ab |
I tried this code, but it only gives me the first letter of the first sequence, not the entire abbreviation
df.loc[:, 'ColumnB'] = df.ColumnA.map(lambda x: x[0])
CodePudding user response:
I guess the items in Column A are strings such as 'ab, bce, bc', so just use split ;).
df.loc[:, 'ColumnB'] = df.ColumnA.map(lambda x: x.split(',')[0])
CodePudding user response:
You can also try the vectorised str method split and use integer indexing on the resulting list to get the first element:
df['Column B'] = df['Column A'].str.split(',').str[0]
This should give:
Column A Column B
ab, bce, bc ab
bc, abcd, ab bc
ab, cd, abc ab
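For reference, here is a minimal, self-contained sketch of the same vectorised approach. The sample frame is rebuilt from the question; the `n=1` argument and the trailing `.str.strip()` are my own additions (assumptions, not part of the answer above), meant to stop after the first comma and to guard against stray whitespace.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Column A": ["ab, bce, bc", "bc, abcd, ab", "ab, cd, abc"]})

# n=1 performs at most one split per row, .str[0] picks the first piece,
# and .str.strip() removes surrounding whitespace if there is any
df["Column B"] = df["Column A"].str.split(",", n=1).str[0].str.strip()

print(df)
```

Limiting the split with `n=1` is optional; it only avoids building list items that are thrown away immediately.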
CodePudding user response:
You're close, you just need to convert the strings to lists with pandas.Series.str.split before the map:
df["Column B"]= df["Column A"].str.split(",").map(lambda x: x[0])
You can also use pandas.Series.str.get:
df["Column B"]= df["Column A"].str.split(",").str.get(0)
Another option is list comprehension:
df["Column B"]= [el[0] for el in df["Column A"].str.split(",")]
# Output :
print(df)
Column A Column B
0 ab, bce, bc ab
1 bc, abcd, ab bc
2 ab, cd, abc ab
CodePudding user response:
Each row is treated as a string, so x[0] returns the first character of the string "ab,bce,bc", not the first item.
You need to split the string into a list and then take the first element, which will then be "ab".
df.loc[:, 'ColumnB'] = df.ColumnA.map(lambda x: x.split(",")[0])
This creates "ColumnB" as you require.
Hope it helps!
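To make the difference concrete, here is a small plain-Python sketch (the variable names are only illustrative) comparing indexing a string with indexing the list produced by split:

```python
row = "ab, bce, bc"

# Indexing a string returns a single character
first_char = row[0]

# Splitting on the comma returns a list of substrings;
# indexing that list returns the whole first item
parts = row.split(",")
first_item = parts[0]

print(repr(first_char), parts, repr(first_item))
```

Note that the later pieces keep their leading space (' bce', ' bc'), so if you ever need an item other than the first, you may want to call strip() on it.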
CodePudding user response:
If you want the first chunk, don't split. Instead, extract the initial non-comma characters. This will be more efficient:
df['Column B'] = df['Column A'].str.extract('([^,]+)')
Output:
Column A Column B
0 ab, bce, bc ab
1 bc, abcd, ab bc
2 ab, cd, abc ab
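A small sketch of the extract approach on slightly messier data, in case it helps. The second and third rows (a value without any comma and a missing value) are made-up test cases, not from the question, and passing `expand=False` is my own choice so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# First row is from the question; the other two are hypothetical edge cases
df = pd.DataFrame({"Column A": ["ab, bce, bc", "abc", None]})

# [^,]+ matches one or more characters that are not a comma,
# starting from the beginning of each string
df["Column B"] = df["Column A"].str.extract(r"([^,]+)", expand=False)

print(df)
```

A value without a comma is returned whole, and a missing value stays missing instead of raising an error.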