I have a column called sub, that I want to split into sub_1 and sub_2.
sub example looks like this:
banana/apple,
banana,
apple
where some records have the delimiter '/' and some don't.
What I am trying to do is: if there is a delimiter like in the first example above, split so: sub_1 -> banana and sub_2 -> apple
if there is no delimiter like in the second two examples, then it would look like: sub_1 -> banana sub_1 -> apple
I tried this code:
df[['sub_1', 'sub_2']] = df['sub'].str.split('/', expand=True)
However I get this error:
ValueError: Columns must be same length as key
which I am guessing is because how some columns do not have a delimiter, wondering if there is a quick way to default that issue to the first column if anyone here has run into this issue before.
Thanks for any direction.
CodePudding user response:
Assuming your dataframe looks like this:
>>> df
sub
0 banana/apple
1 banana
2 apple
Then you just need to split with expand
:
>>> df[["sub_1", "sub_2"]] = df["sub"].str.split("/", expand=True)
>>> df
sub sub_1 sub_2
0 banana/apple banana apple
1 banana banana None
2 apple apple None
Or, using pd.Series
, if you want NaN
s:
>>> df["sub"].str.split("/").apply(pd.Series)
sub sub_1 sub_2
0 banana/apple banana apple
1 banana banana NaN
2 apple apple NaN
CodePudding user response:
Try this:
df = pd.DataFrame({"sub": ["banana/apple", "banana", "apple"]})
res = df.join(df.pop("sub").str.split("/", expand=True))
res.columns = ["sub_1", "sub_2"]
print(res)
sub_1 sub_2
0 banana apple
1 banana None
2 apple None