Home > front end >  How to use delimiter on column where some instances do not have the delimiter, python
How to use delimiter on column where some instances do not have the delimiter, python

Time:05-23

I have a column called sub, that I want to split into sub_1 and sub_2.

sub example looks like this:

banana/apple, 
banana,
apple

where some records have the delimiter '/' and some don't.

What I am trying to do is: if there is a delimiter like in the first example above, split so: sub_1 -> banana and sub_2 -> apple

if there is no delimiter like in the second two examples, then it would look like: sub_1 -> banana sub_1 -> apple

I tried this code:

df[['sub_1', 'sub_2']] = df['sub'].str.split('/', expand=True)

However I get this error:

ValueError: Columns must be same length as key

which I am guessing is because how some columns do not have a delimiter, wondering if there is a quick way to default that issue to the first column if anyone here has run into this issue before.

Thanks for any direction.

CodePudding user response:

Assuming your dataframe looks like this:

>>> df
            sub
0  banana/apple
1        banana
2         apple

Then you just need to split with expand:

>>> df[["sub_1", "sub_2"]] = df["sub"].str.split("/", expand=True)

>>> df
            sub   sub_1  sub_2
0  banana/apple  banana  apple
1        banana  banana   None
2         apple   apple   None

Or, using pd.Series, if you want NaNs:

>>> df["sub"].str.split("/").apply(pd.Series)
            sub   sub_1  sub_2
0  banana/apple  banana  apple
1        banana  banana    NaN
2         apple   apple    NaN

CodePudding user response:

Try this:

df = pd.DataFrame({"sub": ["banana/apple", "banana", "apple"]})

res = df.join(df.pop("sub").str.split("/", expand=True))
res.columns = ["sub_1", "sub_2"]

print(res)

    sub_1  sub_2
0  banana  apple
1  banana   None
2   apple   None
  • Related