I have a string series whose values have the form a[ - b[ - c]], where square brackets mark optional parts. In other words, each value is one of: a, a - b, or a - b - c.
I want to extract each element and convert the series to a dataframe with 3 columns.
If the series contains all 3 forms, this gives the right result:
import pandas as pd
s = pd.Series(['a', 'a - b', 'a - b - c'])
s.str.split(' - ', expand=True).fillna('')
# out
   0  1  2
0  a
1  a  b
2  a  b  c
However, if the series contains only the first two forms, e.g.
s = pd.Series(['a', 'a - b'])
then the same call returns only two columns:
   0  1
0  a
1  a  b
The expected output in this case would be (column 2 present but empty):
   0  1  2
0  a
1  a  b
I want 3 columns in the output regardless of what types of patterns are present in the series.
CodePudding user response:
Use DataFrame.reindex to force columns 0, 1 and 2 to exist. This variant sets the missing values to None:
s.str.split(' - ', expand=True).reindex(range(3), axis=1).astype(object).mask(lambda x: x.isna(), None)
Or, to fill the missing values with empty strings as in the expected output:
s.str.split(' - ', expand=True).reindex(range(3), axis=1).fillna('')
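The reindex approach can be wrapped in a small helper so the number of columns is explicit. This is only a sketch: the function name split_to_n_columns and its parameters sep, n and fill are hypothetical, not part of pandas.

```python
import pandas as pd


def split_to_n_columns(series, sep=" - ", n=3, fill=""):
    """Hypothetical helper: split each string on `sep` and return exactly `n` columns."""
    # expand=True yields one column per part, up to the longest row in the series
    parts = series.str.split(sep, expand=True)
    # reindex adds any missing columns from 0..n-1 (filled with NaN)
    # and drops any column whose label is outside that range
    parts = parts.reindex(columns=range(n))
    # replace both None and NaN with the fill value
    return parts.fillna(fill)


s = pd.Series(["a", "a - b"])
out = split_to_n_columns(s)
print(out)
```

Note that reindexing to range(n) also discards any parts beyond the first n, so a value such as 'a - b - c - d' would silently lose its last element when n=3.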