Pandas - string split into multiple columns with variable number of delimited values into 3 columns

Time:10-14

I have a string Series whose values take the form a[ - b[ - c]], where the square brackets mark optional parts. That is, each value is either a, a - b, or a - b - c. I want to extract each element and put the result into a DataFrame with 3 columns.

If the series contains all 3 forms, this gives the right result:

import pandas as pd

s = pd.Series(['a', 'a - b', 'a - b - c'])
s.str.split(' - ', expand=True).fillna('')
# out
   0  1  2
0  a
1  a  b
2  a  b  c

However, if the series is just s = pd.Series(['a', 'a - b']), I only get 2 columns:

   0  1
0  a
1  a  b

The expected output in this case would be:

   0  1  2
0  a
1  a  b

I want 3 columns in the output regardless of what types of patterns are present in the series.

Answer:

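Use DataFrame.reindex to force columns 0, 1 and 2 to exist; any column missing from the split result is added and filled with NaN. Here the NaN values are then replaced with None: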

s.str.split(' - ', expand=True).reindex(range(3), axis=1).astype(object).mask(lambda x: x.isna(), None)

Or, to get empty strings as in the question:

s.str.split(' - ', expand=True).reindex(range(3), axis=1).fillna('')
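
As a rough sketch, the reindex approach can be wrapped in a small reusable function. The name split_to_n_columns and its parameters are made up for illustration and are not part of pandas. This variant passes fill_value='' to reindex so the added columns start out as empty strings, which is a small change from the one-liners above.

```python
import pandas as pd


def split_to_n_columns(s, sep=" - ", n=3):
    """Split each string in `s` on `sep` and always return exactly `n` columns.

    Hypothetical helper for illustration only; not a pandas function.
    """
    # expand=True yields one column per part, up to the longest row.
    parts = s.str.split(sep, expand=True)
    # Force the column labels 0..n-1: missing columns are added as '',
    # and any columns beyond n-1 are dropped.
    fixed = parts.reindex(columns=range(n), fill_value="")
    # Rows with fewer parts still hold missing values in existing columns.
    return fixed.fillna("")


s = pd.Series(["a", "a - b"])
result = split_to_n_columns(s)
print(result)
```

Note that, under this sketch, a value with more than n parts is silently truncated to its first n parts, because reindex keeps only the requested column labels.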