Splitting Column with Delimiter Appearing more than Once

Time:03-24

I have the following line in my code where I'm taking a string and splitting it based on a delimiter:

task_df[['Project', 'Section']] = task_df.Projects.str.split(": ", expand=True)
# sample Projects value: "Rob's Project: Untitled Section"

But I run into issues whenever someone adds a Project or Section name that also contains my delimiter, e.g. Project X: Section: Rob

error: ValueError: Columns must be same length as key

NOTE: Sometimes the extra ": " is in the Project name, but most of the time it's in the Section name.

How would I account for this in my code? Is there a clean way to keep this from raising an error? If not, how can I drop the rows that would cause the error?

Answer:

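If I understand correctly, you need only two parts: everything before the first delimiter is the Project, and everything after it is the Section. If the delimiter appears more than once, the remaining pieces should be rejoined with that same delimiter. Therefore: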

  # Split once, then reuse the resulting lists for both columns
  parts = task_df['Projects'].str.split(": ")
  task_df['Project'] = parts.str[0]
  task_df['Section'] = parts.str[1:].str.join(": ")

For this dataset:

                          Projects
0  Rob's Project: Untitled Section
1         Project X:  Section: Rob

This is the output:

                          Projects        Project           Section
0  Rob's Project: Untitled Section  Rob's Project  Untitled Section
1         Project X:  Section: Rob      Project X      Section: Rob
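
As an alternative sketch: `str.split` also accepts an `n` argument that caps the number of splits, so `n=1` should cut only at the first delimiter and leave any later one inside the Section. The sample frame below is rebuilt for illustration (with a single space after each colon), and `unambiguous_df` is a hypothetical name for the variant that drops rows containing more than one delimiter instead of keeping them.

```python
import pandas as pd

# Illustrative data modelled on the question's examples (assumed, not the real frame)
task_df = pd.DataFrame(
    {"Projects": ["Rob's Project: Untitled Section", "Project X: Section: Rob"]}
)

# n=1 splits only at the first ": ", so expand=True yields at most two columns
task_df[["Project", "Section"]] = task_df["Projects"].str.split(": ", n=1, expand=True)

# Variant: keep only the rows that contain the delimiter exactly once
unambiguous_df = task_df[task_df["Projects"].str.count(": ") == 1]
```

This assumes the extra delimiter belongs to the Section. If it is more often part of the Project name, `str.rsplit(": ", n=1, expand=True)` splits at the last occurrence instead.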