Pandas Function to Split multi-line text column into multiple columns

Time:09-02

I have a column (stud_info) in the format below:

stud_info = """Name: Mark
Address: 
PHX, AZ
Hobbies: 
1. Football
2. Programming
3. Squash"""

[Image: source data]

The raw data has a column, stud_info, that holds multi-line text. I need to split it into three columns (Name, Address, and Hobbies). A simple split could be done with a lambda function, but this is a multi-line split, and the column names are themselves part of the data (i.e., the labels Name, Address, and Hobbies should not appear in the column values). The final columns should look like:

[Image: final data]

Please suggest a way to do it using pandas.

CodePudding user response:

Given:

import pandas as pd

df = pd.DataFrame({'stud_info': {0: 'Name: Mark\nAddress: \nPHX, AZ\nHobbies: \n1. Football\n2. Programming\n3. Squash'}})

We can define a regular expression for your particular formatting and use the pd.Series.str.extract method to break the capture groups into different columns.
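A minimal sketch of this approach follows. The pattern is an illustration, not necessarily the one the original answer used. It assumes that every row contains the three labels in the order Name, Address, Hobbies, and it leaves the hobby numbering in place as a single multi-line string.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "stud_info": [
        "Name: Mark\nAddress: \nPHX, AZ\nHobbies: \n1. Football\n2. Programming\n3. Squash"
    ]
})

# Illustrative pattern (an assumption): one named group per label.
# The group names become the column names of the extracted frame.
pattern = (
    r"Name:\s*(?P<Name>.*?)\s*"
    r"Address:\s*(?P<Address>.*?)\s*"
    r"Hobbies:\s*(?P<Hobbies>.*)"
)

# re.DOTALL lets "." match newlines, so a group can span several lines.
parts = df["stud_info"].str.extract(pattern, flags=re.DOTALL)

# Trim any trailing whitespace left at the end of the hobbies text.
parts["Hobbies"] = parts["Hobbies"].str.strip()

print(parts)
```

Rows that do not match the pattern come back as NaN in all three columns, so a quick parts.isna().any() check is a reasonable way to spot entries with a different layout.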
