How do you split irregularly patterned strings into DataFrame columns using pandas?

Time:10-25

I've done my due diligence trying to figure this out, but I'm still stuck. I'm struggling to split irregularly patterned strings (i.e. strings that mix text, floats, and ints with an irregular number of spaces in between).

My goal is to split the 'Item_Description' column into 2 columns: 'Product Size' (e.g. "4.1 OUNCE") and 'Pack Size' (e.g. "1 PK"). Please see my attempt below and my screenshot.

When I run the code, nothing happens. Also, since the number of spaces differs from item to item, I had no luck creating new df columns from the split; I kept getting column errors.

Really appreciate your help!

import pandas as pd
import re
import csv
import io
from IPython.display import display
from google.colab import files
uploaded = files.upload()
   
df = pd.read_csv(io.BytesIO(uploaded["Total item_level.csv"]))
df.Item_Description.str.split(" ", expand=True)

[Screenshot: image not available]

CodePudding user response:

I think the following works (I haven't tried it, though):

df["Pack_Size"] = df.Item_Description.str.split().map(lambda x: x[-2] + " " + x[-1])
df["Product_Size"] = df.Item_Description.str.split().map(lambda x: x[-4] + " " + x[-3])
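
Two things are worth noting about the original attempt: `str.split(" ", expand=True)` returns a new DataFrame that was never assigned to anything, and splitting on a single space produces empty strings wherever several spaces occur in a row. Calling `str.split()` with no argument, as above, splits on any run of whitespace instead. If some rows might not end in exactly four tokens, a regular expression may be more forgiving. Here is a minimal sketch, assuming each description ends with "number unit number unit"; the pattern, the helper name `split_sizes`, and the sample strings are made up for illustration, since the real data isn't shown.

```python
import re

# Assumed layout: "<product name> <number> <unit> <number> <unit>", separated by
# any amount of whitespace. The sample strings below are invented for illustration.
SIZE_PATTERN = re.compile(
    r"(?P<product_size>\d+(?:\.\d+)?\s+[A-Za-z]+)\s+"
    r"(?P<pack_size>\d+(?:\.\d+)?\s+[A-Za-z]+)\s*$"
)


def split_sizes(description):
    """Return (product_size, pack_size), or (None, None) if the pattern is absent."""
    match = SIZE_PATTERN.search(description)
    if match is None:
        return (None, None)
    # Collapse internal runs of whitespace to a single space.
    product_size = " ".join(match.group("product_size").split())
    pack_size = " ".join(match.group("pack_size").split())
    return (product_size, pack_size)


samples = [
    "CHOCOLATE BAR    4.1 OUNCE   1 PK",
    "SPARKLING WATER 12  FL     6 PK",
    "NO SIZE INFORMATION",
]
results = [split_sizes(s) for s in samples]
```

With pandas, something like `pd.DataFrame(df["Item_Description"].map(split_sizes).tolist(), index=df.index, columns=["Product_Size", "Pack_Size"])` should give the two columns, which can then be joined back onto `df`. Rows that don't match come back as `None` rather than raising an `IndexError`.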