Home > Back-end >  How to split a column containing a values using pandas
How to split a column containing a values using pandas

Time:03-23

I have a data column containg the values as shown in the image and i want to split them into different columns . Data type of column B is object/string

Column A Column B
12 ("apple",123,"six",03-10-2020)
10 ("banana","two")
5 ("mango",156,02-10-2020)

and i want to split it into different columns

enter image description here

CodePudding user response:

@nik I see one additional challenge here. From you data frame, I see no pattern in you "Column B". So I suggest you to write a function to arrange the tuples in a standard format as show below (For help check this link)

Old Col B New Col B
("banana","two") ("banana", None,"two", None)
("mango",156,02-10-2020) ("mango",156, None,02-10-2020)

After your data is in acceptable standard format, you use pandas inbuilt function to split your data. Example is below

df['New Col B'].str.split(',', expand=True)
0 1 2 3
"banana" None "two" None
"mango" 156 None 02-10-2020

CodePudding user response:

It would be easy if Column B has fixed four values for 'fruit', 'code', 'quantity', and 'date'. However, the number of values are not fixed here, which makes hard to solve. I assume that there are three patterns as you exampled and solve the problem, as follows:

import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 10, 5],
    'Column B': ['("apple",123,"six",03-10-2020)', '("banana","two")', '("mango",156,02-10-2020)']
})

fruits = []
codes = []
quantities = []
dates = []
for i, row in df.iterrows():
    values = row['Column B'][1:-1].split(',')
    if len(values) == 1:
        fruit, code, quantity, date = values[0], None, None, None
    elif len(values) == 2:
        fruit, code, quantity, date = values[0], None, values[1], None
    elif len(values) == 3:
        fruit, code, quantity, date = values[0], values[1], None, values[2]
    elif len(values) == 4:
        fruit, code, quantity, date = values[0], values[1], values[2], values[3]
    else:
        pass
    fruits.append(fruit)
    codes.append(code)
    quantities.append(quantity)
    dates.append(date)

df['fruit'] = pd.Series(fruits)
df['codes'] = pd.Series(codes)
df['quantities'] = pd.Series(quantities)
df['dates'] = pd.Series(dates)
del df['Column B']
print(df)
#   Column A     fruit codes quantities       dates
#0        12   "apple"   123      "six"  03-10-2020
#1        10  "banana"  None      "two"        None
#2         5   "mango"   156       None  02-10-2020

I assums that if Column B has three values, it would be 'fruit', 'code', and 'date' (for example, "mango",156,02-10-2020), not 'fruit', 'quantity', and 'date'. You might want to modify my code if your pattern is more varied.

  • Related