Create columns from another column which is a list of items

Time:07-16

Let's say I have a DataFrame with a column A whose entries are lists of strings of the form "Type:Value", where Type can take 5 different values and Value can be anything. I would like to create 5 new columns (each named after its Type), where each column holds the list of Values that have that Type. So if I have (1 row for simplicity):

df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})

then the result should be:

df = pd.DataFrame({"Type1": [["Value1", "Value3"]], "Type2": [["Value2"]]})

CodePudding user response:

It goes without saying, but there is probably a better way to do this.

import pandas as pd

df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})

buffer_dict = {}  # placeholder dict
for index, row in df.iterrows():
    for str_value in row['A']:
        # split on the first colon only, since Value can be anything (including colons)
        key, value = str_value.split(':', 1)
        buffer_dict.setdefault(key, []).append(value)  # set default to list and append values
buffer_dict.update((k, [v]) for k, v in buffer_dict.items())  # enclose values in list so we can convert to df
result = pd.DataFrame.from_dict(buffer_dict)
print(result)

Result:

              Type1     Type2
0  [Value1, Value3]  [Value2]

EDIT: I missed the part where there can only be 5 types. My solution assumes the number of types is unknown, so it will work for any number of types.
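
Note that the loop above pools the values from every row into a single output row. If the frame has several rows and each row should keep its own lists, a per-row variant might look like the sketch below. The helper name split_by_type and the TYPES list are hypothetical (the question does not give the five type names), and the sketch assumes every item contains at least one colon.

```python
import pandas as pd

# Hypothetical list of the known type names (the question says there are 5).
TYPES = ["Type1", "Type2", "Type3", "Type4", "Type5"]


def split_by_type(items):
    """Group the 'Type:Value' strings of one row into {type: [values]}."""
    grouped = {t: [] for t in TYPES}
    for item in items:
        # Split on the first colon only, so values may contain colons.
        key, value = item.split(":", 1)
        grouped.setdefault(key, []).append(value)
    return grouped


df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"],
                         ["Type2:Value4"]]})

# One dict per input row gives one output row, aligned on the original index.
result = pd.DataFrame([split_by_type(items) for items in df["A"]], index=df.index)
print(result)
```

Types that do not occur in a row end up as empty lists, so every row has all five columns.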

CodePudding user response:

One solution. This can be done in a loop as well, but since the number of columns is small, the code below is written out by hand rather than automated.

import pandas as pd

df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})

# spread the three items of the first row's list into helper columns x, y, z
df[['x','y','z']] = df.A[0]

# x and z hold the Type1 items, y holds the Type2 item
# note: both results are plain strings, not Python lists
df['type1'] = "[" + df.x.str.split(':').str[1] + "," + df.z.str.split(':').str[1] + "]"
df['type2'] = df.y.str.split(':').str[1]

# df['Type1'] = df.x.str.split(':').str[0] + ":" + "[[" + df.x.str.split(':').str[1] + "," + df.z.str.split(':').str[1] + "]]"

print(df.drop(['A','x','y','z'], axis = 'columns'))

             type1   type2
0  [Value1,Value3]  Value2
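
The loop version mentioned above could be sketched roughly as follows. This is an illustration rather than the answerer's code: the type_names list is hypothetical and would need to be extended to all five types, and unlike the string concatenation above it produces real Python lists.

```python
import pandas as pd

df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})

# Hypothetical list of type names to extract; extend it to all five types.
type_names = ["Type1", "Type2"]

for name in type_names:
    prefix = name + ":"
    # For each row, keep the items that start with "<name>:" and strip that prefix.
    df[name] = df["A"].apply(
        lambda items, p=prefix: [s[len(p):] for s in items if s.startswith(p)]
    )

result = df.drop(columns="A")
print(result)
```

Because each column is built with a per-row apply, this also works when the lists have different lengths from row to row.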