Let's say that I have a DataFrame with a column A, where each cell is a list of strings of the form "Type:Value". Type can take 5 different values and Value can be anything. I would like to create 5 new columns (each named after the corresponding Type), where each cell holds the list of Values that have that Type. So if I have (1 row for simplicity):
df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})
then the result should be:
df = pd.DataFrame({"Type1": [["Value1", "Value3"]], "Type2": [["Value2"]]})
CodePudding user response:
It goes without saying, but there is probably a better way to do this.
import pandas as pd
df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})
buffer_dict = {}  # placeholder dict
for index, row in df.iterrows():
    for str_value in row['A']:
        str_list = str_value.split(':')
        key = str_list[0]  # these are just for readability
        value = str_list[1]
        buffer_dict.setdefault(key, []).append(value)  # set default to list and append values
buffer_dict.update((k, [v]) for k, v in buffer_dict.items())  # enclose values in list so we can convert to df
result = pd.DataFrame.from_dict(buffer_dict)
print(result)
Result:
              Type1     Type2
0  [Value1, Value3]  [Value2]
EDIT: I missed the part where there can only be 5 types. My solution assumes the set of types is unknown, so it works for any number of types.
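Note that the loop above pools the values from every row into a single output row. If the real frame has several rows, a per-row variant may be closer to what the question asks for. The following is only a minimal sketch under a few assumptions: `split_by_type` is a hypothetical helper name, the second sample row is made up for illustration, and every item is assumed to contain at least one colon.

```python
from collections import defaultdict

import pandas as pd


def split_by_type(items):
    """Group the "Type:Value" strings of one row into {type: [values]}."""
    grouped = defaultdict(list)
    for item in items:
        # Split on the first colon only, so a Value may itself contain ":".
        key, value = item.split(":", 1)
        grouped[key].append(value)
    return dict(grouped)


# The second row is invented here to show the multi-row behaviour.
df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"],
                         ["Type2:Value4"]]})

# One dict per input row; the DataFrame constructor aligns the dicts on their keys.
result = pd.DataFrame([split_by_type(items) for items in df["A"]], index=df.index)
print(result)
```

A row that has no item of a given type ends up with NaN in that column (here, the second row under Type1), so an extra fill step would be needed if empty lists are preferred.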
CodePudding user response:
One solution. This can be done in a loop as well, but since the number of columns is small, the code below is written out by hand rather than automated.
import pandas as pd
df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})
# spread the three list items over helper columns x, y, z
df[['x','y','z']] = pd.DataFrame(df['A'].tolist(), index=df.index)
# Type1 values come from x and z; note the result is a string that looks like a list
df['type1'] = "[" + df.x.str.split(':').str[1] + "," + df.z.str.split(':').str[1] + "]"
# the Type2 value comes from y
df['type2'] = df.y.str.split(':').str[1]
print(df.drop(['A','x','y','z'], axis = 'columns'))
             type1   type2
0  [Value1,Value3]  Value2
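The loop version mentioned at the start of this answer could look roughly like the sketch below. It is an illustration only: the `TYPES` list is an assumption (the question says there are five types but names only Type1 and Type2), and unlike the hand-written code it produces real Python lists rather than strings.

```python
import pandas as pd

df = pd.DataFrame({"A": [["Type1:Value1", "Type2:Value2", "Type1:Value3"]]})

# Hypothetical type names; only Type1 and Type2 appear in the question.
TYPES = ["Type1", "Type2", "Type3", "Type4", "Type5"]

for type_name in TYPES:
    prefix = type_name + ":"
    # For each row, keep the items that start with "TypeN:" and strip that prefix.
    df[type_name] = df["A"].apply(
        lambda items, p=prefix: [s[len(p):] for s in items if s.startswith(p)]
    )

print(df.drop(columns="A"))
```

Because the type names are fixed up front, every one of the five columns is always present, and a type that does not occur in a row yields an empty list.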