Column with String Values Representing List of Sets, Extract Relevant Info to New Columns

Time:12-15

I've got some ugly data. I have a dataframe with a column, C, whose value in each row is a string. It's a bit more complex than that, though. The strings look like the ones below.

[image: example values in column C]

Yep, they are strings. No, definitely not a list of string sets... outright strings.

I want to iterate through each row and get the 'info' values (actually strings) from the sets (actually strings) that have cat=1 and cat=2, then use them to populate two new columns. What I want:

[image: desired output with the two new columns]

Ideas?

CodePudding user response:

You can clean it up like this:

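# Drop the outer brackets and explode to one entry per row (the original index is kept).
temp = df['C'].str.strip('[]').str.split('}, ').explode()
# For matching entries, keep the text after '{cat=N, data=' (13 characters), then rejoin per original row.
df['cat_1'] = temp.apply(lambda x: x[13:].strip('}') if x[1:6]=='cat=1' else '').reset_index().groupby('index').agg(lambda x: ', '.join(x))['C'].str.strip(', ')
df['cat_2'] = temp.apply(lambda x: x[13:].strip('}') if x[1:6]=='cat=2' else '').reset_index().groupby('index').agg(lambda x: ', '.join(x))['C'].str.strip(', ')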

Output:

                                                   C         cat_1  cat_2
0                                                 []                     
1  [{cat=1, data=adjks}, {cat=1, data=pqoek}, {ca...  adjks, pqoek  hjksy
2                                                 []                     
3                              [{cat=1, data=alpqi}]         alpqi       
4       [{cat=5, data=weee}, {cat=6, data=wolpwolp}]                     
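One caveat with the fixed-offset slicing above: it relies on single-digit cat values and an exact '{cat=N, data=' prefix, so an entry like cat=10 would also pass the x[1:6]=='cat=1' test. If that could happen in your data, a regex-based variant may be more robust. The following is only a sketch, not the answer's method: the key names cat and data are taken from the output shown above, the helper name extract is made up for illustration, and the sample frame is hypothetical.

```python
import re
import pandas as pd

# Matches one "{cat=<digits>, data=<value>}" entry. The key names follow the
# output shown above and are an assumption about the real data.
ENTRY = re.compile(r"\{cat=(\d+),\s*data=([^}]*)\}")

def extract(cell, cat):
    """Return the data values of one category in a cell, joined with ', '."""
    if not isinstance(cell, str):  # NaN or other non-string cells
        return ""
    return ", ".join(value for c, value in ENTRY.findall(cell) if int(c) == cat)

# Hypothetical sample frame, not the asker's real data.
df = pd.DataFrame({"C": [
    "[]",
    "[{cat=1, data=aaa}, {cat=1, data=bbb}, {cat=2, data=ccc}]",
    "[{cat=10, data=zzz}]",
]})

# One new column per category of interest.
for cat in (1, 2):
    df[f"cat_{cat}"] = [extract(cell, cat) for cell in df["C"]]

print(df)
```

Comparing the category as an integer keeps cat=10 out of the cat_1 column, and the pattern does not depend on character positions, so extra spaces after the comma are tolerated.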