I have some ugly data. I have a dataframe with a column, C, whose value in each row is a string. However, it's a bit more complex. The strings look like below.
Yep, they are strings. No, definitely not a list of string sets... outright strings.
I want to iterate through each row, get the 'info' values (actually strings) from the sets (actually strings) that have cat=1 and cat=2, and use them to populate two new columns. What I want:
Ideas?
CodePudding user response:
You can clean it up like this:
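# Drop the outer brackets and split so each '{cat=..., data=...' fragment gets its own row (original index is kept)
temp = df['C'].str.strip('[]').str.split('}, ').explode()
# Keep the value after 'data=' for fragments with the wanted cat, then rejoin the non-empty values per original row
df['cat_1'] = temp.apply(lambda x: x[13:].strip('}') if x[1:6]=='cat=1' else '').reset_index().groupby('index').agg(lambda x: ', '.join(filter(None, x)))['C']
df['cat_2'] = temp.apply(lambda x: x[13:].strip('}') if x[1:6]=='cat=2' else '').reset_index().groupby('index').agg(lambda x: ', '.join(filter(None, x)))['C']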
Output:
                                                    C         cat_1  cat_2
0                                                  []
1  [{cat=1, data=adjks}, {cat=1, data=pqoek}, {ca...  adjks, pqoek  hjksy
2                                                  []
3                               [{cat=1, data=alpqi}]         alpqi
4        [{cat=5, data=weee}, {cat=6, data=wolpwolp}]
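If slicing by fixed positions feels fragile, a regular expression is another way to do the same thing. This is only a sketch, not the approach above: the sample frame is made up to mirror the shape of the output shown (the truncated row is filled with an assumed third item), and `PAIR` and `extract` are hypothetical names. It assumes every item looks exactly like `{cat=<something>, data=<something>}` and that the data values never contain a `}`.

```python
import re
import pandas as pd

# Hypothetical sample data shaped like the strings in the question;
# the third item in row 1 is an assumption, since the output above truncates it.
df = pd.DataFrame({
    "C": [
        "[]",
        "[{cat=1, data=adjks}, {cat=1, data=pqoek}, {cat=2, data=hjksy}]",
        "[]",
        "[{cat=1, data=alpqi}]",
        "[{cat=5, data=weee}, {cat=6, data=wolpwolp}]",
    ]
})

# One "{cat=<cat>, data=<value>}" item; assumes values contain no "}".
PAIR = re.compile(r"\{cat=(\w+), data=([^}]*)\}")

def extract(text, cat):
    """Join the data values of every item whose cat equals `cat` exactly."""
    return ", ".join(value for found, value in PAIR.findall(text) if found == cat)

# Build one new column per wanted category.
for cat in ("1", "2"):
    df[f"cat_{cat}"] = [extract(text, cat) for text in df["C"]]

print(df[["cat_1", "cat_2"]])
```

Because the cat value is captured and compared as a whole, `cat=10` would not be picked up as `cat=1`, and rows with no matching items simply get an empty string.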