Home > Software design >  Find multiple text patterns and then output the next value in a sting in pandas
Find multiple text patterns and then output the next value in a sting in pandas

Time:11-18

I have a datframe with the following values

Call    Data
1   [{'b_id': '31358658', 'incentive': 0}, {'b_id': 'D8384E90', 'incentive': 0}, {'b_id': '681B405A','incentive': 100}]
2   [{'b_id': 'D8384E90','incentive': 0 }, {'b_id': '31358658',  'incentive': 0}, {'b_id': '681B405A', 'incentive': 120}]
3   [{'b_id': '971C0B58','incentive': 0,}]
4   [{'b_id': '00450AAA','incentive': 0}, {'b_id': '0BCAEC4F','incentive': 0}, {'b_id': 'F2AD1313''incentive': 220},{'b_id': '971C0B58', 'incentive': 0}]

Ideally I would like the output in this format


Call    B_id                                    incentive
1       [31358658,D8384E90,681B405A]            [0,0,100]
2       [D8384E90,31358658,681B405A]            [0,0,120]
3       [971C0B58]                              [0]
4       [00450AAA,0BCAEC4F,F2AD1313,971C0B58]   [0,0,220,0]

The length of the data column can wary So far I have tried

df1 = df1.join(df1['Data'].str.split('b_id',expand=True).add_prefix('data'))

is there a way to search for each b_id in the sting and then take the value followed by the ":" and then add it to the list

#sample data code 
Call = [1,2,3,4,5,6,7,8,9]
Data= [
    [{'b_id': '31358658', 'incentive': 0}, {'b_id': 'D8384E90', 'incentive': 0}, {'b_id': '681B405A','incentive': 100}],
                    [{'b_id': 'D8384E90','incentive': 0 }, {'b_id': '31358658',  'incentive': 0}, {'b_id': '681B405A', 'incentive': 120}],
                    [{'b_id': '971C0B58','incentive': 0}],
                    [{'b_id': '00450AAA','incentive': 0}, {'b_id': '0BCAEC4F','incentive': 0}, {'b_id': 'F2AD1313','incentive': 220},{'b_id': '971C0B58', 'incentive': 0}],
                    [{'b_id': '90591CC5','incentive': 0}, {'b_id': '31358658','incentive': 0,}],
                    [{'b_id': '20E32751', 'incentive': 0}, {'b_id': '339A574F','incentive': 0}],
                    [{'b_id': '971C0B58','incentive': 0}],
                    [],
                    ]
df = pd.DataFrame(list(zip(Call,Data)), columns =['Call','Data'])

All help appreciated

CodePudding user response:

you can use a loop in lambda function.:

import ast
df['Data']=df['Data'].apply(ast.literal_eval)

df['B_id']=df['Data'].apply(lambda x: [i['b_id'] for i in x])
df['incentive']=df['Data'].apply(lambda x: [i['incentive'] for i in x])
print(df.head(1)
'''
    Call    Data    B_id                                     incentive
0   1      [......] ['31358658', 'D8384E90', '681B405A']    [0, 0, 100]
....
'''
  
  • Related