How can I extract certain characters from a string, convert them to corresponding numeric form and u-CodePudding

I have a pandas dataframe df as shown (Note: df.to_dict() and df2.to_dict() is given at the end of the question):

Characters  Sum
L1  b3 b1   
L2  b5 b1 b2 b6 b9  
L3  b1 b2 b3 b6 b7 b9   
L4  b1 b5 b12

The characters b1, b2, b3 and so on have numeric values embedded with them from another dataframe df2.

 Buses  Values
    0   bus_1   2
    1   bus_2   3
    2   bus_3   3
    3   bus_4   1
    4   bus_5   6
    5   bus_6   4
    6   bus_7   3
    7   bus_8   7
    8   bus_9   2
    9   bus_10  4
    10  bus_11  6
    11  bus_12  7

For example,

b1 = df2["Values"][0]
b2 = df2["Values"][1]
b3 = df2["Values"][2]
b4 = df2["Values"][3]
b5 = df2["Values"][4]
b6 = df2["Values"][5]
b7 = df2["Values"][6]
b8 = df2["Values"][7]
b9 = df2["Values"][8]
b10 = df2["Values"][9]
b11 = df2["Values"][10]
b12 = df2["Values"][11]

In dataframe df, in column Sum, I'd like to have the real sum of the values of the characters which are given in Characters column. For example, for index L1, I'd like to have value 5 which is corresponding to b3 b1 which is specified in the Characters column.

I can split the characters from the string using split() method. For example, df["Characters"][0].split(" ") gives me ['b3', 'b1']

But how can I get the numeric values of b3 and b1 and get their sum in another column in df? Is it possible to get the sum using any other approaches like map()?

df.to_dict() is as shown: {

'Characters': {'L1': 'b3 b1',
  'L2': 'b5 b1 b2 b6 b9',
  'L3': 'b1 b2 b3 b6 b7 b9',
  'L4': 'b1 b5 b12'},
 'Sum': {'L1': '', 'L2': '', 'L3': '', 'L4': ''}}

df2.to_dict() is as shown:

{'Buses': {0: 'bus_1',
  1: 'bus_2',
  2: 'bus_3',
  3: 'bus_4',
  4: 'bus_5',
  5: 'bus_6',
  6: 'bus_7',
  7: 'bus_8',
  8: 'bus_9',
  9: 'bus_10',
  10: 'bus_11',
  11: 'bus_12'},
 'Demand (MW)': {0: 2,
  1: 3,
  2: 3,
  3: 1,
  4: 6,
  5: 4,
  6: 3,
  7: 7,
  8: 2,
  9: 4,
  10: 6,
  11: 7}}

CodePudding user response：

One questionable solution would be to just

eval('b1 b2 b3 b6 b7 b9'.replace("b",""))

I would probably use regex here, but that might be overkill

import re
sum(int(i) for i in re.findall('\d ', 'b1 b2 b3 b6 b7 b9'))

CodePudding user response：

IIUC, you want to map the matching Values of Buses of df2 in the sums of df.

An approach would be to split the string on , explode the column, then map each bx to the corresponding values of df2 (for this we need to rework a bit df2 to be a Series with the bx names as indexes. Finally, groupby "Character" and sum:

(df.assign(Sum=df['Sum'].str.split(' '))
   .explode('Sum')
   .assign(Sum=lambda d: d['Sum'].map(df2.assign(Buses=df2['Buses'].str.replace('us_', ''))
                                         .set_index('Buses')['Values']))
   .groupby('Characters', as_index=False).sum()
)

output:

  Characters  Sum
0         L1    5
1         L2   17
2         L3   17
3         L4   15