Parse a string of floats in Python-CodePudding

list=[['10, 0.01, 0.0428, 120; 30, 0.1, 2, 33; 50, 0.023, 0.31, 0.65'],
      ['10, 0.7, 0.5428, 2.31'],
      ['50, 0.3, 0.35, 0.1'],
      ['-10, 0.2, 0.048, 124; -30, 0.11, 24, 3; -50, 0.02, 0.1, 0.60; 0, 0, 0, 0; 10, 0.1, 2, 33; 
       20, 0.023, 0.31, 0.66']]

df=pd.DataFrame(list)

I have a dataframe df from which I am trying to get the 3rd value after each semicolon sign if the column name matches with the 1st value after the semicolon sign. The expected output is as below. Any clue on how to tackle this in a simple way?

CodePudding user response：

Update

A more pythonic version:

from collections import defaultdict

d = defaultdict(list)
for i in l:
    for j in i[0].split(';'):
        k = j.split(',')
        c, r = int(k[0]), float(k[2])
        d[c].append(r)

df = pd.DataFrame(d.values(), index=d.keys()).T.fillna(0)[sorted(d.keys())]

Output:

>>> df
   -50   -30    -10   0       10    20   30    50
0  0.1  24.0  0.048  0.0  0.0428  0.31  2.0  0.31
1  0.0   0.0  0.000  0.0  0.5428  0.00  0.0  0.35
2  0.0   0.0  0.000  0.0  2.0000  0.00  0.0  0.00

Old answer

You can try this ugly code below:

l = [['10, 0.01, 0.0428, 120; 30, 0.1, 2, 33; 50, 0.023, 0.31, 0.65'],
      ['10, 0.7, 0.5428, 2.31'],
      ['50, 0.3, 0.35, 0.1'],
      ['-10, 0.2, 0.048, 124; -30, 0.11, 24, 3; -50, 0.02, 0.1, 0.60; 0, 0, 0, 0; 10, 0.1, 2, 33; 20, 0.023, 0.31, 0.66']]

df = pd.DataFrame(l, columns=['Values'])['Values'] \
       .str.split(';').explode() \
       .str.split(',', expand=True)[[0, 2]] \
       .rename(columns={0: 'col', 2: 'row'}) \
       .astype({'col': int, 'row': float}) \
       .groupby('col')['row'] \
       .apply(list) \
       .apply(pd.Series) \
       .T.fillna(0)

Output:

>>> df
col  -50   -30    -10   0       10    20   30    50
0    0.1  24.0  0.048  0.0  0.0428  0.31  2.0  0.31
1    0.0   0.0  0.000  0.0  0.5428  0.00  0.0  0.35
2    0.0   0.0  0.000  0.0  2.0000  0.00  0.0  0.00