list=[['10, 0.01, 0.0428, 120; 30, 0.1, 2, 33; 50, 0.023, 0.31, 0.65'],
['10, 0.7, 0.5428, 2.31'],
['50, 0.3, 0.35, 0.1'],
['-10, 0.2, 0.048, 124; -30, 0.11, 24, 3; -50, 0.02, 0.1, 0.60; 0, 0, 0, 0; 10, 0.1, 2, 33;
20, 0.023, 0.31, 0.66']]
df=pd.DataFrame(list)
I have a dataframe df from which I am trying to get the 3rd value after each semicolon sign if the column name matches with the 1st value after the semicolon sign. The expected output is as below. Any clue on how to tackle this in a simple way?
CodePudding user response:
Update
A more pythonic version:
from collections import defaultdict
d = defaultdict(list)
for i in l:
for j in i[0].split(';'):
k = j.split(',')
c, r = int(k[0]), float(k[2])
d[c].append(r)
df = pd.DataFrame(d.values(), index=d.keys()).T.fillna(0)[sorted(d.keys())]
Output:
>>> df
-50 -30 -10 0 10 20 30 50
0 0.1 24.0 0.048 0.0 0.0428 0.31 2.0 0.31
1 0.0 0.0 0.000 0.0 0.5428 0.00 0.0 0.35
2 0.0 0.0 0.000 0.0 2.0000 0.00 0.0 0.00
Old answer
You can try this ugly code below:
l = [['10, 0.01, 0.0428, 120; 30, 0.1, 2, 33; 50, 0.023, 0.31, 0.65'],
['10, 0.7, 0.5428, 2.31'],
['50, 0.3, 0.35, 0.1'],
['-10, 0.2, 0.048, 124; -30, 0.11, 24, 3; -50, 0.02, 0.1, 0.60; 0, 0, 0, 0; 10, 0.1, 2, 33; 20, 0.023, 0.31, 0.66']]
df = pd.DataFrame(l, columns=['Values'])['Values'] \
.str.split(';').explode() \
.str.split(',', expand=True)[[0, 2]] \
.rename(columns={0: 'col', 2: 'row'}) \
.astype({'col': int, 'row': float}) \
.groupby('col')['row'] \
.apply(list) \
.apply(pd.Series) \
.T.fillna(0)
Output:
>>> df
col -50 -30 -10 0 10 20 30 50
0 0.1 24.0 0.048 0.0 0.0428 0.31 2.0 0.31
1 0.0 0.0 0.000 0.0 0.5428 0.00 0.0 0.35
2 0.0 0.0 0.000 0.0 2.0000 0.00 0.0 0.00