I know that I can use transform for transforming every element in a group in a dataframe into the minimum value. This is done with something like
df.groupby(level=0).transform('min')
My problem is, that all of my cells are strings, in fact tuplelike strings with floats inside, like "5.48$\pm$69.1". The minimum function here would transform it by string, which is incorrect.
Is there a good way of using a custom transform function only dealing with the first part of it?
An example input is:
df = pd.DataFrame({'0.001': {('Periodic', 'Klinger'): '0.3$\\pm$0.05',
('Periodic', 'Malte'): '0.26$\\pm$0.06',
('Periodic', 'Merkelig'): '0.22$\\pm$0.12',
('Periodic', 'Dings'): '0.18$\\pm$0.06',
('Periodic', 'Elf'): '0.28$\\pm$0.11',
('Periodic', 'Rar'): '0.2$\\pm$0.1',
('Periodic', 'Merd'): '0.12$\\pm$0.14',
('Sequential', 'Klinger'): '0.15$\\pm$0.14',
('Sequential', 'Malte'): '0.1$\\pm$0.1',
('Sequential', 'Merkelig'): '0.26$\\pm$0.09',
('Sequential', 'Dings'): '0.17$\\pm$0.16',
('Sequential', 'Elf'): '0.15$\\pm$0.12',
('Sequential', 'Rar'): '0.12$\\pm$0.1',
('Sequential', 'Merd'): '0.21$\\pm$0.13'},
'0.01': {('Periodic', 'Klinger'): '1.75$\\pm$1.27',
('Periodic', 'Malte'): '1.19$\\pm$1.51',
('Periodic', 'Merkelig'): '2.31$\\pm$0.54',
('Periodic', 'Dings'): '2.47$\\pm$0.37',
('Periodic', 'Elf'): '2.3$\\pm$1.3',
('Periodic', 'Rar'): '1.65$\\pm$0.59',
('Periodic', 'Merd'): '1.07$\\pm$1.68',
('Sequential', 'Klinger'): '1.14$\\pm$0.25',
('Sequential', 'Malte'): '2.99$\\pm$1.36',
('Sequential', 'Merkelig'): '2.85$\\pm$1.06',
('Sequential', 'Dings'): '2.61$\\pm$0.79',
('Sequential', 'Elf'): '1.62$\\pm$1.47',
('Sequential', 'Rar'): '1.29$\\pm$0.74',
('Sequential', 'Merd'): '2.88$\\pm$0.89'},
'0.1': {('Periodic', 'Klinger'): '18.75$\\pm$12.96',
('Periodic', 'Malte'): '15.9$\\pm$9.8',
('Periodic', 'Merkelig'): '36.47$\\pm$1.42',
('Periodic', 'Dings'): '16.13$\\pm$13.24',
('Periodic', 'Elf'): '26.36$\\pm$11.08',
('Periodic', 'Rar'): '11.26$\\pm$12.32',
('Periodic', 'Merd'): '17.55$\\pm$10.78',
('Sequential', 'Klinger'): '36.26$\\pm$3.19',
('Sequential', 'Malte'): '20.2$\\pm$14.42',
('Sequential', 'Merkelig'): '18.62$\\pm$15.79',
('Sequential', 'Dings'): '5.64$\\pm$7.28',
('Sequential', 'Elf'): '25.55$\\pm$12.74',
('Sequential', 'Rar'): '19.65$\\pm$16.98',
('Sequential', 'Merd'): '14.53$\\pm$2.54'}})
There are three columns, 0.1, 0.01 and 0.001. There is a multiindex consisting of two values and I want the minimum values within every column for each multiindex-first-value.
Everything is done by
df.groupby(level=0).transform('min')
but the minimum function is wrong due to the format of the values
CodePudding user response:
Is this any helpful for your use case ?
import re
def min_float(x):
floats = [float(re.search(r'([\d\.] )', val).group(1)) for val in x]
return min(floats)
df.groupby(level=0).transform(min_float)
CodePudding user response:
Use Series.str.extract
for values before $
, convert to floats, so possible use min
:
f = lambda x: x.str.extract('([\d\.] )\$', expand=False).astype(float).min()
df = df.groupby(level=0).transform(f)
print (df)
0.001 0.01 0.1
Periodic Klinger 0.12 1.07 11.26
Malte 0.12 1.07 11.26
Merkelig 0.12 1.07 11.26
Dings 0.12 1.07 11.26
Elf 0.12 1.07 11.26
Rar 0.12 1.07 11.26
Merd 0.12 1.07 11.26
Sequential Klinger 0.10 1.14 5.64
Malte 0.10 1.14 5.64
Merkelig 0.10 1.14 5.64
Dings 0.10 1.14 5.64
Elf 0.10 1.14 5.64
Rar 0.10 1.14 5.64
Merd 0.10 1.14 5.64
If need also original values is possible use Series.sort_values
with key
parameter for sortying by floats and then select first value by position:
f = lambda x: x.sort_values(key = lambda y: y.str.extract('([\d\.] )\$', expand=False).astype(float)).iat[0]
df = df.groupby(level=0).transform(f)
print (df)
0.001 0.01 0.1
Periodic Klinger 0.12$\pm$0.14 1.07$\pm$1.68 11.26$\pm$12.32
Malte 0.12$\pm$0.14 1.07$\pm$1.68 11.26$\pm$12.32
Merkelig 0.12$\pm$0.14 1.07$\pm$1.68 11.26$\pm$12.32
Dings 0.12$\pm$0.14 1.07$\pm$1.68 11.26$\pm$12.32
Elf 0.12$\pm$0.14 1.07$\pm$1.68 11.26$\pm$12.32
Rar 0.12$\pm$0.14 1.07$\pm$1.68 11.26$\pm$12.32
Merd 0.12$\pm$0.14 1.07$\pm$1.68 11.26$\pm$12.32
Sequential Klinger 0.1$\pm$0.1 1.14$\pm$0.25 5.64$\pm$7.28
Malte 0.1$\pm$0.1 1.14$\pm$0.25 5.64$\pm$7.28
Merkelig 0.1$\pm$0.1 1.14$\pm$0.25 5.64$\pm$7.28
Dings 0.1$\pm$0.1 1.14$\pm$0.25 5.64$\pm$7.28
Elf 0.1$\pm$0.1 1.14$\pm$0.25 5.64$\pm$7.28
Rar 0.1$\pm$0.1 1.14$\pm$0.25 5.64$\pm$7.28
Merd 0.1$\pm$0.1 1.14$\pm$0.25 5.64$\pm$7.28