I get a problem in my work. I have tables:
import pandas as pd
import numpy as np
level1 = pd.DataFrame(list(zip(['a', 'b', 'c'], [3, 'x', 'x'])),
columns=['name', 'value'])
name value
0 a 3
1 b x
2 c x
I want to sum the value
column, but it contains “x”s. So I will have to use the second table to calculate “x”s :
level2 = pd.DataFrame(list(zip(['b', 'b', 'c', 'c', 'c'], ['b1', 'b2', 'c1', 'c2', 'c3'], [5, 7, 2, 'x', 9])),
columns=['name', 'sub', 'value'])
name sub value
0 b b1 5
1 b b2 7
2 c c1 2
3 c c2 x
4 c c3 9
I should sum the b1
, b2
to give “x” for b in level1
table (x=12). But for c, there is “x”, so a third level table:
level3 = pd.DataFrame(list(zip(['c', 'c', 'c'], ['c1', 'c2', 'c3'], [2, 4, 9])),
columns=['name', 'sub', 'value'])
name sub value
0 c c1 2
1 c c2 4
2 c c3 9
Now, we can get the sum value for value
column in level1
table.
My question is: can we use a function to calculate it easily? If there are more levels, how can we loop them till no “x”?
It is OK to combine level2
and level3
.
CodePudding user response:
One option is a combination of merge(multiple merges actually) and a groupby:
(level2
.merge(level3, on = ['name', 'sub'], how = 'left', suffixes = (None, '_y'))
.assign(value = lambda df: np.where(df.value.eq('x'), df.value_y, df.value))
.groupby('name', as_index = False)
.value
.sum()
.merge(level1, on = 'name', how = 'right', suffixes = ('_x',None))
.assign(value = lambda df: np.where(df.value.eq('x'), df.value_x, df.value))
.loc[:, ['name', 'value']]
)
name value
0 a 3
1 b 12.0
2 c 15.0
CodePudding user response:
Here's a way using combine_first
and replace
:
from functools import reduce
l1 = level1.assign(sub=level1['name'] '1').replace('x', np.nan).set_index(['name', 'sub'])
l2 = level2.replace('x', np.nan).set_index(['name', 'sub'])
l3 = level3.replace('x', np.nan).set_index(['name', 'sub'])
reduce(lambda x, y: x.combine_first(y), [l3,l2,l1]).groupby(level=0).sum()
Output:
value
name
a 3.0
b 12.0
c 15.0
Complete example:
import pandas as pd
import numpy as np
level1 = pd.DataFrame(list(zip(['a', 'b', 'c'], [3, 'x', 'x'])),
columns=['name', 'value'])
level2 = pd.DataFrame(list(zip(['b', 'b', 'c', 'c', 'c'],
['b1', 'b2', 'c1', 'c2', 'c3'],
[5, 7, 2, 'x', 9])),
columns=['name', 'sub', 'value'])
level3 = pd.DataFrame(list(zip(['c', 'c', 'c'],
['c1', 'c2', 'c3'],
[2, 4, 9])),
columns=['name', 'sub', 'value'])
from functools import reduce
l1 = level1.assign(sub=level1['name'] '1')\
.replace('x', np.nan)\
.set_index(['name', 'sub'])
l2 = level2.replace('x', np.nan)\
.set_index(['name', 'sub'])
l3 = level3.replace('x', np.nan)\
.set_index(['name', 'sub'])
out = reduce(lambda x, y: x.combine_first(y),
[l3,l2,l1]).groupby(level=0).sum()
print(out)