Home > Net >  Apply function for multiple levels of tables/data
Apply function for multiple levels of tables/data

Time:04-15

I get a problem in my work. I have tables:

import pandas as pd
import numpy as np

level1 = pd.DataFrame(list(zip(['a', 'b', 'c'], [3, 'x', 'x'])),
columns=['name', 'value'])
   name value
0   a   3
1   b   x
2   c   x

I want to sum the value column, but it contains “x”s. So I will have to use the second table to calculate “x”s :

level2 = pd.DataFrame(list(zip(['b', 'b', 'c', 'c', 'c'], ['b1', 'b2', 'c1', 'c2', 'c3'], [5, 7, 2, 'x', 9])), 
columns=['name', 'sub', 'value'])
  name  sub value
0   b   b1  5
1   b   b2  7
2   c   c1  2
3   c   c2  x
4   c   c3  9

I should sum the b1, b2 to give “x” for b in level1 table (x=12). But for c, there is “x”, so a third level table:

level3 = pd.DataFrame(list(zip(['c', 'c', 'c'], ['c1', 'c2', 'c3'], [2, 4, 9])), 
columns=['name', 'sub', 'value'])
  name  sub value
0   c   c1  2
1   c   c2  4
2   c   c3  9

Now, we can get the sum value for value column in level1 table.

My question is: can we use a function to calculate it easily? If there are more levels, how can we loop them till no “x”?

It is OK to combine level2 and level3.

CodePudding user response:

One option is a combination of merge(multiple merges actually) and a groupby:

(level2
.merge(level3, on = ['name', 'sub'], how = 'left', suffixes = (None, '_y'))
.assign(value = lambda df: np.where(df.value.eq('x'), df.value_y, df.value))
.groupby('name', as_index = False)
.value
.sum()
.merge(level1, on = 'name', how = 'right', suffixes = ('_x',None))
.assign(value = lambda df: np.where(df.value.eq('x'), df.value_x, df.value))
.loc[:, ['name', 'value']]
)

  name value
0    a     3
1    b  12.0
2    c  15.0

CodePudding user response:

Here's a way using combine_first and replace:

from functools import reduce
l1 = level1.assign(sub=level1['name'] '1').replace('x', np.nan).set_index(['name', 'sub'])
l2 = level2.replace('x', np.nan).set_index(['name', 'sub'])
l3 = level3.replace('x', np.nan).set_index(['name', 'sub'])

reduce(lambda x, y: x.combine_first(y), [l3,l2,l1]).groupby(level=0).sum()

Output:

      value
name       
a       3.0
b      12.0
c      15.0

Complete example:

import pandas as pd
import numpy as np

level1 = pd.DataFrame(list(zip(['a', 'b', 'c'], [3, 'x', 'x'])),
                      columns=['name', 'value'])

level2 = pd.DataFrame(list(zip(['b', 'b', 'c', 'c', 'c'], 
                               ['b1', 'b2', 'c1', 'c2', 'c3'], 
                               [5, 7, 2, 'x', 9])), 
                      columns=['name', 'sub', 'value'])

level3 = pd.DataFrame(list(zip(['c', 'c', 'c'], 
                               ['c1', 'c2', 'c3'], 
                               [2, 4, 9])),
                      columns=['name', 'sub', 'value'])

from functools import reduce
l1 = level1.assign(sub=level1['name'] '1')\
           .replace('x', np.nan)\
           .set_index(['name', 'sub'])
l2 = level2.replace('x', np.nan)\
           .set_index(['name', 'sub'])
l3 = level3.replace('x', np.nan)\
           .set_index(['name', 'sub'])

out = reduce(lambda x, y: x.combine_first(y), 
             [l3,l2,l1]).groupby(level=0).sum()
print(out)
  • Related