Expanding mean including only columns that meet a condition


I have two dataframes, df1 and df2, with the same shape.

Now I'd like to calculate, for each row, the expanding mean of df2 along the columns, starting from a certain column. However, I'd like to add the condition df1 > 0 so that only the columns where df1 is positive are included in the mean.

Here is what I have in mind. I think I need to get the column indices that meet the condition, but I'm not sure how to do that:

column_index=df1.iloc[:,2:]>0

The line below should only operate from column 2 onward, but I'm not sure how to exclude the first 2 columns (one possible approach is sketched right after it):


df2.loc[:,column_index]=df2.iloc[:,column_index].expanding(axis=1).mean()
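
One way to keep the first two identifier columns out of the calculation entirely (a minimal sketch, assuming Wellname and Padname are always the first two columns; the name date_cols is only illustrative) is to slice the column labels by position and reuse those labels:

# labels of everything after Wellname/Padname, i.e. the date columns
date_cols = df1.columns[2:]

# boolean mask over only those columns: True where df1 is positive
mask = df1[date_cols] > 0

The same labels can then be used to select the numeric part of df2 and to write the result back with df2.loc[:, date_cols].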


Here is df1, followed by its dictionary:


  Wellname Padname  2021-10-01  2021-10-02  2021-10-03  2021-10-04
0    19A77      77    7.482527    7.482464    7.481077    7.481595
1    18A75      75    8.539606    8.538932    0.000000    8.537870
2    19A74      74    0.000000    8.436184    8.436184    8.436478
3     6A75      75    0.000000    0.000000    0.000000    0.000000
4    11A74      74    0.000000    0.000000    0.000000    0.000000

{'Wellname': {0: '19A77', 1: '18A75', 2: '19A74', 3: '6A75', 4: '11A74'},
 'Padname': {0: '77', 1: '75', 2: '74', 3: '75', 4: '74'},
 datetime.date(2021, 10, 1): {0: 7.482527,
  1: 8.539606,
  2: 0.0,
  3: 0.0,
  4: 0.0},
 datetime.date(2021, 10, 2): {0: 7.482464,
  1: 8.538932,
  2: 8.436184,
  3: 0.0,
  4: 0.0},
 datetime.date(2021, 10, 3): {0: 7.4810767,
  1: 0.0,
  2: 8.436184,
  3: 0.0,
  4: 0.0},
 datetime.date(2021, 10, 4): {0: 7.4815946,
  1: 8.53787,
  2: 8.436478,
  3: 0.0,
  4: 0.0}}

Here is df2, followed by its dictionary:

  Wellname Padname  2021-10-01  2021-10-02  2021-10-03  2021-10-04
0    19A77      77    0.030476    0.031979    0.030553    0.031249
1    18A75      75    0.012853    0.013348    0.014725    0.013044
2    19A74      74    0.000000    0.008568    0.008858    0.009324
3     6A75      75    0.000000    0.000000    0.000000    0.000000
4    11A74      74    0.000000    0.000000    0.000000    0.000000

{'Wellname': {0: '19A77', 1: '18A75', 2: '19A74', 3: '6A75', 4: '11A74'},
 'Padname': {0: '77', 1: '75', 2: '74', 3: '75', 4: '74'},
 datetime.date(2021, 10, 1): {0: 0.03047586123578023,
  1: 0.012852623962781664,
  2: 0.0,
  3: 0.0,
  4: 0.0},
 datetime.date(2021, 10, 2): {0: 0.03197930324012337,
  1: 0.013347760497657282,
  2: 0.0085677628558828,
  3: 0.0,
  4: 0.0},
 datetime.date(2021, 10, 3): {0: 0.030553271599108457,
  1: 0.01472520929639739,
  2: 0.008857505630343878,
  3: 0.0,
  4: 0.0},
 datetime.date(2021, 10, 4): {0: 0.0312492523780255,
  1: 0.013043631856251387,
  2: 0.009324431738911416,
  3: 0.0,
  4: 0.0}}

Here is my first try, and I still get an error:

mask=df1.iloc[:,2:]>0
mask

2021-10-01  2021-10-02  2021-10-03  2021-10-04
0   True    True    True    True
1   True    True    False   True
2   False   True    True    True
3   False   False   False   False
4   False   False   False   False

After getting the boolean mask, I did the following:

df3=df2.iloc[:,2:]
b=df3.loc[mask]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Temp\1\ipykernel_624\3951860798.py in <module>
----> 1 b=df3.loc[mask]
      2 b

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    929 
    930             maybe_callable = com.apply_if_callable(key, self.obj)
--> 931             return self._getitem_axis(maybe_callable, axis=axis)
    932 
    933     def _is_scalar_access(self, key: tuple):

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1149 
   1150                 if hasattr(key, "ndim") and key.ndim > 1:
-> 1151                     raise ValueError("Cannot index with multidimensional key")
   1152 
   1153                 return self._getitem_iterable(key, axis=axis)

ValueError: Cannot index with multidimensional key

Thanks

CodePudding user response:

Here's a solution using the mask you have provided. Your .loc attempt fails because .loc cannot take a 2-D boolean mask as a row indexer; indexing with plain brackets and a boolean DataFrame masks element-wise instead, replacing the excluded cells with NaN, which expanding().mean() then skips.

df2.iloc[:, 2:][mask].expanding(axis=1).mean()

   2021-10-01  2021-10-02  2021-10-03  2021-10-04
0    0.030476    0.031228    0.031003    0.031064
1    0.012853    0.013100    0.013100    0.013081
2         NaN    0.008568    0.008713    0.008917
3         NaN         NaN         NaN         NaN
4         NaN         NaN         NaN         NaN
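
For completeness, here is a minimal end-to-end sketch, assuming d1 and d2 are the two dictionaries printed above (the names d1, d2 and date_cols are only illustrative). Transposing before expanding() gives the same result as expanding(axis=1), which is deprecated in newer pandas versions:

import pandas as pd

# Rebuild the example frames from the dictionaries in the question.
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# Date columns are everything after Wellname and Padname.
date_cols = df1.columns[2:]

# True where df1 is positive; only those cells should enter the mean.
mask = df1[date_cols] > 0

# Element-wise masking turns the excluded cells into NaN, and mean() skips NaN.
masked = df2[date_cols][mask]

# Row-wise expanding mean: transpose, expand down the rows, transpose back.
result = masked.T.expanding().mean().T

# Optionally write the result back into df2's date columns.
df2.loc[:, date_cols] = result

With the example data, result reproduces the table above, with all-NaN rows where df1 is never positive.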