I have two dataframe, df1 and df2, the shape is the same
now I'd like to calculate expanding mean of df2 along columns from a certain column for each row. however, I'd like to add condition by df1>0 so only include certain columns in calculating mean.
Here is what I have in my mind. I think I need to get column index which meet the condition. Not sure how to do it?
column_index=df1.iloc[:,2:]>0
The below only starts from column 2, not sure how to exclude the first 2 columns
df2.loc[:,column_index]=df2.iloc[:,column_index].expanding(axis=1).mean()
Here is the df1 dictionary:
Wellname Padname 2021-10-01 2021-10-02 2021-10-03 2021-10-04
0 19A77 77 7.482527 7.482464 7.481077 7.481595
1 18A75 75 8.539606 8.538932 0.000000 8.537870
2 19A74 74 0.000000 8.436184 8.436184 8.436478
3 6A75 75 0.000000 0.000000 0.000000 0.000000
4 11A74 74 0.000000 0.000000 0.000000 0.000000
{'Wellname': {0: '19A77', 1: '18A75', 2: '19A74', 3: '6A75', 4: '11A74'},
'Padname': {0: '77', 1: '75', 2: '74', 3: '75', 4: '74'},
datetime.date(2021, 10, 1): {0: 7.482527,
1: 8.539606,
2: 0.0,
3: 0.0,
4: 0.0},
datetime.date(2021, 10, 2): {0: 7.482464,
1: 8.538932,
2: 8.436184,
3: 0.0,
4: 0.0},
datetime.date(2021, 10, 3): {0: 7.4810767,
1: 0.0,
2: 8.436184,
3: 0.0,
4: 0.0},
datetime.date(2021, 10, 4): {0: 7.4815946,
1: 8.53787,
2: 8.436478,
3: 0.0,
4: 0.0}}
Here is df2 dictionary:
Wellname Padname 2021-10-01 2021-10-02 2021-10-03 2021-10-04
0 19A77 77 0.030476 0.031979 0.030553 0.031249
1 18A75 75 0.012853 0.013348 0.014725 0.013044
2 19A74 74 0.000000 0.008568 0.008858 0.009324
3 6A75 75 0.000000 0.000000 0.000000 0.000000
4 11A74 74 0.000000 0.000000 0.000000 0.00000
{'Wellname': {0: '19A77', 1: '18A75', 2: '19A74', 3: '6A75', 4: '11A74'},
'Padname': {0: '77', 1: '75', 2: '74', 3: '75', 4: '74'},
datetime.date(2021, 10, 1): {0: 0.03047586123578023,
1: 0.012852623962781664,
2: 0.0,
3: 0.0,
4: 0.0},
datetime.date(2021, 10, 2): {0: 0.03197930324012337,
1: 0.013347760497657282,
2: 0.0085677628558828,
3: 0.0,
4: 0.0},
datetime.date(2021, 10, 3): {0: 0.030553271599108457,
1: 0.01472520929639739,
2: 0.008857505630343878,
3: 0.0,
4: 0.0},
datetime.date(2021, 10, 4): {0: 0.0312492523780255,
1: 0.013043631856251387,
2: 0.009324431738911416,
3: 0.0,
4: 0.0}}
Here is my first try and I still got an error
mask=df1.iloc[:,2:]>0
mask
2021-10-01 2021-10-02 2021-10-03 2021-10-04
0 True True True True
1 True True False True
2 False True True True
3 False False False False
4 False False False False
after getting boolean matrix, then I did below
df3=df2.iloc[:,2:]
b=df3.loc[mask]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\Temp\1\ipykernel_624\3951860798.py in <module>
----> 1 b=df3.loc[mask]
2 b
C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
929
930 maybe_callable = com.apply_if_callable(key, self.obj)
--> 931 return self._getitem_axis(maybe_callable, axis=axis)
932
933 def _is_scalar_access(self, key: tuple):
C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1149
1150 if hasattr(key, "ndim") and key.ndim > 1:
-> 1151 raise ValueError("Cannot index with multidimensional key")
1152
1153 return self._getitem_iterable(key, axis=axis)
ValueError: Cannot index with multidimensional key
Thanks
CodePudding user response:
Here's a solution using the mask
you have provided.
df2[mask].expanding(axis=1).mean()
2021-10-01 2021-10-02 2021-10-03 2021-10-04
0 0.030476 0.031228 0.031003 0.031064
1 0.012853 0.013100 0.013100 0.013081
2 NaN 0.008568 0.008713 0.008917
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN