I have a pandas DataFrame
with several columns, one of which is time t
. I know how to bin these data based on time, and apply a test to the mean values of certain variables in each time bin. I now wish to apply the results of the tests on the binned data back to the original data, ie if the time of an event falls in a bin that passes or fails, that event passes or fails. My example code is below, and I am looking for suggestions for the last step.
import numpy as np
import pandas as pd
# Generate data
t = np.random.uniform(0, 100, 1000)
a = np.random.uniform(-1, 1, len(t))
b = np.random.uniform(-5, -1, len(t))
c = np.random.uniform(10, 100, len(t))
# Create a data frame
df = pd.DataFrame(zip(t, a, b, c), columns=['t', 'a', 'b', 'c'])
df.sort_values('t', inplace=True)
print(df)
# Bin data over time with mean values
binEdges = np.linspace(0, 100, 11)
dfBinAv = df.groupby(pd.cut(df['t'], binEdges)).mean()
# Test mean values
dfBinAv['a test'] = [aa >=0 for aa in dfBinAv['a']]
dfBinAv['b test'] = [-4 <= bb for bb in dfBinAv['b']]
dfBinAv['Test'] = [aa and bb for aa, bb in zip(dfBinAv['a test'], dfBinAv['b test'])]
print(dfBinAv)
# Add columns to original, unbinned DataFrame, containing whether or not variable t falls in a pass or fail bin for each test???
# ???
# For each line in df,
# figure out what line of dfBinAv it maps to,
# and bring back the values of the tests to new columns
# At the end, df should have new columns 'a test', 'b test' and 'Test'
CodePudding user response:
Just try with transform
g = df.groupby(pd.cut(df['t'], binEdges))
df['a test'] = g['a'].transform('mean')>=0
df['b test'] = g['b'].transform('mean')>=-4
df['Test'] = df['a test'] & df['b test']
CodePudding user response:
Assuming you need the intermediate dataframe dfBinAv
, you can merge
the result using a temporary "group" column:
(df.assign(group=pd.cut(df['t'], binEdges))
.merge(dfBinAv.filter(regex='[tT]est'), left_on='group', right_index=True)
.drop(columns='group')
)
output:
t a b c a test b test Test
434 0.055110 0.809890 -3.682123 11.769749 True True True
333 0.067866 0.747242 -3.274282 80.942789 True True True
867 0.219043 -0.197045 -2.891429 77.233564 True True True
318 0.267738 -0.970197 -3.640299 94.792442 True True True
471 0.333712 0.170670 -4.085683 27.816877 True True True
.. ... ... ... ... ... ... ...
967 99.352415 0.573914 -2.966360 51.006924 False True False
501 99.357841 0.132028 -2.967747 28.956472 False True False
590 99.464089 -0.800425 -1.357792 94.359697 False True False
525 99.536830 0.736201 -1.096155 33.528584 False True False
957 99.958924 -0.092601 -3.163905 78.502521 False True False
[1000 rows x 7 columns]