Need some help here: I want to have column1 of df2 returning the pertinent Cat values. Below the code and actual output and expected output:
import pandas as pd
import numpy as np
data = {'cat': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
'date': ['2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',\
'2022-01-09', '2022-01-10', '2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', \
'2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10','2021-12-30', '2022-01-02', \
'2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10'],
'value': [99435, 99401, 100113, 100528, 100428, 100734, 99035, 100077, 100018, 99425, 100728, 100863, \
99930, 100076, 100995, 100640, 99158, 100868, 100819, 99247, 100851, 100500, 100082, 99089],
'act': [1.0000, 0.7981, 0.7785, 0.3563, 0.1916, 0.0000, 0.0000, 0.0233, 1.0000, 0.5625, 0.5774, \
0.6777, 0.7300, 0.1951, 0.1966, 0.6413, 1.0000, 0.7905, 0.7867, 0.000, 0.8769, 0.4683, 0.7122, 0.7183]
}
df =pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['prev'] = df.groupby('cat')['act'].shift(1)
df['ln_return'] = np.log(df['act']/df['prev'].clip(0.000001)).clip(np.log(0.000001))
df['ln_return'] = df['ln_return'].mask(df['prev'].eq(0)&df['act'].eq(0), 0)
df = df.dropna()
a_avg = df.groupby('cat')['value'].mean().to_numpy()
a_test = a_avg /1000
df2 = pd.DataFrame({'col2': a_avg, 'col3':a_test})
print(df2)
Out: Expected:
col2 col3 col1 col2 col3
0 100045.142857 100.045143 0 1 100045.142857 100.045143
1 100379.571429 100.379571 1 2 100379.571429 100.379571
2 100208.000000 100.208000 2 3 100208.000000 100.208000
CodePudding user response:
You shouldn't drop down to numpy
because that can create issues with alignment. Insted work fully within pandas
. In this case you can groupby
agg
, then eval
to create your column divided by 1000, and rename
the cat column to what you want.
The eval
can be avoided if you separate it into multiple statements and instead just do the math.
(df.groupby('cat', as_index=False)
.agg(col2=('value', 'mean'))
.eval('col3 = col2/1000')
.rename(columns={'cat': 'col1'}))
col1 col2 col3
0 1 100045.142857 100.045143
1 2 100379.571429 100.379571
2 3 100208.000000 100.208000