I have 3D data in a pandas dataframe that I would like to 'oversample'/smooth by replacing the value at each x,y point with the average value of all the points that are within 5 units of that point. I can do it using a for loop like this (starting with a dataframe df with three columns X, Y, Z):
import pandas as pd

# df is a DataFrame with columns X, Y, Z
X_OS, Y_OS, Z_OS = [], [], []
for index, row in df.iterrows():
    # Points whose X and Y are both within 5 units of the current point
    nearby = df[(df['X'] > row['X'] - 5) & (df['X'] < row['X'] + 5) &
                (df['Y'] > row['Y'] - 5) & (df['Y'] < row['Y'] + 5)]
    X_OS.append(row['X'])
    Y_OS.append(row['Y'])
    Z_OS.append(nearby['Z'].mean())
OSdf = pd.DataFrame({'X': X_OS, 'Y': Y_OS, 'Z': Z_OS})
but this method is very slow for large datasets and feels very 'unpythonic'. How could I do this without for loops? Is it possible via complex use of the groupby function?
CodePudding user response:
df['column_name'].rolling(rolling_window).mean()
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html
CodePudding user response:
xy = df[['x','y']]
df['smoothed z'] = df[['z']].apply(
    lambda row: df['z'][(xy - xy.loc[row.name]).abs().lt(5).all(1)].mean(),
    axis=1
)
- Here I used df[['z']] to get column 'z' as a data frame, because we need the index of each row (row.name) when we apply a function to this column. .abs().lt(5).all(1) reads as: absolute values which are all less than 5 along the row.
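To make the mask concrete, here is a small sketch on made-up data (the three points below are hypothetical, not from the question). It builds the boolean mask for a single row, which is what the lambda does once per row:

```python
import pandas as pd

# Hypothetical toy data; column names follow this answer's lowercase x, y, z.
df = pd.DataFrame({'x': [0, 3, 10], 'y': [0, 4, 0], 'z': [1.0, 3.0, 10.0]})
xy = df[['x', 'y']]

# Absolute x and y offsets of every point from the point at index 0.
offsets = (xy - xy.loc[0]).abs()

# True where both |dx| < 5 and |dy| < 5, i.e. the point lies in the box around row 0.
mask = offsets.lt(5).all(axis=1)

# Average z over the points selected by the mask (row 0 itself is included).
neighbour_mean = df.loc[mask, 'z'].mean()
print(mask.tolist(), neighbour_mean)
```

Note that the point always counts as its own neighbour, so the mean is never taken over an empty selection.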
Update
The code below does the same thing but seems more consistent, as it addresses the index directly:
df.index.to_series().apply(lambda i: df.loc[(xy - xy.loc[i]).abs().lt(5).all(1), 'z'].mean())
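If the per-row apply is still too slow, one possible alternative is to compute all pairwise offsets at once with NumPy broadcasting. This is only a sketch under some assumptions: box_smooth and half_width are names made up here, the columns are assumed to be lowercase x, y, z as above, z is assumed to contain no NaN, and the n-by-n mask has to fit in memory, so it suits moderately sized frames only.

```python
import numpy as np
import pandas as pd

def box_smooth(df, half_width=5):
    """Mean of z over all points with |dx| < half_width and |dy| < half_width.

    Hypothetical helper; builds an n-by-n boolean mask, so memory grows as n**2.
    """
    x = df['x'].to_numpy()
    y = df['y'].to_numpy()
    z = df['z'].to_numpy(dtype=float)

    # near[i, j] is True when point j lies inside the box centred on point i.
    near = (np.abs(x[:, None] - x[None, :]) < half_width) & \
           (np.abs(y[:, None] - y[None, :]) < half_width)

    # Sum z over each row's neighbours and divide by the neighbour count.
    # Every point is its own neighbour, so the count is never zero.
    sums = np.where(near, z[None, :], 0.0).sum(axis=1)
    counts = near.sum(axis=1)
    return pd.Series(sums / counts, index=df.index, name='smoothed z')

# Hypothetical toy data for illustration.
df = pd.DataFrame({'x': [0, 3, 10], 'y': [0, 4, 0], 'z': [1.0, 3.0, 10.0]})
smoothed = box_smooth(df)
print(smoothed.tolist())
```

The result should match the apply-based version on the same frame; the trade-off is speed against the quadratic memory needed for the mask.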