i need to do a loop over rows of dataframe. My goal is to change value compared to the previous line value. For instance, it's my dataframe :
index | date | id | value_dex |
---|---|---|---|
90256 | 2021-05-10 01:45:20 | 101904285 | 7.6 |
90257 | 2021-05-10 01:45:20 | 101904285 | 7.6 |
90258 | 2021-05-10 02:00:00 | 101904285 | NaN |
90259 | 2021-05-10 02:00:44 | 101904285 | NaN |
90260 | 2021-05-10 02:00:44 | 101904285 | NaN |
90261 | 2021-05-10 02:00:44 | 101904285 | NaN |
90262 | 2021-05-10 02:21:51 | 101904285 | NaN |
90263 | 2021-05-10 02:21:51 | 101904285 | NaN |
90264 | 2021-05-10 02:21:51 | 101904285 | NaN |
90265 | 2021-05-10 02:21:51 | 101904285 | NaN |
90266 | 2021-05-10 03:00:00 | 101904285 | NaN |
90267 | 2021-05-10 03:00:44 | 101904285 | 12.0 |
90268 | 2021-05-10 03:00:44 | 101904285 | NaN |
90269 | 2021-05-10 03:00:44 | 101904285 | NaN |
I want to change the values to 30 minutes if there are no intermediate values. For this I use a dictionary that for each id will update the last date when the value was changed. The result I want to obtain is the following :
index | date | id | value_dex |
---|---|---|---|
90256 | 2021-05-10 01:45:20 | 101904285 | 7.6 |
90257 | 2021-05-10 01:45:20 | 101904285 | 7.6 |
90258 | 2021-05-10 02:00:00 | 101904285 | 7.6 |
90259 | 2021-05-10 02:00:44 | 101904285 | 7.6 |
90260 | 2021-05-10 02:00:44 | 101904285 | 7.6 |
90261 | 2021-05-10 02:00:44 | 101904285 | 7.6 |
90262 | 2021-05-10 02:21:51 | 101904285 | NaN |
90263 | 2021-05-10 02:21:51 | 101904285 | NaN |
90264 | 2021-05-10 02:21:51 | 101904285 | NaN |
90265 | 2021-05-10 02:21:51 | 101904285 | NaN |
90266 | 2021-05-10 03:00:00 | 101904285 | NaN |
90267 | 2021-05-10 03:00:44 | 101904285 | 12.0 |
90268 | 2021-05-10 03:00:44 | 101904285 | 12.0 |
90269 | 2021-05-10 03:00:44 | 101904285 | 12.0 |
I have some problems with my loop because for some lines, it do not update the values, here my code :
name_columns = ['value_dex']
# List of ID
list_id = []
for i, row in data.iterrows():
if(isNaN(data.at[i, column])):
list_id.append(data.at[i, 'id'])
# Remove duplicate
id_row = set(list_id)
id_row = list(id_row)
# Create dictionnary
timestp = [0] * len(id_row)
dictionary = dict(zip(id_row, timestp))
# Add initial value
first_date = "2000-01-01 00:00:00"
first_date = pd.to_datetime(first_date, infer_datetime_format=True)
for key in dictionary:
dictionary.update({key: first_date})
# Print dictionnary
#for key in dictionary:
# print(key, '->', dictionary[key])
# For each line of the dataframe
for i, row in data.iterrows():
# if column value is not nan
if(isNaN(data.at[i, column])):
# calculation of the 30 minutes slot
str_time = data.at[i, 'date']
str_time = pd.to_datetime(str_time, infer_datetime_format=True)
final_time = str_time timedelta(minutes=30)
key_id = data.at[i, 'id']
column_value = data.at[i, column]
if(i == (len(data)-1)):
j = i
else:
j = i 1
while data.at[j, 'date'] < final_time and data.at[j, 'date'] > dictionary[key_id] and data.at[j, 'id'] == key_id :
if(isNaN(data.at[j, column]) == False):
data.at[j, column] = column_value
j = 1
else:
new_time = data.at[i, 'date'] timedelta(seconds=1)
dictionary[key_id] = new_time
break
Do you have any solutions that could resolve my problems ? I think my loop is not very optimized ... Thanks a lot !
CodePudding user response:
IIUC your code, you can use rolling
:
ffill = lambda x: pd.DataFrame(x.rolling('30T', on='date')['value_dex'].max(),
index=x.index)
df['value_dex'] = df.groupby('id', as_index=False).apply(ffill)
print(df)
# Output
date id value_dex
90256 2021-05-10 01:45:20 101904285 7.6
90257 2021-05-10 01:45:20 101904285 7.6
90258 2021-05-10 02:00:00 101904285 7.6
90259 2021-05-10 02:00:44 101904285 7.6
90260 2021-05-10 02:00:44 101904285 7.6
90261 2021-05-10 02:00:44 101904285 7.6
90262 2021-05-10 02:21:51 101904285 7.6
90263 2021-05-10 02:21:51 101904285 7.6
90264 2021-05-10 02:21:51 101904285 7.6
90265 2021-05-10 02:21:51 101904285 7.6
90266 2021-05-10 03:00:00 101904285 NaN
90267 2021-05-10 03:00:44 101904285 12.0
90268 2021-05-10 03:00:44 101904285 12.0
90269 2021-05-10 03:00:44 101904285 12.0