I have a time series data, converted to a dataframe. I have multiple columns, where the first column is timestamps and rest of the column names are timestamps with prices as values.
Sample dataframe :
[1]: https://i.stack.imgur.com/D2OWF.png sample dataframe
The idea is to iterate over the rows and check if the row value in 'date' column matches with any column name (highlighted in blue), if it does then the value at the intersection (highlighted in yellow) should stay and all the values after it (highlighted in grey) should be replaced with null or 0's.
For example : value in first column "2022-01-02 00:00:00 01:00" matches with column with the same name "2022-01-02 00:00:00 01:00". So the intersecting value i.e. "80.82" should stay and the rest of the values in that row (highlighted in grey) should replaced with null or 0's. I would really appreciate your help here.
I have tried the following but this replaces the intersecting value.
for i in df.columns:
df.loc[df['date']==i,i]=None
CodePudding user response:
"The idea is to iterate over the rows and check if the row value in 'date' column matches with any column name df['date'] == column
, if it does then the value at the intersection should stay and all the values after it df.columns[(idx 1):]
should be replaced with null or 0's ."
for idx, column in enumerate(df.columns):
df.loc[df['date'] == column, df.columns[(idx 1):]]=None
The enumerate()
function adds a counter to an iterable.
Explanation:
df.columns = ["date", "2022-01-01 23:55:00 01:00", "2022-01-02 00:00:00 01:00", "2022-01-02 00:05:00 01:00", ...]
list(enumerate(df.columns))
# return
[(0, "date"), (1, "2022-01-01 23:55:00 01:00"), (2, "2022-01-02 00:00:00 01:00"), (3, "2022-01-02 00:05:00 01:00"), ...]
In the for loop, the first (0, "date")
and second iteration (1, "2022-01-01 23:55:00 01:00")
didn't match with any row. The third iteration match with the first row. Using this as example
idx, column = 2, "2022-01-02 00:00:00 01:00"
df['date'] == column
returns a list of boolean [True, False, False, False, ...]
df.columns[(idx 1):]
is df.columns[3:]
which returns a list of column names after "2022-01-02 00:00:00 01:00"
, which is ["2022-01-02 00:05:00 01:00", "2022-01-02 00:10:00 01:00", ...]
df.loc[df['date'] == column, df.columns[(idx 1):]]=None
is equivalent to
df.loc[[True, False, False, ...], ["2022-01-02 00:05:00 01:00", "2022-01-02 00:10:00 01:00", ...]]=None
which fill the first row and the column after "2022-01-02 00:00:00 01:00"
as None
.