Pandas: Check each row for condition and insert row below if condition is met

Time:02-13

This is my first question here, as I couldn't figure it out from related answers. I have a list of dataframes, "df_list", with one dataframe per user. Each dataframe basically looks like this:

-   |User |Timestamp            |Timestamp2           |check            |in_out
0   |0001 |2022-01-07 07:40:44  |2022-01-07 17:32:18  |0 days 09:51:34  |in
1   |0001 |2022-01-07 17:32:18  |2022-01-10 17:31:50  |2 days 23:59:32  |out
2   |0001 |2022-01-10 17:31:50  |2022-01-11 07:43:48  |0 days 14:11:58  |in

Data:

data = {'User': [1, 1, 1],
 'Timestamp': ['2022-01-07 07:40:44',
  '2022-01-07 17:32:18',
  '2022-01-10 17:31:50'],
 'Timestamp2': ['2022-01-07 17:32:18',
  '2022-01-10 17:31:50',
  '2022-01-11 07:43:48'],
 'check': ['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58'],
 'in_out': ['in', 'out', 'in']}

I would like to go through all the dataframes in my df_list and, inside each df, insert one row directly below every row where the 'check' value is > 15 hours.

Desired Output

-   |User |Timestamp            |Timestamp2           |check            |in_out
0   |0001 |2022-01-07 07:40:44  |2022-01-07 17:32:18  |0 days 09:51:34  |in
1   |0001 |2022-01-07 17:32:18  |2022-01-10 17:31:50  |2 days 23:59:32  |out
2   |0001 |                     |                     |                 |
3   |0001 |2022-01-10 17:31:50  |2022-01-11 07:43:48  |0 days 14:11:58  |in

Attempt:

What I tried is a for-loop over the list, with a second loop over all the rows of each dataframe. Inside it I check my condition with an if-clause. When it is met, I create a new blank line with the required index and then concat the dataframe pieces so that the empty line is included.

for dfx in df_list:
    for i in dfx.index.values.tolist():
        if (pd.Timedelta(dfx["check"].loc[i]) > pd.Timedelta("15 hours")) == True:
            line = pd.DataFrame({'ID': '', 'Timestamp': None,'Timestamp2': '','check': None}, index=[i + 1])
            dfx = pd.concat([dfx.iloc[:i + 1], line, dfx.iloc[i:]]).reset_index(drop=True)

I also found the related question "Add empty row if a condition is met Pandas", but I can't get that running either...

for i in dfy.index.values.tolist():
    indices = dfy.loc[pd.Timedelta(((dfy["check"].loc[i]) > pd.Timedelta("15 hours"))== True)].index.tolist()

ERROR: Value must be Timedelta, string, integer, float, timedelta or convertible, not bool

In the next step I want to add the 'Timestamp' value from previous line or - 10 hours, depending on the 'value' of the in/out column...but I thought I should figure out one problem at a time...

CodePudding user response:

You can build a boolean mask from the "check" column and duplicate the matching rows using Index.repeat and reindex. Then sort_index and set the duplicated copies to NaN in every column except "User":

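import numpy as np
import pandas as pd

# Compare against a Timedelta directly; astype('timedelta64[h]') and DataFrame.append no longer work in pandas 2.x
msk = df['check'] > pd.Timedelta(hours=15)
df = pd.concat([df.reindex(df[msk].index.repeat(2)), df[~msk]]).sort_index()
df.loc[df.duplicated(['Timestamp','check']), df.columns.drop('User')] = np.nan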

Note that this assumes "check" already has a timedelta dtype (timedelta64[ns]). If it holds strings, convert it first with pd.to_timedelta.
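If the columns arrive as plain strings, as in the data dict from the question, they can be parsed before the mask is built. A minimal sketch, assuming the column names from the question:

```python
import pandas as pd

data = {
    'User': [1, 1, 1],
    'Timestamp': ['2022-01-07 07:40:44', '2022-01-07 17:32:18', '2022-01-10 17:31:50'],
    'Timestamp2': ['2022-01-07 17:32:18', '2022-01-10 17:31:50', '2022-01-11 07:43:48'],
    'check': ['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58'],
    'in_out': ['in', 'out', 'in'],
}
df = pd.DataFrame(data)

# Parse the string columns into proper datetime / timedelta dtypes.
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['Timestamp2'] = pd.to_datetime(df['Timestamp2'])
df['check'] = pd.to_timedelta(df['check'])

# Comparing the whole column against a Timedelta yields one boolean per row.
msk = df['check'] > pd.Timedelta(hours=15)
print(msk.tolist())  # [False, True, False]
```

This is also why the second attempt in the question fails: there the boolean result of the comparison is passed to pd.Timedelta, which cannot convert a bool.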

Output:

   User           Timestamp          Timestamp2           check in_out
0     1 2022-01-07 07:40:44 2022-01-07 17:32:18 0 days 09:51:34     in
1     1 2022-01-07 17:32:18 2022-01-10 17:31:50 2 days 23:59:32    out
1     1                 NaT                 NaT             NaT    NaN
2     1 2022-01-10 17:31:50 2022-01-11 07:43:48 0 days 14:11:58     in
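To apply this to every dataframe in df_list and get the continuous 0..3 index from the desired output, the steps can be wrapped in a function. This is only a sketch: the name insert_blank_rows and its threshold parameter are made up for illustration, and it assumes each dataframe has a unique index and a timedelta "check" column. It blanks the repeated rows by duplicated index label rather than by duplicated column values.

```python
import numpy as np
import pandas as pd


def insert_blank_rows(df, threshold=pd.Timedelta(hours=15)):
    """Return a new frame with a blank row (User kept) below each row whose 'check' exceeds threshold."""
    msk = df['check'] > threshold
    # Repeat the matching rows twice, add back the others, and restore the order.
    out = pd.concat([df.reindex(df[msk].index.repeat(2)), df[~msk]]).sort_index()
    # The second copy of each repeated row shares its index label; blank it out.
    dup = out.index.duplicated(keep='first')
    out.loc[dup, out.columns.drop('User')] = np.nan
    return out.reset_index(drop=True)


# Sample frame built from the data in the question.
df = pd.DataFrame({
    'User': [1, 1, 1],
    'Timestamp': pd.to_datetime(['2022-01-07 07:40:44', '2022-01-07 17:32:18', '2022-01-10 17:31:50']),
    'Timestamp2': pd.to_datetime(['2022-01-07 17:32:18', '2022-01-10 17:31:50', '2022-01-11 07:43:48']),
    'check': pd.to_timedelta(['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58']),
    'in_out': ['in', 'out', 'in'],
})
df_list = [df]

# Build a new list; reassigning the loop variable in a for-loop would not change df_list.
df_list = [insert_blank_rows(d) for d in df_list]
print(df_list[0])
```

The list comprehension matters here: in the original attempt, dfx = pd.concat(...) only rebinds the loop variable, so the dataframes stored in df_list stay unchanged.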