Apply change to timedelta to columns containing a given string

Time:05-14

Given the following DataFrame in Python pandas:

date        time_SEL  time_02_SEL_01  time_03_SEL_05  other
2022-01-01     34756          233232         3432423    756
2022-01-03     23322            4343            3334    343
2022-02-01    123232            3242           23423    434
2022-03-01   7323232           32423          323423  34324
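To reproduce the answers below, the sample data can be built as follows. This is a sketch that assumes the date column is kept as plain strings; use pd.to_datetime on it if you need real dates.

```python
import pandas as pd

# Sample data from the question; every column except "date" holds durations in seconds
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-03", "2022-02-01", "2022-03-01"],
    "time_SEL": [34756, 23322, 123232, 7323232],
    "time_02_SEL_01": [233232, 4343, 3242, 32423],
    "time_03_SEL_05": [3432423, 3334, 23423, 323423],
    "other": [756, 343, 434, 34324],
})
print(df)
```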

All columns other than date represent a duration in seconds. I want to convert these values to Timedelta, but only in the columns whose name contains the string "_SEL".
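For a single value, pd.to_timedelta with unit='s' interprets the number as a count of seconds. For example, the first time_SEL value from the table:

```python
import pandas as pd

# 34756 s = 9 h 39 min 16 s
td = pd.to_timedelta(34756, unit="s")
print(td)  # 0 days 09:39:16
```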

I need to select these columns by that string rather than by name, because the original dataset has more than three such columns. If there were only three, I would know how to convert them manually.

I hope you can help me, best regards.

Answer:

You can apply pandas.to_timedelta to all columns selected with DataFrame.filter and write the result back to the original DataFrame with DataFrame.update:

import pandas as pd

df.update(df.filter(like='_SEL').apply(pd.to_timedelta, unit='s'))

NB. update returns None: there is no output because the DataFrame is modified in place.

Updated DataFrame:

         date         time_SEL     time_02_SEL      time_03_SEL  other
0  2022-01-01  0 days 09:39:16 2 days 16:47:12 39 days 17:27:03    756
1  2022-01-03  0 days 06:28:42 0 days 01:12:23  0 days 00:55:34    343
2  2022-02-01  1 days 10:13:52 0 days 00:54:02  0 days 06:30:23    434
3  2022-03-01 84 days 18:13:52 0 days 09:00:23  3 days 17:50:23  34324
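DataFrame.update writes the new values into the existing integer columns, so depending on the pandas version the converted columns may end up with object dtype, and recent versions may warn about setting values of an incompatible dtype. A sketch that instead returns a new DataFrame with real timedelta64 columns, using DataFrame.assign (the helper name seconds_to_timedelta is hypothetical):

```python
import pandas as pd

def seconds_to_timedelta(frame: pd.DataFrame, substring: str) -> pd.DataFrame:
    """Hypothetical helper: convert every column whose name contains
    `substring` from seconds to timedelta, returning a new DataFrame."""
    cols = frame.filter(like=substring).columns
    # assign replaces whole columns, so each one gets a timedelta64 dtype
    return frame.assign(**{c: pd.to_timedelta(frame[c], unit="s") for c in cols})

df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-03"],
    "time_SEL": [34756, 23322],
    "time_02_SEL_01": [233232, 4343],
    "other": [756, 343],
})
out = seconds_to_timedelta(df, "_SEL")
print(out.dtypes)
```

The original df is left untouched, which avoids surprises from in-place modification.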
Update: if you get "TypeError: invalid type promotion", make sure the values are numeric:

df.update(df.filter(like='_SEL')
            .apply(lambda c: pd.to_timedelta(pd.to_numeric(c, errors='coerce'),
                                             unit='s')))
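pd.to_numeric(..., errors='coerce') turns anything that cannot be parsed as a number into NaN, which pd.to_timedelta then maps to NaT. A small sketch with a made-up column containing one non-numeric entry:

```python
import pandas as pd

# Hypothetical column read as text, with one bad entry
raw = pd.Series(["34756", "23322", "n/a"])
secs = pd.to_numeric(raw, errors="coerce")  # "n/a" -> NaN, dtype float64
td = pd.to_timedelta(secs, unit="s")        # NaN -> NaT
print(td)
```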

Answer:

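Use DataFrame.filter to get all columns ending in _SEL, convert them to timedeltas with to_timedelta, and write them back to the original with DataFrame.update. Note that regex='_SEL$' only matches names that end in _SEL; for names such as time_02_SEL_01 in the question, use regex='_SEL' instead: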

import pandas as pd

df.update(df.filter(regex='_SEL$').apply(lambda x: pd.to_timedelta(x, unit='s')))
print (df)
         date         time_SEL     time_02_SEL      time_03_SEL  other
0  2022-01-01  0 days 09:39:16 2 days 16:47:12 39 days 17:27:03    756
1  2022-01-03  0 days 06:28:42 0 days 01:12:23  0 days 00:55:34    343
2  2022-02-01  1 days 10:13:52 0 days 00:54:02  0 days 06:30:23    434
3  2022-03-01 84 days 18:13:52 0 days 09:00:23  3 days 17:50:23  34324
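A quick check of which columns each selector picks, using the column names given in the question:

```python
import pandas as pd

# Empty frame with the question's column names; filter only looks at the labels
df = pd.DataFrame(columns=["date", "time_SEL", "time_02_SEL_01", "time_03_SEL_05", "other"])

print(list(df.filter(regex="_SEL$").columns))  # ['time_SEL']
print(list(df.filter(like="_SEL").columns))    # ['time_SEL', 'time_02_SEL_01', 'time_03_SEL_05']
```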

Another idea is to build a boolean mask of the column names with .str.endswith:

m = df.columns.str.endswith('_SEL')
df.loc[:, m] = df.loc[:, m].apply(lambda x: pd.to_timedelta(x, unit='s'))
print (df)
         date         time_SEL     time_02_SEL      time_03_SEL  other
0  2022-01-01  0 days 09:39:16 2 days 16:47:12 39 days 17:27:03    756
1  2022-01-03  0 days 06:28:42 0 days 01:12:23  0 days 00:55:34    343
2  2022-02-01  1 days 10:13:52 0 days 00:54:02  0 days 06:30:23    434
3  2022-03-01 84 days 18:13:52 0 days 09:00:23  3 days 17:50:23  34324
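If the goal is columns that contain _SEL anywhere in the name, as in the question, the same mask idea works with .str.contains. A sketch, assuming the substring should be matched literally (regex=False); the loop assigns whole columns by name, so each converted column gets a timedelta64 dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-03"],
    "time_SEL": [34756, 23322],
    "time_03_SEL_05": [3432423, 3334],
    "other": [756, 343],
})

# Boolean mask: True for every column name that contains "_SEL" literally
m = df.columns.str.contains("_SEL", regex=False)

# Replace each selected column with its timedelta version
for c in df.columns[m]:
    df[c] = pd.to_timedelta(df[c], unit="s")

print(df.dtypes)
```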

EDIT: To convert the column values to integers first, use .astype(int):

df.update(df.filter(regex='_SEL$').astype(int).apply(lambda x: pd.to_timedelta(x, unit='s')))
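.astype(int) helps when the seconds were read as text (for example, quoted numbers in a CSV), since pd.to_timedelta does not accept unit= together with string input. A minimal sketch with made-up string values; astype(int) raises ValueError if any value is not a valid integer:

```python
import pandas as pd

# Hypothetical: the seconds column was loaded as strings
df = pd.DataFrame({"time_SEL": ["34756", "23322"], "other": [756, 343]})

sel = df.filter(regex="_SEL$").astype(int)  # "34756" -> 34756
converted = sel.apply(lambda x: pd.to_timedelta(x, unit="s"))
print(converted)
```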

If that fails because some values are non-numeric, use pd.to_numeric with errors='coerce' (non-numeric values become NaT):

df.update(df.filter(regex='_SEL$').apply(lambda x: pd.to_timedelta(pd.to_numeric(x, errors='coerce'), unit='s')))