Given the following DataFrame in Python pandas:
date | time_SEL | time_02_SEL_01 | time_03_SEL_05 | other |
---|---|---|---|---|
2022-01-01 | 34756 | 233232 | 3432423 | 756 |
2022-01-03 | 23322 | 4343 | 3334 | 343 |
2022-02-01 | 123232 | 3242 | 23423 | 434 |
2022-03-01 | 7323232 | 32423 | 323423 | 34324 |
All columns other than date represent a duration in seconds. My idea is to convert these values to Timedelta, keeping in mind that I only want to apply the change to columns whose names contain the string "_SEL".
I want to select the columns by that substring rather than by name, because the original dataset has more than 3 columns containing it. If there were only 3, I would know how to do it manually.
I hope you can help me, best regards.
CodePudding user response:
You can apply pandas.to_timedelta to all columns selected by filter and update the original DataFrame:
df.update(df.filter(like='_SEL').apply(pd.to_timedelta, unit='s'))
NB. There is no output; update modifies the DataFrame in place.
Updated DataFrame:
date time_SEL time_02_SEL time_03_SEL other
0 2022-01-01 0 days 09:39:16 2 days 16:47:12 39 days 17:27:03 756
1 2022-01-03 0 days 06:28:42 0 days 01:12:23 0 days 00:55:34 343
2 2022-02-01 1 days 10:13:52 0 days 00:54:02 0 days 06:30:23 434
3 2022-03-01 84 days 18:13:52 0 days 09:00:23 3 days 17:50:23 34324
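For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as an alternative to update, assigns the converted columns back directly. The variable name sel_cols is only illustrative, and direct assignment is a substitution of mine, not the method shown above:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-03", "2022-02-01", "2022-03-01"],
    "time_SEL": [34756, 23322, 123232, 7323232],
    "time_02_SEL_01": [233232, 4343, 3242, 32423],
    "time_03_SEL_05": [3432423, 3334, 23423, 323423],
    "other": [756, 343, 434, 34324],
})

# Names of all columns containing "_SEL" (sel_cols is an illustrative name).
sel_cols = list(df.filter(like="_SEL").columns)

# Convert each selected column from seconds to timedelta and assign it back.
# Assigning whole columns replaces them, so the dtype changes with the values.
df[sel_cols] = df[sel_cols].apply(pd.to_timedelta, unit="s")

print(df.dtypes)
```

The "other" column is left untouched because its name does not contain "_SEL".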
Update: if you get "TypeError: invalid type promotion", ensure the columns hold numbers first:
df.update(df.filter(like='_SEL')
            .apply(lambda c: pd.to_timedelta(pd.to_numeric(c, errors='coerce'),
                                             unit='s'))
          )
CodePudding user response:
Use DataFrame.filter to get all columns whose names end with _SEL (for names that merely contain _SEL, use like='_SEL' instead), convert them to timedeltas with to_timedelta, and replace the originals with DataFrame.update:
df.update(df.filter(regex='_SEL$').apply(lambda x: pd.to_timedelta(x, unit='s')))
print (df)
date time_SEL time_02_SEL time_03_SEL other
0 2022-01-01 0 days 09:39:16 2 days 16:47:12 39 days 17:27:03 756
1 2022-01-03 0 days 06:28:42 0 days 01:12:23 0 days 00:55:34 343
2 2022-02-01 1 days 10:13:52 0 days 00:54:02 0 days 06:30:23 434
3 2022-03-01 84 days 18:13:52 0 days 09:00:23 3 days 17:50:23 34324
Another option is to select the columns with a boolean mask built with Index.str.endswith:
m = df.columns.str.endswith('_SEL')
df.loc[:, m] = df.loc[:, m].apply(lambda x: pd.to_timedelta(x, unit='s'))
print (df)
date time_SEL time_02_SEL time_03_SEL other
0 2022-01-01 0 days 09:39:16 2 days 16:47:12 39 days 17:27:03 756
1 2022-01-03 0 days 06:28:42 0 days 01:12:23 0 days 00:55:34 343
2 2022-02-01 1 days 10:13:52 0 days 00:54:02 0 days 06:30:23 434
3 2022-03-01 84 days 18:13:52 0 days 09:00:23 3 days 17:50:23 34324
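The selectors used in this thread do not all pick the same columns. The sketch below compares them on the column names from the question, using an empty DataFrame because only the names matter. The variable names are illustrative:

```python
import pandas as pd

# Column names as given in the question; no rows are needed for selection.
df = pd.DataFrame(columns=["date", "time_SEL", "time_02_SEL_01",
                           "time_03_SEL_05", "other"])

# Substring match: any column whose name contains "_SEL".
contains = list(df.filter(like="_SEL").columns)

# Anchored regex: only columns whose name ends with "_SEL".
ends_regex = list(df.filter(regex="_SEL$").columns)

# Boolean mask over the column Index, equivalent to the anchored regex.
mask = df.columns.str.endswith("_SEL")
ends_mask = list(df.columns[mask])

print(contains)    # ['time_SEL', 'time_02_SEL_01', 'time_03_SEL_05']
print(ends_regex)  # ['time_SEL']
print(ends_mask)   # ['time_SEL']
```

With names such as time_02_SEL_01, the substring match (like='_SEL') is the one that corresponds to "columns containing _SEL".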
EDIT: To convert the column values to integers first, use .astype(int):
df.update(df.filter(regex='_SEL$').astype(int).apply(lambda x: pd.to_timedelta(x, unit='s')))
If that fails because some values are non-numeric, coerce them to NaN with to_numeric first (they become NaT after conversion):
df.update(df.filter(regex='_SEL$').apply(lambda x: pd.to_timedelta(pd.to_numeric(x, errors='coerce'), unit='s')))
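To illustrate what errors='coerce' does here, a small sketch on a made-up Series (the values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical column mixing numeric strings, a non-numeric string and an int.
s = pd.Series(["60", "abc", 90])

# Non-numeric entries become NaN instead of raising an error.
numeric = pd.to_numeric(s, errors="coerce")

# NaN is then converted to NaT (the missing value for timedeltas).
out = pd.to_timedelta(numeric, unit="s")

print(out)
```

The unparseable "abc" ends up as NaT, while "60" and 90 become durations of 60 and 90 seconds.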