I have three boolean series with a date (daily interval), which need to be merged into one column with a certain string as a value in a dataframe. The value can be for now the naqme of the series, like "bool_series_1"
, "bool_series_2"
...
The dataframe has a loose time grid, so the values need also a normalization to map on.
Let's say this is bool_series_1:
2021-07-06 False
2021-07-07 True
2021-07-08 False
bool_series_2:
2021-07-06 True
2021-07-07 False
2021-07-08 False
bool_series_3:
2021-07-06 False
2021-07-07 False
2021-07-08 True
Dataframe:
datetime somevalues
2021-07-06 15:30:19 17445
2021-07-06 15:33:26 5327
2021-07-06 15:35:00 2222
2021-07-06 15:36:00 10104
2021-07-06 15:38:15 5444
2021-07-07 15:30:43 83524
2021-07-07 15:33:22 5327
2021-07-07 15:35:01 4918
2021-07-07 15:36:04 10104
2021-07-07 15:38:15 5214
2021-07-08 15:31:13 55543
2021-07-08 15:33:22 4234
2021-07-08 15:34:31 1321
2021-07-08 15:35:44 19876
2021-07-08 15:37:55 6453
I need this output:
datetime somevalues new_col
2021-07-06 15:30:19 17445 bool_series_2
2021-07-06 15:33:26 5327 bool_series_2
2021-07-06 15:35:00 2222 bool_series_2
2021-07-06 15:36:00 10104 bool_series_2
2021-07-06 15:38:15 5444 bool_series_2
2021-07-07 15:30:43 83524 bool_series_1
2021-07-07 15:33:22 5327 bool_series_1
2021-07-07 15:35:01 4918 bool_series_1
2021-07-07 15:36:04 10104 bool_series_1
2021-07-07 15:38:15 5214 bool_series_1
2021-07-08 15:31:13 55543 bool_series_3
2021-07-08 15:33:22 4234 bool_series_3
2021-07-08 15:34:31 1321 bool_series_3
2021-07-08 15:35:44 19876 bool_series_3
2021-07-08 15:37:55 6453 bool_series_3
If it would be only one series I'd do it like this:
df['new_col'] = np.where(df.index.normalize().map(bool_series_1 == True), 'bool_series_1', False)
But this won't work repeatedly. Any ideas how to solve this?
CodePudding user response:
You can do it like this:
mapdate = pd.concat(
[
bool_series_1.map({True: "bool_series_1"}),
bool_series_2.map({True: "bool_series_2"}),
bool_series_3.map({True: "bool_series_3"}),
]
)
mapdate = mapdate.dropna()
mapdate.index = pd.to_datetime(mapdate.index)
df['new_col'] = df.index.normalize().map(mapdate)
Output:
somevalues new_col
datetime
2021-07-06 15:30:19 17445 bool_series_2
2021-07-06 15:33:26 5327 bool_series_2
2021-07-06 15:35:00 2222 bool_series_2
2021-07-06 15:36:00 10104 bool_series_2
2021-07-06 15:38:15 5444 bool_series_2
2021-07-07 15:30:43 83524 bool_series_1
2021-07-07 15:33:22 5327 bool_series_1
2021-07-07 15:35:01 4918 bool_series_1
2021-07-07 15:36:04 10104 bool_series_1
2021-07-07 15:38:15 5214 bool_series_1
2021-07-08 15:31:13 55543 bool_series_3
2021-07-08 15:33:22 4234 bool_series_3
2021-07-08 15:34:31 1321 bool_series_3
2021-07-08 15:35:44 19876 bool_series_3
2021-07-08 15:37:55 6453 bool_series_3
CodePudding user response:
Here's a way to do what your question asks:
from functools import reduce
from operator import add
df = df.assign(norm=df.index.normalize()).reset_index().set_index('norm', drop=True)
df['new_col'] = (
reduce(add, (ser.map({True:name, False:''}) for ser, name in zip(
[bool_series_1, bool_series_2, bool_series_3],
['bool_series_1', 'bool_series_2', 'bool_series_3'])
)))
df = df.set_index('datetime')
Explanation:
- Use
DatetimeIndex.normalize()
to convert times to midnight in the large dataframe for matching with the dates in each boolean series - Create an iterable over
boolean series, string value
tuples by callingzip()
for parallel lists of the boolean series and their corresponding string values ("bool_series_1", "bool_series_2"...
) - Use a comprehension and
Series.map()
to create a generator that yields a transformed version of each series withTrue
replaced by the string value corresponding to the series andFalse
replaced by the empty string''
- Use
reduce()
andadd()
to concatenate the string values across all transformed series resulting in a series containing the desired string value for each row - Assign this series of string values to a new column
new_col
- Use
set_index()
to restore the original index.
Input:
df
somevalues
datetime
2021-07-06 15:30:19 17445
2021-07-06 15:33:26 5327
2021-07-06 15:35:00 2222
2021-07-06 15:36:00 10104
2021-07-06 15:38:15 5444
2021-07-07 15:30:43 83524
2021-07-07 15:33:22 5327
2021-07-07 15:35:01 4918
2021-07-07 15:36:04 10104
2021-07-07 15:38:15 5214
2021-07-08 15:31:13 55543
2021-07-08 15:33:22 4234
2021-07-08 15:34:31 1321
2021-07-08 15:35:44 19876
2021-07-08 15:37:55 6453
bool_series_1
2021-07-06 False
2021-07-07 True
2021-07-08 False
Name: 1, dtype: bool
bool_series_2
2021-07-06 True
2021-07-07 False
2021-07-08 False
Name: 1, dtype: bool
bool_series_3
2021-07-06 False
2021-07-07 False
2021-07-08 True
Name: 1, dtype: bool
Output:
somevalues new_col
datetime
2021-07-06 15:30:19 17445 bool_series_2
2021-07-06 15:33:26 5327 bool_series_2
2021-07-06 15:35:00 2222 bool_series_2
2021-07-06 15:36:00 10104 bool_series_2
2021-07-06 15:38:15 5444 bool_series_2
2021-07-07 15:30:43 83524 bool_series_1
2021-07-07 15:33:22 5327 bool_series_1
2021-07-07 15:35:01 4918 bool_series_1
2021-07-07 15:36:04 10104 bool_series_1
2021-07-07 15:38:15 5214 bool_series_1
2021-07-08 15:31:13 55543 bool_series_3
2021-07-08 15:33:22 4234 bool_series_3
2021-07-08 15:34:31 1321 bool_series_3
2021-07-08 15:35:44 19876 bool_series_3
2021-07-08 15:37:55 6453 bool_series_3