Home > other >  How to map values from multiple series into one dataframe column with normalized index
How to map values from multiple series into one dataframe column with normalized index

Time:07-14

I have three boolean series with a date (daily interval), which need to be merged into one column with a certain string as a value in a dataframe. The value can be for now the naqme of the series, like "bool_series_1", "bool_series_2"... The dataframe has a loose time grid, so the values need also a normalization to map on.

Let's say this is bool_series_1:

2021-07-06    False
2021-07-07    True
2021-07-08    False

bool_series_2:

2021-07-06    True
2021-07-07    False
2021-07-08    False

bool_series_3:

2021-07-06    False
2021-07-07    False
2021-07-08    True

Dataframe:

datetime               somevalues
2021-07-06 15:30:19    17445
2021-07-06 15:33:26     5327
2021-07-06 15:35:00     2222
2021-07-06 15:36:00    10104
2021-07-06 15:38:15     5444
2021-07-07 15:30:43    83524       
2021-07-07 15:33:22     5327
2021-07-07 15:35:01     4918
2021-07-07 15:36:04    10104
2021-07-07 15:38:15     5214
2021-07-08 15:31:13    55543
2021-07-08 15:33:22     4234
2021-07-08 15:34:31     1321
2021-07-08 15:35:44    19876
2021-07-08 15:37:55     6453

I need this output:

datetime               somevalues   new_col
2021-07-06 15:30:19    17445        bool_series_2      
2021-07-06 15:33:26     5327        bool_series_2
2021-07-06 15:35:00     2222        bool_series_2
2021-07-06 15:36:00    10104        bool_series_2
2021-07-06 15:38:15     5444        bool_series_2
2021-07-07 15:30:43    83524        bool_series_1
2021-07-07 15:33:22     5327        bool_series_1
2021-07-07 15:35:01     4918        bool_series_1
2021-07-07 15:36:04    10104        bool_series_1
2021-07-07 15:38:15     5214        bool_series_1
2021-07-08 15:31:13    55543        bool_series_3
2021-07-08 15:33:22     4234        bool_series_3
2021-07-08 15:34:31     1321        bool_series_3
2021-07-08 15:35:44    19876        bool_series_3
2021-07-08 15:37:55     6453        bool_series_3

If it would be only one series I'd do it like this:

df['new_col'] = np.where(df.index.normalize().map(bool_series_1 == True), 'bool_series_1', False)

But this won't work repeatedly. Any ideas how to solve this?

CodePudding user response:

You can do it like this:

mapdate = pd.concat(
    [
        bool_series_1.map({True: "bool_series_1"}),
        bool_series_2.map({True: "bool_series_2"}),
        bool_series_3.map({True: "bool_series_3"}),
    ]
)
mapdate = mapdate.dropna()

mapdate.index = pd.to_datetime(mapdate.index)

df['new_col'] = df.index.normalize().map(mapdate)

Output:

                    somevalues        new_col
datetime                                      
2021-07-06 15:30:19       17445  bool_series_2
2021-07-06 15:33:26        5327  bool_series_2
2021-07-06 15:35:00        2222  bool_series_2
2021-07-06 15:36:00       10104  bool_series_2
2021-07-06 15:38:15        5444  bool_series_2
2021-07-07 15:30:43       83524  bool_series_1
2021-07-07 15:33:22        5327  bool_series_1
2021-07-07 15:35:01        4918  bool_series_1
2021-07-07 15:36:04       10104  bool_series_1
2021-07-07 15:38:15        5214  bool_series_1
2021-07-08 15:31:13       55543  bool_series_3
2021-07-08 15:33:22        4234  bool_series_3
2021-07-08 15:34:31        1321  bool_series_3
2021-07-08 15:35:44       19876  bool_series_3
2021-07-08 15:37:55        6453  bool_series_3

CodePudding user response:

Here's a way to do what your question asks:

from functools import reduce
from operator import add

df = df.assign(norm=df.index.normalize()).reset_index().set_index('norm', drop=True)
df['new_col'] = (
    reduce(add, (ser.map({True:name, False:''}) for ser, name in zip(
        [bool_series_1, bool_series_2, bool_series_3], 
        ['bool_series_1', 'bool_series_2', 'bool_series_3'])
    )))
df = df.set_index('datetime')

Explanation:

  • Use DatetimeIndex.normalize() to convert times to midnight in the large dataframe for matching with the dates in each boolean series
  • Create an iterable over boolean series, string value tuples by calling zip() for parallel lists of the boolean series and their corresponding string values ("bool_series_1", "bool_series_2"...)
  • Use a comprehension and Series.map() to create a generator that yields a transformed version of each series with True replaced by the string value corresponding to the series and False replaced by the empty string ''
  • Use reduce() and add() to concatenate the string values across all transformed series resulting in a series containing the desired string value for each row
  • Assign this series of string values to a new column new_col
  • Use set_index() to restore the original index.

Input:

df
                     somevalues
datetime
2021-07-06 15:30:19       17445
2021-07-06 15:33:26        5327
2021-07-06 15:35:00        2222
2021-07-06 15:36:00       10104
2021-07-06 15:38:15        5444
2021-07-07 15:30:43       83524
2021-07-07 15:33:22        5327
2021-07-07 15:35:01        4918
2021-07-07 15:36:04       10104
2021-07-07 15:38:15        5214
2021-07-08 15:31:13       55543
2021-07-08 15:33:22        4234
2021-07-08 15:34:31        1321
2021-07-08 15:35:44       19876
2021-07-08 15:37:55        6453

bool_series_1
2021-07-06    False
2021-07-07     True
2021-07-08    False
Name: 1, dtype: bool

bool_series_2
2021-07-06     True
2021-07-07    False
2021-07-08    False
Name: 1, dtype: bool

bool_series_3
2021-07-06    False
2021-07-07    False
2021-07-08     True
Name: 1, dtype: bool

Output:

                     somevalues        new_col
datetime
2021-07-06 15:30:19       17445  bool_series_2
2021-07-06 15:33:26        5327  bool_series_2
2021-07-06 15:35:00        2222  bool_series_2
2021-07-06 15:36:00       10104  bool_series_2
2021-07-06 15:38:15        5444  bool_series_2
2021-07-07 15:30:43       83524  bool_series_1
2021-07-07 15:33:22        5327  bool_series_1
2021-07-07 15:35:01        4918  bool_series_1
2021-07-07 15:36:04       10104  bool_series_1
2021-07-07 15:38:15        5214  bool_series_1
2021-07-08 15:31:13       55543  bool_series_3
2021-07-08 15:33:22        4234  bool_series_3
2021-07-08 15:34:31        1321  bool_series_3
2021-07-08 15:35:44       19876  bool_series_3
2021-07-08 15:37:55        6453  bool_series_3
  • Related