How to save dataframe to dictionary of different length

Time:05-18

So I have a data frame that looks like this

2018-01-01 00:00:00    False
2018-01-01 00:30:00    False
2018-01-01 01:00:00    False
2018-01-01 01:30:00    True
2018-01-01 02:00:00    True
2018-01-01 02:30:00    True
2018-01-01 03:00:00    False
2018-01-01 03:30:00    False
2018-01-01 04:00:00    True
2018-01-01 04:30:00    True

and it continues like this for a full year. I want to save each consecutive run of True values to a dictionary, so it would look something like this:

 dict{'0': [2018-01-01 01:30:00, 2018-01-01 02:00:00, 2018-01-01 02:30:00],
      '1': [2018-01-01 04:00:00, 2018-01-01 04:30:00]}

I don't know how many times throughout the year the value is True continuously. But every time a new run of True values starts, I want to create a new key in my dictionary and record the times at which the value is True.

What would be the best way to approach that? I've thought about looping through the data frame and recording the indices where it's True, but that seems cumbersome. Any advice would be appreciated.

CodePudding user response:

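You can filter the True rows, group them by a run identifier (the cumulative count of False rows), and collect each group into a list with groupby.agg. This assumes column 0 holds the timestamps and column 1 the booleans: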

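(df
 .loc[df[1], 0]                # keep the timestamps of the True rows
 .groupby((~df[1]).cumsum())   # each False row bumps the key, so a True run shares one key
 .agg(list)                    # collect each run into a list
 .reset_index(drop=True)       # renumber the runs 0, 1, 2, ...
 .to_dict()
 )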

Output:

{0: ['2018-01-01 01:30:00', '2018-01-01 02:00:00', '2018-01-01 02:30:00'],
 1: ['2018-01-01 04:00:00', '2018-01-01 04:30:00']}
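
To see why the grouping key works, here is a minimal self-contained sketch that rebuilds the sample from the question. The integer column labels 0 and 1, the string timestamps, and the names `run_id` and `result` are assumptions made for illustration, not something the question confirms.

```python
import pandas as pd

# Sample data mirroring the question. Column labels 0 and 1 are an assumption.
times = (
    pd.date_range("2018-01-01", periods=10, freq="30min")
    .strftime("%Y-%m-%d %H:%M:%S")
    .tolist()
)
flags = [False, False, False, True, True, True, False, False, True, True]
df = pd.DataFrame({0: times, 1: flags})

# Every False row increments the counter, so the counter stays constant
# across a consecutive run of True rows and changes between runs.
run_id = (~df[1]).cumsum()

result = (
    df.loc[df[1], 0]          # timestamps of the True rows only
    .groupby(run_id)          # aligned on the index, one group per run
    .agg(list)
    .reset_index(drop=True)   # replace the counter values with 0, 1, 2, ...
    .to_dict()
)
print(result)
```

For this sample, `run_id` is 1, 2, 3, 3, 3, 3, 4, 5, 5, 5, so the two True runs fall under keys 3 and 5 before `reset_index` renumbers them.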

CodePudding user response:

You can use itertools.groupby() to get consecutive groups, then filter, select, and enumerate:

from itertools import groupby

grouped = groupby(df.itertuples(index=False), key=lambda row: row[1])
dict(enumerate(
    [row[0] for row in g] for k, g in grouped if k
    ))

Output:

{0: ['2018-01-01 01:30:00', '2018-01-01 02:00:00', '2018-01-01 02:30:00'],
 1: ['2018-01-01 04:00:00', '2018-01-01 04:30:00']}

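The groupby object and the generator are lazy, so nothing is materialized until dict() consumes them and only one pass over the rows is made. Note that itertuples() still iterates row by row in Python, so on very large frames a vectorized pandas approach may be faster.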
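
If the data is actually a boolean Series indexed by timestamp, as the printout in the question suggests, the same idea might be sketched as follows. The Series name `s` and the DatetimeIndex are assumptions, and the result holds pandas Timestamp objects rather than strings.

```python
from itertools import groupby

import pandas as pd

# Assumption: a boolean Series indexed by timestamp (hypothetical name `s`).
idx = pd.date_range("2018-01-01", periods=10, freq="30min")
s = pd.Series([False] * 3 + [True] * 3 + [False] * 2 + [True] * 2, index=idx)

# s.items() yields (timestamp, flag) pairs; itertools.groupby batches
# consecutive pairs that share the same flag.
runs = groupby(s.items(), key=lambda pair: pair[1])

# Keep only the True runs and number them 0, 1, 2, ...
chunks = dict(enumerate([ts for ts, _ in g] for flag, g in runs if flag))
print(chunks)
```

Each group is turned into a list as soon as it is produced, which matters because itertools.groupby invalidates a group once the iterator advances to the next one.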
