I apologize for the title, but I don't know enough to properly condense my question into a single line.
Here is the use case:
- I have a pd.DataFrame with arbitrary index values and a column, 'timestamp'.
- I have an ordered List of timestamp values.
- I want to split the DataFrame into chunks with 'timestamp' values that are:
- less than List[0] (the lowest value in the List)
- in between each consecutive timestamps in the List (lesser value inclusive, greater value exclusive)
- greater than or equal to List[-1] (the greatest value in the List)
I've already made of list-of-dataframes, chunking the original with a While Statement. However, I feel like this is inefficient and there should be a way to use np.split() or df.groupby() along with the List of timestamps to do this more elegantly and efficiently. Then again, I could be wrong.
So I guess my question boils down to: "what is the most time efficient method, having the most elegant presentation, to achieve the goals stated above"?
@KU99 Mentioned providing an example and the output:
df =
colA | colB | timestamp |
---|---|---|
First | row | 1 |
Second | row | 2 |
First | row | 3 |
Second | row | 4 |
First | row | 5 |
Second | row | 6 |
First | row | 7 |
Second | row | 8 |
First | row | 9 |
Second | row | 10 |
First | row | 11 |
Second | row | 12 |
List = [3, 7, 8, 9]
output =
colA | colB | timestamp |
---|---|---|
First | row | 1 |
Second | row | 2 |
colA | colB | timestamp |
---|---|---|
First | row | 3 |
Second | row | 4 |
First | row | 5 |
Second | row | 6 |
colA | colB | timestamp |
---|---|---|
First | row | 7 |
colA | colB | timestamp |
---|---|---|
Second | row | 8 |
colA | colB | timestamp |
---|---|---|
First | row | 9 |
Second | row | 10 |
First | row | 11 |
Second | row | 12 |
The type of output is going to be dependent on the method, but I don't care if it's a list, a dictionary, or some other indexable type.
CodePudding user response:
Try pd.cut
.groupby
:
bins = [3, 7, 8, 9]
for _, g in df.groupby(
pd.cut(df.timestamp, [float("-inf")] bins [float(" inf")], right=False)
):
print(g)
print("-" * 80)
Prints:
colA colB timestamp
0 First row 1
1 Second row 2
--------------------------------------------------------------------------------
colA colB timestamp
2 First row 3
3 Second row 4
4 First row 5
5 Second row 6
--------------------------------------------------------------------------------
colA colB timestamp
6 First row 7
--------------------------------------------------------------------------------
colA colB timestamp
7 Second row 8
--------------------------------------------------------------------------------
colA colB timestamp
8 First row 9
9 Second row 10
10 First row 11
11 Second row 12
--------------------------------------------------------------------------------