I have a DataFrame df that I need to split based on whether the value in a specific column, ColB, is within a given range: 1-3, 3-5, 5-7, etc.
Input:
Time ColA ColB ColC
1 100 1.1 500
2 105 3.2 600
3 107 7.7 550
4 106 2.4 750
5 104 5.2 950
6 103 6.9 450
Desired Output:
Time ColA ColB ColC
1 100 1.1 500
4 106 2.4 750
Time ColA ColB ColC
2 105 3.2 600
Time ColA ColB ColC
3 107 7.7 550
5 104 5.2 950
6 103 6.9 450
Is there a nice way to do this without writing a loop in Python? Also, would it be more efficient to store the output as a list of DataFrames or a dictionary of DataFrames? I ask because it's a fairly large dataset.
CodePudding user response:
Use pandas.cut to assign each ColB value to a bin, then group by the bins.
https://pandas.pydata.org/docs/reference/api/pandas.cut.html
For example:
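import pandas as pd
# Bins are right-closed by default: (1, 3], (3, 5], (5, 7]. Values outside every bin (such as 7.7) become NaN and are dropped by groupby.
groups = pd.cut(df["ColB"], [1, 3, 5, 7])
result = [d for _, d in df.groupby(groups)]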
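On the list-versus-dictionary part of the question: a dictionary keyed by interval lets you look up a piece by its range instead of by position. Here is a minimal sketch, assuming the sample data from the question. The open-ended last edge (float("inf")) is my own assumption, added so that 7.7 ends up in the last group as in the desired output; with a last edge of 7 that row would be dropped. The names edges and parts are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": [1, 2, 3, 4, 5, 6],
    "ColA": [100, 105, 107, 106, 104, 103],
    "ColB": [1.1, 3.2, 7.7, 2.4, 5.2, 6.9],
    "ColC": [500, 600, 550, 750, 950, 450],
})

# Bin edges; the open-ended last edge is an assumption so that 7.7 is kept.
edges = [1, 3, 5, float("inf")]

# pd.cut builds right-closed intervals by default: (1, 3], (3, 5], (5, inf].
groups = pd.cut(df["ColB"], bins=edges)

# observed=True skips bins that contain no rows.
parts = {interval: part for interval, part in df.groupby(groups, observed=True)}

for interval, part in parts.items():
    print(interval)
    print(part, "\n")
```

The keys are pandas Interval objects, so the three pieces hold the rows with Time 1 and 4, Time 2, and Time 3, 5 and 6 respectively. Both a list and a dictionary hold the same split DataFrames, so the choice is mostly about how you want to access the pieces.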
CodePudding user response:
You can try this:
lst = [(1, 3), (3, 5), (5, 7)]
# between() includes both endpoints by default
result = [df[df['ColB'].between(a, b)] for a, b in lst]
for i in result:
    print(i, "\n")
Time ColA ColB ColC
0 1 100 1.1 500
3 4 106 2.4 750
Time ColA ColB ColC
1 2 105 3.2 600
Time ColA ColB ColC
4 5 104 5.2 950
5 6 103 6.9 450
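One thing to watch with this approach: Series.between includes both endpoints by default, so a row whose ColB is exactly 3 would show up in both the 1-3 and the 3-5 piece. Below is a sketch of a half-open variant, assuming pandas 1.3 or later (where inclusive accepts "both", "neither", "left" or "right"). The small frame is made up just to show the boundary case; it is not the data from the question.

```python
import pandas as pd

# Hypothetical data with one value sitting exactly on a boundary (3.0).
df = pd.DataFrame({"Time": [1, 2, 3], "ColB": [1.1, 3.0, 4.5]})

ranges = [(1, 3), (3, 5), (5, 7)]

# Default: both endpoints included, so the 3.0 row lands in two pieces.
closed = [df[df["ColB"].between(a, b)] for a, b in ranges]

# inclusive="left" keeps a <= ColB < b, so each row lands in at most one piece.
half_open = [df[df["ColB"].between(a, b, inclusive="left")] for a, b in ranges]

print([len(p) for p in closed])     # [2, 2, 0]
print([len(p) for p in half_open])  # [1, 2, 0]
```

Also note that rows outside every range (7.7 in the question's data) are simply left out, which is why the output above has no row for Time 3.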