Let's imagine a DataFrame with some patterns that do intersect.
A sample code for a minimal reproducible example:
import pandas as pd
from random import randint as ri
from random import choice
get_pattern = lambda: [i for ii in [[choice([True,False])]*ri(2,6) for _ in range(0,5)] for i in ii]
patterns = [get_pattern()[:15] for _ in range(0,3)]
df = pd.DataFrame({
'A': patterns[0],
'B': patterns[1],
'C': patterns[2]
}, index = pd.interval_range(start=0, end=15, closed='both')).T.replace(False,'')
Output:
[0, 1] | [1, 2] | [2, 3] | [3, 4] | [4, 5] | [5, 6] | [6, 7] | [7, 8] | [8, 9] | [9, 10] | [10, 11] | [11, 12] | [12, 13] | [13, 14] | [14, 15] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | True | True | True | True | |||||||||||
B | True | True | True | True | True | True | True | True | True | True | True | ||||
C | True | True | True | True | True | True | True | True |
I did random
to produce different results for the question on purpose. I also did a transpose for the table to fit better and use less vertical space, False
s are removed for a better visual representation.
In this particular case one can observe identical columns:
- from 0 to 4
- from 4 to 8
- from 8 to 12
- from 12 to 15
The question is how to collapse those identical columns (or rows without transpose) and change the corresponding intervals.
After some searching the only solution I've come up with was iterating through rows / columns and searching for duplicates. Than assigning left and right interval for a newly created DataFrame.
I suppose there might be a more elegant solution to get this:
intervals = [
pd.Interval(left=0, right=4, closed='both'),
pd.Interval(left=4, right=8, closed='both'),
pd.Interval(left=8, right=12, closed='both'),
pd.Interval(left=12, right=15, closed='both')
]
pd.DataFrame({
'A': [False,False,True,False],
'B': [True,True,False,True],
'C': [False,True,True,False]
}, pd.IntervalIndex(intervals)).T.replace(False,'')
Output
[0, 4] | [4, 8] | [8, 12] | [12, 15] | |
---|---|---|---|---|
A | True | |||
B | True | True | True | |
C | True | True |
CodePudding user response:
One way using shift-groupby trick:
m = df.shift().ne(df).any(axis=1).cumsum()
res = []
for _, d in df.groupby(m):
s = d.iloc[0]
s.name = pd.Interval(d.index.left[0], d.index.right[-1], "both")
res.append(s)
new_df = pd.concat(res, axis=1)
Output:
[0, 4] [4, 8] [8, 12] [12, 15]
A True
B True True True
C True True