How to check if a range of values meet criteria-CodePudding

I have a data like I show in the attached picture. What I want is if the next value is same or equal to the current value 1, then I will put them in the same group. The group need to be different between customers.

for example:

(dd-mm-yyyy)

Customer A1 1/1/2020 - Group A
Customer A1 1/1/2020 - Group A
Customer A1 2/1/2020 - Group A
Customer A1 3/1/2020 - Group A
Customer A1 3/1/2020 - Group A
Customer A1 5/1/2020 - Group B
Customer A1 10/1/2020 - Group C
Customer A1 13/1/2020 - Group D
Customer A1 14/1/2020 - Group D
Customer A1 20/1/2020 - Group E

Customer B1 21/1/2020 - Group A
Customer B1 22/1/2020 - Group A
Customer B1 24/1/2020 - Group B
Customer B1 27/1/2020 - Group C
Customer B1 28/1/2020 - Group C

How can I resolve it in pandas?

CodePudding user response：

First ensure that your date column is of datetime type, then perform a diff to get a timedelta and apply a cumsum on the deltas that are above 1 day. To get letters, you can map the obtained number to chr, but keep in mind that you'll have a limited number of groups (or you need to write a slightly more complex mapping to generate AA after Z: example here)

I provided both a numerical and letter grouping.

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['group'] = df['date'].diff().gt(pd.to_timedelta('1d')).cumsum()
df['group2'] = 'Group ' df['group'].map(lambda x: chr(65 x))

Output:

        date  group   group2
0 2020-01-01      0  Group A
1 2020-01-02      0  Group A
2 2020-01-03      0  Group A
3 2020-01-03      0  Group A
4 2020-01-05      1  Group B
5 2020-01-10      2  Group C
6 2020-01-13      3  Group D
7 2020-01-14      3  Group D
8 2020-01-20      4  Group E

same process per group

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['group'] = (df.groupby('customer')['date'].diff()
                 .gt(pd.to_timedelta('1d'))
                 .groupby(df['customer']).cumsum()
              )
df['group2'] = 'Group ' df['group'].map(lambda x: chr(65 x))

output:

       customer       date  group   group2
0   Customer A1 2020-01-01      0  Group A
1   Customer A1 2020-01-01      0  Group A
2   Customer A1 2020-01-02      0  Group A
3   Customer A1 2020-01-03      0  Group A
4   Customer A1 2020-01-03      0  Group A
5   Customer A1 2020-01-05      1  Group B
6   Customer A1 2020-01-10      2  Group C
7   Customer A1 2020-01-13      3  Group D
8   Customer A1 2020-01-14      3  Group D
9   Customer A1 2020-01-20      4  Group E
10  Customer B1 2020-01-21      0  Group A
11  Customer B1 2020-01-22      0  Group A
12  Customer B1 2020-01-24      1  Group B
13  Customer B1 2020-01-27      2  Group C
14  Customer B1 2020-01-28      2  Group C

CodePudding user response：

Here is one of the approach:

import pandas as pd
from datetime import datetime
date_format = "%d/%m/%Y"

df = pd.DataFrame({
    'Date': ['01/01/2021', '01/01/2021', '02/01/2021', '04/01/2021', '05/01/2021'],
})

df['Group'] = ''
start_date = df['Date'][0]
group = ord('A')
for i in range(len(df)):
    date = df['Date'].values[i]
    current_date = datetime.strptime(date , date_format)
    previous_date = datetime.strptime(start_date, date_format)
    if (current_date - previous_date).days > 1:
        group  = 1
    df['Group'].values[i] = chr(group)
    start_date = date
    
print (df)

Output:

Date Group
0  01/01/2021     A
1  01/01/2021     A
2  02/01/2021     A
3  04/01/2021     B
4  05/01/2021     B