Python Pandas: Counting consecutive values and assigning a label if conditions are met

For example I have created this data frame:

import pandas as pd

df = pd.DataFrame({'Cycle': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                             2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
                             4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]})


#Maybe something like this: df['Cycle Type'] = df['Cycle'].rolling(2).apply(lambda x: len(set(x)) != len(x),raw= True).replace({0 : False, 1: True})

I want to count the number of rows in each cycle and then assign a cycle type to it. If the cycle has fewer than 12 rows or more than 100 rows, mark it as bad; otherwise mark it as good. I was thinking of using something like the lambda function above to check whether the value in the previous row was the same, but I'm not sure how to add the count so that I can apply the thresholds I want.

CodePudding user response：

Start by counting the number of rows in each group with pandas.DataFrame.groupby and transform("count"), which broadcasts each group's row count back to every row of that group:

df["cycle_quality"] = df.groupby("Cycle")["Cycle"].transform("count")

Then apply the quality function to it using pandas.DataFrame.apply:

• If the number of rows is less than 12 or more than 100, define cycle_quality as bad

• Else, cycle_quality should be good

df["cycle_quality"] = df.apply(lambda x: "bad" if x["cycle_quality"] < 12 or x["cycle_quality"] > 100 else "good", axis=1)

[Out]:
    Cycle cycle_quality
0       0          good
1       0          good
2       0          good
3       0          good
4       0          good
..    ...           ...
71      5           bad
72      5           bad
73      5           bad
74      5           bad
75      5           bad
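As a possible alternative to the row-wise apply above, the same rule can be written with numpy.where, which evaluates the condition on the whole column at once instead of calling a Python lambda per row. This is only a sketch: the `sizes` dictionary and the `rows_per_cycle` name are introduced here for illustration, and the frame is rebuilt compactly from the row counts of the question's data.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built compactly: cycle id -> number of rows.
# (`sizes` is an illustrative helper, not part of the original answer.)
sizes = {0: 12, 1: 6, 2: 13, 3: 21, 4: 13, 5: 11}
df = pd.DataFrame({"Cycle": np.repeat(list(sizes.keys()), list(sizes.values()))})

# Rows per cycle, broadcast back to every row of that cycle.
rows_per_cycle = df.groupby("Cycle")["Cycle"].transform("size")

# Vectorized form of the rule: fewer than 12 or more than 100 rows is bad.
df["cycle_quality"] = np.where(
    (rows_per_cycle < 12) | (rows_per_cycle > 100), "bad", "good"
)

# One label per cycle, for a quick check.
print(df.groupby("Cycle")["cycle_quality"].first())
```

Keeping the counts in a separate Series also avoids overwriting a numeric column with string labels.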

CodePudding user response：

Another way to achieve this:

Use pd.Series.value_counts to get the row count for each unique value in df['Cycle'].
Next, apply pd.Series.between to the counts to obtain a boolean Series.
Turn that Series into 'good'/'bad' labels with replace, then pass it to pd.Series.map applied to column Cycle.

import pandas as pd

df = pd.DataFrame({'Cycle': [0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5]})

vc = df.Cycle.value_counts()

df['Cycle_Type'] = df['Cycle'].map(
    vc.between(12,100,inclusive='both').replace({True: 'good', False: 'bad'}))

# printing output per value
print(df.groupby('Cycle', as_index=False).first())

   Cycle Cycle_Type
0      0       good
1      1        bad
2      2       good
3      3       good
4      4       good
5      5        bad
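To make the chain easier to follow, here is a sketch that splits it into its intermediate steps. The variable names `is_good`, `labels` and `cycle_type` are introduced here for illustration, and the boolean-to-label step uses map with a dictionary rather than replace, which should give the same labels.

```python
import pandas as pd

# Same cycle column as in the question, built from the per-cycle row counts.
cycle = pd.Series(
    [0] * 12 + [1] * 6 + [2] * 13 + [3] * 21 + [4] * 13 + [5] * 11, name="Cycle"
)

# Step 1: index is the cycle id, value is the number of rows in that cycle.
vc = cycle.value_counts()

# Step 2: True where the count lies in [12, 100] (both ends are included).
is_good = vc.between(12, 100)

# Step 3: turn the booleans into labels, still indexed by cycle id.
labels = is_good.map({True: "good", False: "bad"})

# Step 4: look up each row's cycle id in `labels`.
cycle_type = cycle.map(labels)

print(vc.sort_index())
print(labels.sort_index())
```

Printing `vc` first is a quick way to see why a given cycle ends up good or bad.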

CodePudding user response：

Use groupby with transform('size') to get the size of each cycle, then use between to check whether that size falls in [13, 100] (both ends inclusive), marking True as good and False as bad. Note that with 13 as the lower bound a cycle of exactly 12 rows is marked bad, which is stricter than the question's rule (fewer than 12 or more than 100 rows is bad).

df['Cycle_Type'] = df.groupby('Cycle')['Cycle'].transform('size').between(13, 100,
        inclusive='both').replace({True: 'good', False: 'bad'})

output:

    Cycle Cycle_Type
0       0        bad
1       0        bad
2       0        bad
3       0        bad
4       0        bad
..    ...        ...
71      5        bad
72      5        bad
73      5        bad
74      5        bad
75      5        bad

Edit:

You can change the interval to suit your definition of good and bad. If a cycle of exactly 12 rows should be marked good (only fewer than 12 is bad), include 12 in the interval:

df['Cycle_Type'] = df.groupby('Cycle')['Cycle'].transform('size').between(12, 100,
            inclusive='both').replace({True: 'good', False: 'bad'})

Then your output is:

    Cycle Cycle_Type
0       0       good
1       0       good
2       0       good
3       0       good
4       0       good
..    ...        ...
71      5        bad
72      5        bad
73      5        bad
74      5        bad
75      5        bad
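One caveat: grouping by Cycle counts every row that shares a cycle number. If a cycle number could reappear later in the frame and each uninterrupted run should be judged on its own (an assumption; this does not happen in the question's data), a run id can be derived by comparing each row with the previous one, much like the rolling idea in the question. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data: cycle 0 appears in two separate runs (12 rows, then 3 rows).
df = pd.DataFrame({"Cycle": [0] * 12 + [1] * 13 + [0] * 3})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change flags gives each run its own id.
run_id = (df["Cycle"] != df["Cycle"].shift()).cumsum()

# Size of the run each row belongs to, then the same [12, 100] rule as above.
run_length = df.groupby(run_id)["Cycle"].transform("size")
df["Cycle_Type"] = run_length.between(12, 100).map({True: "good", False: "bad"})

print(df.groupby(run_id).first())
```

Here the first run of cycle 0 has 12 rows and the second has 3, so they get different labels, whereas grouping by Cycle alone would treat them as one group of 15 rows.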

CodePudding user response：

Here is a way using pd.cut(). This could be useful if more categories than good and bad need to be applied.

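import numpy as np

# Counts are integers and right=False makes the bins left-closed:
# [0, 12) -> bad, [12, 101) -> good (exactly 100 rows is still good), [101, inf) -> bad
(df['Cycle']
.map(
    pd.cut(df['Cycle'].value_counts(),
    bins = [0,12,101,np.inf],
    right = False,
    labels = ['bad','good','bad'],
    ordered=False)))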

Output:

0     good
1     good
2     good
3     good
4     good
      ... 
71     bad
72     bad
73     bad
74     bad
75     bad
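To illustrate the point about more than two categories, here is a sketch with three distinct labels. The label names `too_short` and `too_long` are made up for this example, and the upper bin edge is 101 so that, with integer counts and left-closed bins, a cycle of exactly 100 rows still counts as good.

```python
import numpy as np
import pandas as pd

# Same cycle column as in the question, built from the per-cycle row counts.
cycle = pd.Series(
    [0] * 12 + [1] * 6 + [2] * 13 + [3] * 21 + [4] * 13 + [5] * 11, name="Cycle"
)
counts = cycle.value_counts()

# right=False gives left-closed bins:
# [0, 12) -> too_short, [12, 101) -> good, [101, inf) -> too_long
labels = pd.cut(
    counts,
    bins=[0, 12, 101, np.inf],
    right=False,
    labels=["too_short", "good", "too_long"],  # illustrative label names
)

# Look up each row's cycle id in the per-cycle labels.
cycle_type = cycle.map(labels)

print(labels.sort_index())
```

Because the labels are all different here, the ordered=False argument used with the repeated 'bad' label is not needed.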