For example, I have created this data frame:
import pandas as pd
df = pd.DataFrame({'Cycle': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]})
#Maybe something like this: df['Cycle Type'] = df['Cycle'].rolling(2).apply(lambda x: len(set(x)) != len(x),raw= True).replace({0 : False, 1: True})
I want to count the number of rows in each cycle and then assign a cycle type to it. If a cycle has fewer than 12 rows or more than 100 rows, mark it as bad; otherwise mark it as good. I was thinking of using something like the lambda function above to check whether the value in the previous row was the same, but I'm not sure how to add the count so that I can apply these thresholds.
CodePudding user response:
Start by counting the number of rows in each group with pandas.DataFrame.groupby, pandas.DataFrame.transform, and pandas.DataFrame.count:
df["cycle_quality"] = df.groupby("Cycle")["Cycle"].transform("count")
Then apply the quality rule to it using pandas.DataFrame.apply:
• If the number of rows is less than 12 or more than 100, set cycle_quality to bad
• Otherwise, set cycle_quality to good
df["cycle_quality"] = df.apply(lambda x: "bad" if x["cycle_quality"] < 12 or x["cycle_quality"] > 100 else "good", axis=1)
[Out]:
Cycle cycle_quality
0 0 good
1 0 good
2 0 good
3 0 good
4 0 good
.. ... ...
71 5 bad
72 5 bad
73 5 bad
74 5 bad
75 5 bad
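The row-wise apply above calls a Python lambda once per row. As a minimal sketch of the same rule in vectorised form, assuming NumPy is available, np.where can label the whole column at once. The small frame below is hypothetical and only serves to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical data: cycle 0 has 12 rows, cycle 1 has 6 rows.
df = pd.DataFrame({"Cycle": [0] * 12 + [1] * 6})

# Number of rows in each cycle, broadcast back to every row.
counts = df.groupby("Cycle")["Cycle"].transform("count")

# Same rule as the lambda: fewer than 12 or more than 100 rows is bad.
df["cycle_quality"] = np.where((counts < 12) | (counts > 100), "bad", "good")

# One label per cycle, for a quick check.
print(df.groupby("Cycle")["cycle_quality"].first())
```

On large frames this usually avoids most of the per-row overhead of apply with axis=1, although the result should be the same.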
CodePudding user response:
Another way to achieve this:
- Use pd.Series.value_counts to get a count for each unique value in df['Cycle'].
- Next, apply pd.Series.between to obtain a boolean series.
- Turn this series into 'good'|'bad' with replace, then pass it to pd.Series.map applied to column Cycle.
import pandas as pd
df = pd.DataFrame({'Cycle': [0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5]})
vc = df.Cycle.value_counts()
df['Cycle_Type'] = df['Cycle'].map(
vc.between(12,100,inclusive='both').replace({True: 'good', False: 'bad'}))
# printing output per value
print(df.groupby('Cycle', as_index=False).first())
Cycle Cycle_Type
0 0 good
1 1 bad
2 2 good
3 3 good
4 4 good
5 5 bad
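To make the chained expression above easier to follow, here is a sketch that names each intermediate step. The data and the variable names (is_good, labels) are hypothetical, and the boolean-to-label step uses map with a dict instead of replace, which should give the same labels.

```python
import pandas as pd

# Hypothetical data: cycles with 12, 6 and 13 rows.
df = pd.DataFrame({"Cycle": [0] * 12 + [1] * 6 + [2] * 13})

# Step 1: index is the cycle id, value is its row count.
vc = df["Cycle"].value_counts()

# Step 2: True where the count lies in [12, 100]; between is inclusive by default.
is_good = vc.between(12, 100)

# Step 3: turn the booleans into labels, still indexed by cycle id.
labels = is_good.map({True: "good", False: "bad"})

# Step 4: look up each row's cycle id in that per-cycle lookup Series.
df["Cycle_Type"] = df["Cycle"].map(labels)

print(labels.sort_index())
```

Because labels has one entry per cycle rather than one per row, the thresholds are evaluated once per cycle and only the final map touches every row.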
CodePudding user response:
Use groupby and transform to get the size of each cycle, then use between to check whether that size falls in [13, 100] (both ends inclusive), marking True as good and False as bad. This reads the requirement as: any size of 12 or less, or greater than 100, is bad, and everything in [13, 100] is good.
df['Cycle_Type'] = df.groupby('Cycle')['Cycle'].transform('size').between(13, 100,
inclusive='both').replace({True: 'good', False: 'bad'})
output:
Cycle Cycle_Type
0 0 bad
1 0 bad
2 0 bad
3 0 bad
4 0 bad
.. ... ...
71 5 bad
72 5 bad
73 5 bad
74 5 bad
75 5 bad
Edit:
You can change the interval that counts as good as you wish. If a cycle with exactly 12 rows should be marked good (so that only cycles with fewer than 12 rows are bad), include 12 in the interval:
df['Cycle_Type'] = df.groupby('Cycle')['Cycle'].transform('size').between(12, 100,
inclusive='both').replace({True: 'good', False: 'bad'})
Then your output is:
Cycle Cycle_Type
0 0 good
1 0 good
2 0 good
3 0 good
4 0 good
.. ... ...
71 5 bad
72 5 bad
73 5 bad
74 5 bad
75 5 bad
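The two intervals above differ only at the boundary. A small sketch on hypothetical cycle sizes shows which sizes each interval accepts:

```python
import pandas as pd

# Hypothetical cycle sizes around both thresholds.
sizes = pd.Series([11, 12, 13, 100, 101])

strict = sizes.between(13, 100)  # a 12-row cycle is rejected
loose = sizes.between(12, 100)   # a 12-row cycle is accepted

print(pd.DataFrame({"size": sizes,
                    "between(13, 100)": strict,
                    "between(12, 100)": loose}))
```

Only the size 12 changes between the two columns; 11 and 101 are rejected by both, and 100 is accepted by both because the upper bound is inclusive.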
CodePudding user response:
Here is a way using pd.cut(). This could be useful if more categories than good and bad need to be applied.
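import numpy as np
(df['Cycle']
 .map(
     pd.cut(df['Cycle'].value_counts(),
            # right=False gives [0, 12), [12, 101), [101, inf): counts of 12 to 100 are good
            bins = [0,12,101,np.inf],
            right = False,
            labels = ['bad','good','bad'],
            # ordered=False is required because the label 'bad' is repeated
            ordered=False)))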
Output:
0 good
1 good
2 good
3 good
4 good
...
71 bad
72 bad
73 bad
74 bad
75 bad
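As a sketch of the "more categories" case mentioned above, the same pd.cut call can hand out three distinct labels. The label names and the data are hypothetical. The upper edge is set to 101 here, on the assumption that counts are integers, so that a cycle with exactly 100 rows still lands in the middle bin when right=False.

```python
import numpy as np
import pandas as pd

# Hypothetical data: cycles with 12, 6 and 101 rows.
df = pd.DataFrame({"Cycle": [0] * 12 + [1] * 6 + [2] * 101})

counts = df["Cycle"].value_counts()

# Left-closed bins: [0, 12), [12, 101), [101, inf).
labels = pd.cut(counts,
                bins=[0, 12, 101, np.inf],
                right=False,
                labels=["too short", "good", "too long"])

# Look up each row's cycle id in the per-cycle labels.
df["Cycle_Type"] = df["Cycle"].map(labels)

print(df.groupby("Cycle")["Cycle_Type"].first())
```

With unique labels the ordered=False argument is no longer needed; it only matters when a label such as bad is repeated.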