Home > Blockchain >  How to get count of items in rows as well as check one item is present or not, and finally keep the
How to get count of items in rows as well as check one item is present or not, and finally keep the

Time:11-12

Suppose I have a data frame as follows:

 df = pd.DataFrame({
        'Column A': [12,12,12, 13, 15, 16, 141, 141, 141, 141],
         'Column B':['Apple' ,'Apple' ,'Orange' ,'Apple' , np.nan, 'Orange', 'Apple', np.nan, 'Apple', 'Apple']}) 

Based on these conditions:

  • If values in column A are repeated then count the word 'Orange' in Column B and paste it in new Column C(For example, there are 3 rows for 12, the count of 'Orange' is 1, and this 1 should be in new column C). For the non-repeat rows, just paste the corresponding values.

  • If values in column A are repeated then count the word 'Apple' in Column B and paste it in new Column D(For example, there are 3 rows for 12, the count of 'Apple' is 2, and this 2 should be in new column D). For the non-repeat rows, just paste the corresponding values.

  • For repeated and non-repeated rows due to Column A, If the word 'Orange' is present in Column B, write 'yes' else 'No' in Column E.

I would like to have an output following. I was trying in python jupyter notebook, can anyone please help me to get an output like this:

      | Column A | Column B |Column C |Column D |Column E 
----- | -------- | ---------|---------|---------|---------
 0    | 12       | Apple    |1        |2        |Yes   
 1    | 13       | Apple    |0        |1        |No 
 2    | 15       | NaN      |NaN      |NaN      |NaN     
 3    | 16       | Orange   |1        |0        |Yes      
 4    | 141      | Apple    |0        |3        |No   

Thanks in advance:)

CodePudding user response:

I think there is no powerful and simple solution for your question, but use the following code.

First, define a function count(x, a) which returns nan if x includes nan, the number of occurence of a in x, otherwise. The function will be used for the apply function.

Then, use groupby and apply list function.

temp = df.copy().groupby('Column A')['Column B'].apply(list)

After that, temp becomes

Column A
12         [Apple, Apple, Orange]
13                        [Apple]
15                          [nan]
16                       [Orange]
141    [Apple, nan, Apple, Apple]
Name: Column B, dtype: object

So, based on the temp, we can count the number of apples and oranges.

Since df has duplicates, I removed them and add new columns (Column C, D, and E).

df.drop_duplicates(subset = ['Column A'], keep = "first", inplace = True)
df['Column C'] = temp.apply(count, a = "Orange").values
df['Column D'] = temp.apply(count, a = "Apple").values
df['Column E'] = df['Column D'].apply(lambda x:1 if x>=1 else 0)

Edit

I am sorry. I missed the function count..

def count(x, a):
    if type(x[0]) == float:
        return np.nan
    else:
        return x.count(a)
  • Related