combine values from two columns in a dataframe and get the count of each-CodePudding

I have a dataframe with two columns and each column has 5 values. I want to combine all values from the two columns, print them, and count how many times each value appeared.

For example:

Column 1 - Fruits 1 - has values these values [Apple, Orange, Banana, Grapes, Mango]

Column 2 - Fruits 2 - has values these values [Apricot, Avocado, Blackberries, Grapes, Mango]

Now I want to combine values from both the columns and print all values and also want count how many times each appeared

Expected result:

[Apple, Orange, Banana, Grapes, Mango, Apricot, Avocado, Blackberries, Grapes, Mango]

Print count against each value like this:

Apple - 1
Orange - 1
Banana - 1
Grapes - 2
Mango - 2
Apricot - 1
Avocado - 1
Blackberries - 1

CodePudding user response：

Use unstack and value_counts:

out = df.unstack().value_counts(sort=False)
print(out)
print(out.index.tolist())

# Output 1:
Apple           1
Orange          1
Banana          1
Grapes          2
Mango           2
Apricot         1
Avocado         1
Blackberries    1
dtype: int64


# Output 2:
['Apple', 'Orange', 'Banana', 'Grapes', 'Mango',
 'Apricot', 'Avocado', 'Blackberries']

CodePudding user response：

Assuming your dataframe looks like this:

  Fruits 1      Fruits 2
0    Apple       Apricot
1   Orange       Avocado
2   Banana  Blackberries
3   Grapes        Grapes
4    Mango         Mango

You can join the columns by flattening the underlying numpy array:

>>> vals = df.values.flatten().tolist()
>>> vals
['Apple',
 'Apricot',
 'Orange',
 'Avocado',
 'Banana',
 'Blackberries',
 'Grapes',
 'Grapes',
 'Mango',
 'Mango']

You can print the counts of the values by concatenating the columns together and using value_counts() on the resulting Series:

>>> concated = pd.concat([df['Fruits 1'], df['Fruits 2']])
>>> concated.value_counts()
Grapes          2
Mango           2
Apple           1
Orange          1
Banana          1
Apricot         1
Avocado         1
Blackberries    1
dtype: int64

>>> concated.value_counts()['Mango']
2

CodePudding user response：

Let's say you have two DataFrames given by:

data1=pd.DataFrame(['Apple', 'Orange', 'Banana', 'Grapes', 'Mango'],columns=['item'])
data2=pd.DataFrame(['Apricot', 'Avocado', 'Blackberries', 'Grapes', 'Mango'],columns=['item'])

Then for getting the result you asked for, the following command can be used: data1.append(data2).groupby('item').size()

This woudl give you a dataframe with two columns; item and their corresponding occurrences

item
Apple           1
Apricot         1
Avocado         1
Banana          1
Blackberries    1
Grapes          2
Mango           2
Orange          1
dtype: int64

CodePudding user response：

Here is a nother way to do it with numpy Probably not the most elegant Solution but it works.

import pandas as pd
import numpy as np

df = pd.DataFrame( {"c1":  ["Apple", "Orange", "Banana", "Grapes", "Mango"],
                    "c2": ["Apricot", "Avocado", "Blackberries", "Grapes", "Mango"]})

np_array = df.to_numpy()

unique, counts = np.unique(df, return_counts=True)
print(np.asarray((unique, counts)).T)

Output:

[['Apple' 1]
 ['Apricot' 1]
 ['Avocado' 1]
 ['Banana' 1]
 ['Blackberries' 1]
 ['Grapes' 2]
 ['Mango' 2]
 ['Orange' 1]]