I have a dataframe with two columns and each column has 5 values. I want to combine all values from the two columns, print them, and count how many times each value appeared.
For example:
Column 1 - Fruits 1
- has values these values [Apple, Orange, Banana, Grapes, Mango]
Column 2 - Fruits 2
- has values these values [Apricot, Avocado, Blackberries, Grapes, Mango]
Now I want to combine values from both the columns and print all values and also want count how many times each appeared
Expected result:
[Apple, Orange, Banana, Grapes, Mango, Apricot, Avocado, Blackberries, Grapes, Mango]
Print count against each value like this:
Apple - 1
Orange - 1
Banana - 1
Grapes - 2
Mango - 2
Apricot - 1
Avocado - 1
Blackberries - 1
CodePudding user response:
Use unstack
and value_counts
:
out = df.unstack().value_counts(sort=False)
print(out)
print(out.index.tolist())
# Output 1:
Apple 1
Orange 1
Banana 1
Grapes 2
Mango 2
Apricot 1
Avocado 1
Blackberries 1
dtype: int64
# Output 2:
['Apple', 'Orange', 'Banana', 'Grapes', 'Mango',
'Apricot', 'Avocado', 'Blackberries']
CodePudding user response:
Assuming your dataframe looks like this:
Fruits 1 Fruits 2
0 Apple Apricot
1 Orange Avocado
2 Banana Blackberries
3 Grapes Grapes
4 Mango Mango
You can join the columns by flattening the underlying numpy array:
>>> vals = df.values.flatten().tolist()
>>> vals
['Apple',
'Apricot',
'Orange',
'Avocado',
'Banana',
'Blackberries',
'Grapes',
'Grapes',
'Mango',
'Mango']
You can print the counts of the values by concatenating the columns together and using value_counts()
on the resulting Series:
>>> concated = pd.concat([df['Fruits 1'], df['Fruits 2']])
>>> concated.value_counts()
Grapes 2
Mango 2
Apple 1
Orange 1
Banana 1
Apricot 1
Avocado 1
Blackberries 1
dtype: int64
>>> concated.value_counts()['Mango']
2
CodePudding user response:
Let's say you have two DataFrames given by:
data1=pd.DataFrame(['Apple', 'Orange', 'Banana', 'Grapes', 'Mango'],columns=['item'])
data2=pd.DataFrame(['Apricot', 'Avocado', 'Blackberries', 'Grapes', 'Mango'],columns=['item'])
Then for getting the result you asked for, the following command can be used:
data1.append(data2).groupby('item').size()
This woudl give you a dataframe with two columns; item
and their corresponding occurrences
item
Apple 1
Apricot 1
Avocado 1
Banana 1
Blackberries 1
Grapes 2
Mango 2
Orange 1
dtype: int64
CodePudding user response:
Here is a nother way to do it with numpy Probably not the most elegant Solution but it works.
import pandas as pd
import numpy as np
df = pd.DataFrame( {"c1": ["Apple", "Orange", "Banana", "Grapes", "Mango"],
"c2": ["Apricot", "Avocado", "Blackberries", "Grapes", "Mango"]})
np_array = df.to_numpy()
unique, counts = np.unique(df, return_counts=True)
print(np.asarray((unique, counts)).T)
Output:
[['Apple' 1]
['Apricot' 1]
['Avocado' 1]
['Banana' 1]
['Blackberries' 1]
['Grapes' 2]
['Mango' 2]
['Orange' 1]]