Home > database >  Unique count of dataframe column values that excludes commas?
Unique count of dataframe column values that excludes commas?

Time:07-20

Imagine my dataframe is this:

                          
     Index     |    A    |    B
 --------------|---------|---------- 
               |         |
1              |Y,Z      |    nan
               |         |
2              |Y,Z,X    |     2
               |         |
3              |Z        |     7
               |         |
4              |nan      |     1
                          

This dataframe is 10,000 lines long and contain many variations in Column A.

From Column A, how would I get the all unique returns by ignoring Commas (,)?

From the 3 line data frame above, it would be:

Z = 3
Y = 2
X = 1

I'm not looking for:

Y,Z = 1
Y,Z,X = 1
X = 1

Thank you!

CodePudding user response:

You can try

out = df['A'].str.split(',').explode().value_counts()
print(out)

Z    3
Y    2
X    1
Name: A, dtype: int64

CodePudding user response:

This?

# Set up problem
df = pd.DataFrame({'A': [['Y', 'Z'], ['Y','Z','X'], 'Z', np.NaN], 'B': [np.NaN,2,7,1]})

df.A.explode().value_counts()

Output

Z    3
Y    2
X    1

CodePudding user response:

df.A.str.split(",").explode().value_counts()

CodePudding user response:

Assuming strings, use str.split, combined with explode and value_counts:

df['A'].str.split(',').explode().value_counts()

output:

Z    3
Y    2
X    1
Name: A, dtype: int64

used input:

df = pd.DataFrame({'A': ['Y,Z', 'Y,Z,X', 'Z', pd.NA], 'B': [pd.NA,2,7,1]})
  • Related