Imagine my dataframe is this:
Index | A | B
--------------|---------|----------
| |
1 |Y,Z | nan
| |
2 |Y,Z,X | 2
| |
3 |Z | 7
| |
4 |nan | 1
This dataframe is 10,000 lines long and contain many variations in Column A.
From Column A, how would I get the all unique returns by ignoring Commas (,)?
From the 3 line data frame above, it would be:
Z = 3
Y = 2
X = 1
I'm not looking for:
Y,Z = 1
Y,Z,X = 1
X = 1
Thank you!
CodePudding user response:
You can try
out = df['A'].str.split(',').explode().value_counts()
print(out)
Z 3
Y 2
X 1
Name: A, dtype: int64
CodePudding user response:
This?
# Set up problem
df = pd.DataFrame({'A': [['Y', 'Z'], ['Y','Z','X'], 'Z', np.NaN], 'B': [np.NaN,2,7,1]})
df.A.explode().value_counts()
Output
Z 3
Y 2
X 1
CodePudding user response:
df.A.str.split(",").explode().value_counts()
CodePudding user response:
Assuming strings, use str.split
, combined with explode
and value_counts
:
df['A'].str.split(',').explode().value_counts()
output:
Z 3
Y 2
X 1
Name: A, dtype: int64
used input:
df = pd.DataFrame({'A': ['Y,Z', 'Y,Z,X', 'Z', pd.NA], 'B': [pd.NA,2,7,1]})