I'm trying to find the maximum number of occurrences of '/' (a slash) in any single cell of one column in a CSV file. A sample of the table is shown below; the real file has hundreds of rows.
Person Full Name | Person CRD Number | ID Number |
---|---|---|
Jack Johnson | 32 / 54 / 57 / 87 | 5686 |
John Johnsen | 11 / 22 | 6589 |
Luke Peterson | 34 | 6978 |
Kyle Garcia | 63 / 24 / 83 | 8957 |
Here's my code:
import pandas as pd
data = '/Users/myname/Downloads/tabularShell.csv'
df = pd.read_csv(data, index_col=0)
df1 = pd.DataFrame(df)['Person CRD Number']
df2 = df1.value_counts('/')
print(df2)
The output should be 3 because the maximum number of occurrences of '/' is 3 in a cell in the "Person CRD Number" column in the table shown above.
Thank you!
CodePudding user response:
You can use .str.count for this. For each cell in the column, it returns how many times the specified character occurs in that cell. .max() will then select the largest of those counts.
>>> df['Person CRD Number'].str.count('/').max()
3
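For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It assumes the four sample rows from the question, typed in as comma-separated text (the real file's delimiter and path are not known here), and the names csv_text, slash_counts and max_slashes are just illustrative.

```python
import io
import pandas as pd

# Sample rows copied from the question; the real data would come from the CSV file.
csv_text = """Person Full Name,Person CRD Number,ID Number
Jack Johnson,32 / 54 / 57 / 87,5686
John Johnsen,11 / 22,6589
Luke Peterson,34,6978
Kyle Garcia,63 / 24 / 83,8957
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count the slashes in each cell, then take the largest count.
slash_counts = df['Person CRD Number'].str.count('/')
max_slashes = int(slash_counts.max())
print(max_slashes)
```

Note that .str.count treats its argument as a regular expression. A plain '/' has no special meaning there, but a character such as '.' or '|' would need escaping. The value_counts call in the question does something different: it tallies how often each distinct cell value appears, not how often a character appears inside a cell.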
CodePudding user response:
print(max(df['Person CRD Number'].str.count('/')))
output:
>>> 3
CodePudding user response:
This gives you more control if you want to do further operations on the counts:
import pandas as pd
# Ignore these lines; they just build the DataFrame
data = [['Jack Johnson', '32 / 54 / 57 / 87', '5686'],
        ['John Johnsen', '11 / 22', '6589'],
        ['Luke Peterson', '34', '6978']]
df = pd.DataFrame(data)
df.columns = ['Person Full Name', 'Person CRD Number', 'ID Number']
# Define a small function to count the char in a string
def count_char(string, char=r'/'):
    return string.count(char)
# Apply the function to the CRD number and store in a new column
df['count'] = df['Person CRD Number'].apply(count_char)
# Get the maximum from the count
print(df['count'].max())
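As one example of a further operation, here is a rough sketch that uses the stored counts to find which row has the most slashes. It rebuilds the same three sample rows and uses a lambda in place of the helper function so the block runs on its own. The name top_row is only illustrative.

```python
import pandas as pd

# Same three sample rows as in the answer above
data = [['Jack Johnson', '32 / 54 / 57 / 87', '5686'],
        ['John Johnsen', '11 / 22', '6589'],
        ['Luke Peterson', '34', '6978']]
df = pd.DataFrame(data, columns=['Person Full Name', 'Person CRD Number', 'ID Number'])

# Store the per-row slash count in a new column
df['count'] = df['Person CRD Number'].apply(lambda s: s.count('/'))

# idxmax returns the index label of the first row holding the maximum count
top_row = df.loc[df['count'].idxmax()]
print(top_row['Person Full Name'], top_row['count'])
```

If several rows tie for the maximum, idxmax only reports the first one; filtering with df[df['count'] == df['count'].max()] would return all of them.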