The solution that came to mind is:
dataset['Name'].loc[dataset['Sex'] == 'female'].value_counts().idxmax()
But it is not that simple here, because after "Mrs." comes the name of the woman's husband, so I need to split the string somehow.
Input data:
df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris', 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss. Laina', 'Futrelle, Mrs. Jacques Heath (Lily May Peel)', 'Allen, Mr. William Henry', 'Moran, Mr. James', 'McCarthy, Mr. Timothy J', 'Palsson, Master. Gosta Leonard', 'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)', 'Nasser, Mrs. Nicholas (Adele Achem)'],
'Sex': ['male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'female', 'female'],
})
Task 4: Name the most popular female name on the ship.
'some code'
Output: Anna #The most popular female name
Task 5: Name the most popular male name on the ship.
'some code'
Output: Wilhelm #The most popular male name
CodePudding user response:
Quick and dirty would be something like:
from collections import Counter
# Random list of names
your_lst = ["Mrs Braun", "Allen, Mr. Timothy J", "Allen, Mr. Henry William"]
# Split each name on spaces and flatten the result into a single list.
your_lst_flat = [item for name in your_lst for item in name.split(" ")]
# Count occurrences. With this you will get a count of all the values, including Mr and Mrs. But you can just ignore these.
Counter(your_lst_flat).most_common()
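If the titles get in the way, the same idea can be extended to drop them before counting. This is only a rough sketch: the `TITLES` set and the sample `names` list are assumptions made for illustration, and surnames are still counted along with first names.

```python
from collections import Counter

# Hypothetical sample names in the "Surname, Title. First ..." layout.
names = ["Allen, Mr. Timothy J", "Allen, Mr. Henry William", "Moran, Mr. Timothy"]

# Assumed set of honorifics to ignore; extend it to match your data.
TITLES = {"Mr", "Mrs", "Miss", "Master"}

# Split on whitespace and strip surrounding punctuation from every token.
tokens = [token.strip(".,()") for name in names for token in name.split()]

# Count everything that is not empty and not a title.
counts = Counter(t for t in tokens if t and t not in TITLES)
print(counts.most_common(3))
```

Because surnames are counted too, this is only useful as a first look at the data, not as a final answer.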
CodePudding user response:
IIUC, you can use a regex to extract either the first name after the title or, for a "Mrs.", the first name inside the parentheses:
s = df['Name'].str.extract(r'((?:(?<=Mr\. )|(?<=Miss\. )|(?<=Master\. ))\w+|(?<=\()\w+)',
                           expand=False)
s.groupby(df['Sex']).value_counts()
output:
Sex     Name
female  Adele        1
        Elisabeth    1
        Florence     1
        Laina        1
        Lily         1
male    Gosta        1
        James        1
        Owen         1
        Timothy      1
        William      1
Name: Name, dtype: int64
Once you have s, to get the most frequent female name(s):
s[df['Sex'].eq('female')].mode()
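Putting the pieces together for both tasks, a minimal end-to-end sketch could look like the following. It assumes the sample `df` from the question and that only the titles Mr., Mrs., Miss. and Master. occur. The variable names `first_names`, `female_top` and `male_top` are just illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris',
             'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
             'Heikkinen, Miss. Laina',
             'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
             'Allen, Mr. William Henry',
             'Moran, Mr. James',
             'McCarthy, Mr. Timothy J',
             'Palsson, Master. Gosta Leonard',
             'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)',
             'Nasser, Mrs. Nicholas (Adele Achem)'],
    'Sex': ['male', 'female', 'female', 'female', 'male',
            'male', 'male', 'male', 'female', 'female'],
})

# First word after "Mr. ", "Miss. " or "Master. ", otherwise the first
# word after an opening parenthesis (the "Mrs." case).
pattern = r'((?:(?<=Mr\. )|(?<=Miss\. )|(?<=Master\. ))\w+|(?<=\()\w+)'
first_names = df['Name'].str.extract(pattern, expand=False)

# mode() returns every name tied for the highest count.
female_top = first_names[df['Sex'].eq('female')].mode()
male_top = first_names[df['Sex'].eq('male')].mode()

print(female_top.tolist())
print(male_top.tolist())
```

On this ten-row sample every name occurs once, so `mode()` returns all of them. On the full dataset it should narrow down to the most frequent name(s). Rows with other titles, or a "Mrs." without parentheses, would come out as NaN and need separate handling.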