I have a dataframe with different columns
COUNTRY | COUNTRY1 | COUNTRY2 | hp |
---|---|---|---|
Italy | uk | france | 50 |
uk | france | italy | 150 |
I would like to count the countries in different columns by row(without using melt) to get and output like this:
Italy 2
France 2
Uk 2
I tried different options as loc
and groupby
but it doesn't work.
Thank you
CodePudding user response:
the suggestion of @Henry Yik is way more efficient, and in combination with a groupby can solve your problem. also, try this: def count_entries(df, col_name):
country_count = {}
# Extract column from DataFrame: col
col = df[col_name]
# Iterate over country column in DataFrame
for entry in col:
# If the country is in langs_count, add 1
if entry in country_count.keys():
country_count[entry] = country_count[entry] 1
# Else add the language to langs_count, set the value to 1
else:
country_count[entry] = 1
return country_count
# Call count_entries(): result
for col_name in df.columns():
result = count_entries(df, co_name)
CodePudding user response:
You can filter the COUNTRY*
columns by .filter()
, then .stack()
to a single column with the country names.
As your country names have mixed character letter cases, you can change country names to Title case (first letter capital letter) by str.title()
before we count the occurences with .value_counts()
, as follows:
df.filter(like='COUNTRY').stack().str.title().value_counts()
Result:
Italy 2
Uk 2
France 2
dtype: int64