Country | Total Cases | Total Deaths | Recovered |
---|---|---|---|
China | 741 | 147 | 987 |
Sweden | 381 | 021 | 242 |
Italy | 219 | 100 | 088 |
I am using Pandas and I'm trying to think of a function which enumerates the Country
where there are current cases, i.e. where Total Cases
minus Total Deaths
exceeds Recovered
. I'm really new at this - just drawing a blank.
I tried this:
def active_countries(data):
df['TC-TD'] = df['Total Cases'] - df['Total Deaths']
if df['TC-TD'] > df['Recovered']==True
print('Country')
ValueError Traceback (most recent call last)
Input In [297], in <cell line: 1>()
----> 1 active_countries(latest)
Input In [295], in active_countries(data)
1 def active_countries(data):
2 df['TC-TD'] = df['Total Cases'] - df['Total Deaths']
----> 3 if df['TC-TD'] > df['Recovered']==True:
4 print('Country')
File ~\AppData\Roaming\Python\Python38\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
1525 @final
1526 def __nonzero__(self):
-> 1527 raise ValueError(
1528 f"The truth value of a {type(self).__name__} is ambiguous. "
1529 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1530 )
CodePudding user response:
import pandas as pd
df = pd.read_csv('test.csv')
def check(dframe):
for index, row in dframe.iterrows():
if (row['Total Cases']-row['Total Deaths'] > row['Recovered']):
print(row['Country'])
check(df)
With test.csv
:
Country,Total Cases,Total Deaths,Recovered
China,741,147,987
Sweden,381,021,242
Italy,219,100,088
CodePudding user response:
print(df[df["Total Cases"]-df["Total Deaths"]>df["Recovered"]])
output:
Country Total Cases Total Deaths Recovered
1 sweden 381 21 242
2 italy 219 100 88
CodePudding user response:
This can be done through the [] notation. You can read the bracket notation as a conditional on the underlying data which returns a vector of bools. If an entry in this vector is true, it is returned, false it is not.
In this example we take the df['Total Cases'] series and subtract df['Total Deaths'] from it, and compare the resulting series against the df['Recovered']. This creates a bool vector (or Series), that we use the bracket notation with. We then select just the 'Country' column from that.
from io import StringIO
import pandas as pd
input = '''
"Country","Total Cases","Total Deaths","Recovered"
"China",741,147,987
"Sweden",381,21,242
"Italy", 219, 100, 88
'''
df = pd.read_csv(StringIO(input))
print(df[df['Total Cases'] - df['Total Deaths'] > df['Recovered']]['Country'])
Ouput:
1 Sweden
2 Italy
Name: Country, dtype: object
CodePudding user response:
You can fix the posted code with this:
df['TC-TD'] = df['Total Cases'] - df['Total Deaths']
rows = df[df['TC-TD'] > df['Recovered']]
print(list(rows["Country"]))
Output:
['Sweden', 'Italy']
Alternatively, the query function is a powerful way to filter a data frame with complex conditional logic.
import pandas as pd
data = {"Country": ["China", "Sweden", "Italy"],
"Total_Cases": [741, 381, 219],
"Total_Deaths": [147, 21, 100],
"Recovered": [987, 242, 88]}
df = pd.DataFrame(data)
print(df)
rows = df.query('(Total_Cases - Total_Deaths) > Recovered')
#print(rows)
# if just want the country names
print(list(rows["Country"]))
Output:
Raw list:
Country Total_Cases Total_Deaths Recovered
0 China 741 147 987
1 Sweden 381 21 242
2 Italy 219 100 88
Results from query:
['Sweden', 'Italy']
If the original data has whitespace in the column names, it's simpler dealing with names with no spaces. Do this to rename the column names.
df = df.rename(columns={'Total Cases': 'Total_Cases', 'Total Deaths': 'Total_Deaths'})