Display values of one column when using conditionals based on two other columns-CodePudding

Country	Total Cases	Total Deaths	Recovered
China	741	147	987
Sweden	381	021	242
Italy	219	100	088

I am using Pandas and I'm trying to think of a function which enumerates the Country where there are current cases, i.e. where Total Cases minus Total Deaths exceeds Recovered. I'm really new at this - just drawing a blank.

I tried this:

def active_countries(data):
    df['TC-TD'] = df['Total Cases'] - df['Total Deaths']
    if df['TC-TD'] > df['Recovered']==True
        print('Country')

ValueError                                Traceback (most recent call last)
Input In [297], in <cell line: 1>()
----> 1 active_countries(latest)

Input In [295], in active_countries(data)
      1 def active_countries(data):
      2     df['TC-TD'] = df['Total Cases'] - df['Total Deaths']
----> 3     if df['TC-TD'] > df['Recovered']==True:
      4         print('Country')

File ~\AppData\Roaming\Python\Python38\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
   1525 @final
   1526 def __nonzero__(self):
-> 1527     raise ValueError(
   1528         f"The truth value of a {type(self).__name__} is ambiguous. "
   1529         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1530     )

CodePudding user response：

import pandas as pd

df = pd.read_csv('test.csv')

def check(dframe):
    for index, row in dframe.iterrows():
        if (row['Total Cases']-row['Total Deaths'] > row['Recovered']):
            print(row['Country'])


            
check(df)

With test.csv:

Country,Total Cases,Total Deaths,Recovered
China,741,147,987
Sweden,381,021,242
Italy,219,100,088

CodePudding user response：

print(df[df["Total Cases"]-df["Total Deaths"]>df["Recovered"]])

output:

  Country  Total Cases  Total Deaths  Recovered
1  sweden          381            21        242
2   italy          219           100         88

CodePudding user response：

This can be done through the [] notation. You can read the bracket notation as a conditional on the underlying data which returns a vector of bools. If an entry in this vector is true, it is returned, false it is not.

In this example we take the df['Total Cases'] series and subtract df['Total Deaths'] from it, and compare the resulting series against the df['Recovered']. This creates a bool vector (or Series), that we use the bracket notation with. We then select just the 'Country' column from that.

from io import StringIO
import pandas as pd

input = '''
"Country","Total Cases","Total Deaths","Recovered"
"China",741,147,987
"Sweden",381,21,242
"Italy", 219,   100, 88
'''
df = pd.read_csv(StringIO(input))

print(df[df['Total Cases'] - df['Total Deaths'] > df['Recovered']]['Country'])

Ouput:

1    Sweden
2     Italy
Name: Country, dtype: object

CodePudding user response：

You can fix the posted code with this:

df['TC-TD'] = df['Total Cases'] - df['Total Deaths']
rows = df[df['TC-TD'] > df['Recovered']]
print(list(rows["Country"]))

Output:

['Sweden', 'Italy']

Alternatively, the query function is a powerful way to filter a data frame with complex conditional logic.

import pandas as pd

data = {"Country": ["China", "Sweden", "Italy"],
        "Total_Cases": [741, 381, 219],
        "Total_Deaths": [147, 21, 100],
        "Recovered": [987, 242, 88]}
                        
df = pd.DataFrame(data)

print(df) 
rows = df.query('(Total_Cases - Total_Deaths) > Recovered')
#print(rows)
# if just want the country names
print(list(rows["Country"]))

Output:

Raw list:

  Country  Total_Cases  Total_Deaths  Recovered
0   China          741           147        987
1  Sweden          381            21        242
2   Italy          219           100         88

Results from query:

['Sweden', 'Italy']

If the original data has whitespace in the column names, it's simpler dealing with names with no spaces. Do this to rename the column names.

df = df.rename(columns={'Total Cases': 'Total_Cases', 'Total Deaths': 'Total_Deaths'})