How would I add particular values in one column to a list based on the corresponding value in another column?

Here is a dataframe.

df = pd.DataFrame({'GDP per Capita(nominal)' : [180366.7, 116935.6004,104861.8511,94277.96536,85535.3832,71809.25058,67335.29341,64800.05733,57410.16586,48472.54454,46949.28309],
                   'Deaths to Cases(%)' : [1.738361815,0.2561113616,0,1.07778077,0.07438664827,0.2628120894,1.36447959,0.4580663137,1.344920688,0.9312013661,1.285648031]})

IN  [1] : print(df)
OUT [1] :
    GDP per Capita(nominal) Deaths to Cases(%)
0   180366.70000            1.738362
1   116935.60040            0.256111
2   104861.85110            0.000000
3   94277.96536             1.077781
4   85535.38320             0.074387
5   71809.25058             0.262812
6   67335.29341             1.364480
7   64800.05733             0.458066
8   57410.16586             1.344921
9   48472.54454             0.931201
10  46949.28309             1.285648

Assume the dataframe is loaded from a CSV file.

How would I, using pandas, append the corresponding values from the 'GDP per Capita(nominal)' column to a list if the 'Deaths to Cases(%)' value is greater than 0.5?

For example, the expected output for the snippet would be:

ListA = [180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]

Thanks in advance

CodePudding user response：

Here is a simple example.

Build a sample df:

import pandas as pd

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
b = [0.1, 0.2, 0.3, 0.6, 0.7, 1, 0.01, 0.5]

df = pd.DataFrame(a, columns=['a'])
df['b'] = b

First way (explicit loop):

larger_than_half = []

# Walk both columns in parallel and keep 'a' where 'b' exceeds 0.5
for c, d in zip(df['a'], df['b']):
    if d > 0.5:
        larger_than_half.append(c)

print(larger_than_half)

Second way (boolean mask):

tmp = df[df['b'] > 0.5]
print(list(tmp['a']))
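
To see why the second way works, it can help to look at the intermediate boolean mask. This is only a minimal sketch on the same sample df as above; the names mask and selected are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                   'b': [0.1, 0.2, 0.3, 0.6, 0.7, 1, 0.01, 0.5]})

# Comparing a column to a scalar gives a boolean Series, one entry per row
mask = df['b'] > 0.5
print(mask.tolist())
# [False, False, False, True, True, True, False, False]

# Indexing with the mask keeps only the rows where it is True
selected = df.loc[mask, 'a'].tolist()
print(selected)
# ['d', 'e', 'f']
```

Note that the last row (0.5) is excluded because the comparison is strictly greater than.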

CodePudding user response：

Try this:

ListA = list(df.loc[df['Deaths to Cases(%)'] > 0.5]["GDP per Capita(nominal)"])

CodePudding user response：

You can use .loc[] (it is an indexer, so it takes square brackets) as below:

>>> list_a = list(df.loc[df['Deaths to Cases(%)'] > 0.5, 'GDP per Capita(nominal)'])
>>> list_a
[180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]
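
For reading values, selecting the rows first and then the column gives the same result as passing the mask and the column label to .loc in one step. A quick check, sketched on the first four rows of the question's data (the names mask, chained and single_step are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'GDP per Capita(nominal)': [180366.7, 116935.6004, 104861.8511, 94277.96536],
    'Deaths to Cases(%)': [1.738361815, 0.2561113616, 0, 1.07778077],
})

mask = df['Deaths to Cases(%)'] > 0.5

# Two indexing steps: filter the rows, then pick the column
chained = list(df.loc[mask]['GDP per Capita(nominal)'])

# One indexing step: row mask and column label together
single_step = list(df.loc[mask, 'GDP per Capita(nominal)'])

print(chained == single_step)
# True
```

The one-step form is generally the safer habit, since the two-step form can misbehave if you later assign through it instead of only reading.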

CodePudding user response：

You can use pandas.Series.to_list():

import pandas as pd

cols = ['GDP', 'Deaths']

data = [(180366.7, 1.738361815),
        (116935.6004, 0.2561113616),
        (104861.8511, 0),
        (94277.96536, 1.07778077),
        (85535.3832, 0.07438664827),
        (71809.25058, 0.2628120894),
        (67335.29341, 1.36447959),
        (64800.05733, 0.4580663137),
        (57410.16586, 1.344920688),
        (48472.54454, 0.9312013661),
        (46949.28309, 1.285648031)]

df = pd.DataFrame(data, columns=cols)

result = df[df.Deaths > 0.5].GDP.to_list()

print(result)

[180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]

CodePudding user response：

Use .loc together with tolist(), so the whole selection is done with pandas built-ins rather than a Python-level loop; this is usually the faster option on large frames.

You can use .loc to locate the relevant entries, then use tolist() to convert the selected series values into a list, as follows:

ListA = df.loc[df['Deaths to Cases(%)'] > 0.5, 'GDP per Capita(nominal)'].tolist()

Result:

print(ListA)

[180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]
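
Since the question mentions that the data lives in a CSV file, the same filter can be applied right after loading it. This is a sketch under assumptions: the CSV text below is a stand-in built from the question's data with the column names as the header row, and io.StringIO is used only so the example runs without a file on disk. With a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the real file (hypothetical layout: header row + two columns)
csv_text = """GDP per Capita(nominal),Deaths to Cases(%)
180366.7,1.738361815
116935.6004,0.2561113616
104861.8511,0
94277.96536,1.07778077
85535.3832,0.07438664827
71809.25058,0.2628120894
67335.29341,1.36447959
64800.05733,0.4580663137
57410.16586,1.344920688
48472.54454,0.9312013661
46949.28309,1.285648031
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep the GDP values for rows whose death-to-case ratio exceeds 0.5
ListA = df.loc[df['Deaths to Cases(%)'] > 0.5, 'GDP per Capita(nominal)'].tolist()
print(ListA)
```

This should print the same six GDP values as in the expected output from the question.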