Here is a dataframe:
import pandas as pd
df = pd.DataFrame({'GDP per Capita(nominal)': [180366.7, 116935.6004, 104861.8511, 94277.96536, 85535.3832, 71809.25058, 67335.29341, 64800.05733, 57410.16586, 48472.54454, 46949.28309],
                   'Deaths to Cases(%)': [1.738361815, 0.2561113616, 0, 1.07778077, 0.07438664827, 0.2628120894, 1.36447959, 0.4580663137, 1.344920688, 0.9312013661, 1.285648031]})
IN [1] : print(df)
OUT [1] :
    GDP per Capita(nominal)  Deaths to Cases(%)
0              180366.70000            1.738362
1              116935.60040            0.256111
2              104861.85110            0.000000
3               94277.96536            1.077781
4               85535.38320            0.074387
5               71809.25058            0.262812
6               67335.29341            1.364480
7               64800.05733            0.458066
8               57410.16586            1.344921
9               48472.54454            0.931201
10              46949.28309            1.285648
Assume the dataframe is read from a CSV file.
Using pandas, how would I append the corresponding values from the 'GDP per Capita(nominal)' column to a list whenever the 'Deaths to Cases(%)' value is greater than 0.5?
For example, the expected output for the snippet would be:
ListA = [180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]
Thanks in advance
CodePudding user response:
Here is a simple example.
Build a sample df:
import pandas as pd
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
b = [0.1, 0.2, 0.3, 0.6, 0.7, 1, 0.01, 0.5]
df = pd.DataFrame(a, columns=['a'])
df['b'] = b
First way (explicit loop):
larger_than_half = []
# Walk both columns in parallel and keep 'a' where 'b' exceeds 0.5
for c, d in zip(df['a'], df['b']):
    if d > 0.5:
        larger_than_half.append(c)
print(larger_than_half)
Second way
tmp = df[df['b'] > 0.5]
print(list(tmp['a']))
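The second way works because the comparison df['b'] > 0.5 produces a boolean Series, and indexing the dataframe with it keeps only the rows where the value is True. A minimal sketch using the same sample data (the names mask and kept are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
    'b': [0.1, 0.2, 0.3, 0.6, 0.7, 1, 0.01, 0.5],
})

# Element-wise comparison: one True/False value per row
mask = df['b'] > 0.5

# Keep column 'a' for rows where mask is True;
# the row with b == 0.5 is excluded because > is strict
kept = df.loc[mask, 'a'].tolist()

print(mask.tolist())
print(kept)
```

If rows equal to the threshold should be included as well, >= can be used in place of >.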
CodePudding user response:
Try this:
ListA = list(df.loc[df['Deaths to Cases(%)'] > 0.5]["GDP per Capita(nominal)"])
CodePudding user response:
You can use .loc (an indexer that takes square brackets, not a method call), like below:
>>> list_a = list(df.loc[df['Deaths to Cases(%)'] > 0.5 , 'GDP per Capita(nominal)'])
>>> list_a
[180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]
CodePudding user response:
You can use pandas.Series.to_list():
import pandas as pd
cols = ['GDP', 'Deaths']
data = [(180366.7, 1.738361815),
(116935.6004, 0.2561113616),
(104861.8511, 0),
(94277.96536, 1.07778077),
(85535.3832, 0.07438664827),
(71809.25058, 0.2628120894),
(67335.29341, 1.36447959),
(64800.05733, 0.4580663137),
(57410.16586, 1.344920688),
(48472.54454, 0.9312013661),
(46949.28309, 1.285648031)]
df = pd.DataFrame(data, columns=cols)
result = df[df.Deaths > 0.5].GDP.to_list()
print(result)
[180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]
CodePudding user response:
Use .loc together with tolist() so that the whole operation stays within pandas built-in functions, which is generally faster than an explicit Python loop.
You can use .loc to locate the relevant entries, then use tolist() to convert the selected Series values into a list, as follows:
ListA = df.loc[df['Deaths to Cases(%)'] > 0.5, 'GDP per Capita(nominal)'].tolist()
Result:
print(ListA)
[180366.7, 94277.96536, 67335.29341, 57410.16586, 48472.54454, 46949.28309]
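Since the question says the data actually lives in a CSV file, here is a rough sketch of the same filter starting from pd.read_csv. The file contents are simulated with io.StringIO (only the first four rows, to keep it short) so the snippet runs on its own. In practice you would pass your own file path instead, and the column headers are assumed to match those in the question exactly.

```python
import io
import pandas as pd

# Stand-in for the real CSV file (assumption: same headers as in the question)
csv_text = """GDP per Capita(nominal),Deaths to Cases(%)
180366.7,1.738361815
116935.6004,0.2561113616
104861.8511,0
94277.96536,1.07778077
"""

# With a real file this would be pd.read_csv('your_file.csv') (hypothetical path)
df = pd.read_csv(io.StringIO(csv_text))

# Same .loc filter as above: rows with ratio > 0.5, GDP column only
list_a = df.loc[df['Deaths to Cases(%)'] > 0.5, 'GDP per Capita(nominal)'].tolist()
print(list_a)
```

With these four rows, only the first and last GDP values pass the filter.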