This is my DataFrame (named unp), posted as an image:
I want to fill the column GDP_Growth, which is currently blank, with the value unp.GDP_CAP - unp.GDP_CAP.shift(1),
but only where 'TIME' is later than 2014 (i.e. not 2014); otherwise it should be N/A.
I tried using an if statement directly, but it doesn't work:
if unp.loc[unp['TIME'] > 2014]:
    unp['GDP_Growth'] = unp.GDP_CAP - unp.GDP_CAP.shift(1)
else:
    return
Answer:
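You should avoid Python if statements when working with DataFrames: they are slower (less efficient) than vectorized operations, and an if applied to a whole DataFrame raises "ValueError: The truth value of a DataFrame is ambiguous", because pandas cannot reduce many values to a single True/False.
Instead, depending on what you need, you can use np.where().
Because the DataFrame in the question was posted as a picture (rather than as text), here is the standard pattern on a generic example: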
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [5, 6, 7, 8, 9]})
# Use np.where() to take values from column 'A' where column 'B' is greater than 7, and 0 elsewhere
result = np.where(df['B'] > 7, df['A'], 0)
# Print the result
print(result)
This prints:
[0 0 0 4 5]
You will need to modify the above for your particular dataframe.
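Applied to the question's columns, the same idea might look like the sketch below. TIME, GDP_CAP and GDP_Growth come from the question; the numbers are made up because the real data was only posted as an image, and the sketch assumes the rows are sorted by TIME and form a single series (if the frame holds several countries, shift(1) would cross from one country into the next). It also reproduces the error that the question's if statement runs into.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's DataFrame (the real data was an image)
unp = pd.DataFrame({
    "TIME": [2013, 2014, 2015, 2016],
    "GDP_CAP": [98.0, 100.0, 102.0, 105.0],
})

# The question's approach: 'if' needs one True/False, but a DataFrame holds many values
try:
    if unp.loc[unp["TIME"] > 2014]:
        pass
except ValueError as err:
    print("if fails:", err)

# Vectorized alternative: change from the previous row where TIME > 2014, NaN elsewhere
unp["GDP_Growth"] = np.where(
    unp["TIME"] > 2014,
    unp["GDP_CAP"] - unp["GDP_CAP"].shift(1),
    np.nan,
)
print(unp)
```

With these made-up numbers, GDP_Growth comes out as NaN, NaN, 2.0, 3.0: the 2013 row has no previous year to subtract, and the 2014 row is excluded by the condition.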
Answer:
The question title is currently "Python: How do I use the if function when calling out a specific row?", which my answer does not address directly. Instead, we will compute the difference ('growth') and then apply it selectively.
Explanation: with pandas and NumPy you generally want to use vectorized operations on whole columns, so that most of the computation runs in compiled (C-implemented) functions rather than in the Python interpreter.
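To illustrate the point about keeping the work out of the interpreter, here is a rough comparison (a sketch, not a rigorous benchmark) between a plain Python loop and the vectorized Series.diff() on a made-up series of 100,000 floats. Both produce the same values; the timings depend on the machine, so no numbers are claimed here.

```python
import time

import numpy as np
import pandas as pd

# Made-up data: 100,000 consecutive floats
n = 100_000
s = pd.Series(np.arange(n, dtype=float))

# Python-level loop: every subtraction is executed by the interpreter
start = time.perf_counter()
values = s.to_list()
loop_result = pd.Series([np.nan] + [values[i] - values[i - 1] for i in range(1, n)])
loop_seconds = time.perf_counter() - start

# Vectorized: one call whose inner loop runs in compiled code
start = time.perf_counter()
vec_result = s.diff()
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
print("same values:", loop_result.equals(vec_result))
```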
Solution:
A. Obtain the difference ('growth')
For your DataFrame df = pd.DataFrame(...), you can obtain the row-to-row change in value of a specific column with df['column_name'].diff(), e.g.:
# This is your dataframe
In : df
Out:
gdp growth year
0 0 <NA> 2000
1 1 <NA> 2001
2 2 <NA> 2002
3 3 <NA> 2003
4 4 <NA> 2004
In : df['gdp'].diff()
Out:
0 NaN
1 1.0
2 1.0
3 1.0
4 1.0
Name: gdp, dtype: float64
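Series.diff() with its default periods=1 subtracts the previous row, which is exactly the formula from the question, x - x.shift(1). A quick check on the example frame above (rebuilt here without the empty 'growth' column):

```python
import pandas as pd

# The example frame from above, minus the empty 'growth' column
df = pd.DataFrame({"gdp": [0, 1, 2, 3, 4], "year": [2000, 2001, 2002, 2003, 2004]})

# diff() subtracts the previous row: the same as the question's x - x.shift(1)
via_diff = df["gdp"].diff()
via_shift = df["gdp"] - df["gdp"].shift(1)

print(via_diff.equals(via_shift))  # True
```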
B. Apply it to the 'growth' column
In : df['growth'] = df['gdp'].diff()
df
Out:
gdp growth year
0 0 NaN 2000
1 1 1.0 2001
2 2 1.0 2002
3 3 1.0 2003
4 4 1.0 2004
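C. Selectively exclude values
If you then want specific years to have a particular value, assign it to those rows with a single .loc call (chained assignment such as df['growth'].iloc[...] = ... may not update df in recent pandas versions):
In : df.loc[df['year'] < 2003, 'growth'] = np.nan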
df
Out:
gdp growth year
0 0 NaN 2000
1 1 NaN 2001
2 2 NaN 2002
3 3 1.0 2003
4 4 1.0 2004
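Putting steps A to C together as one runnable script, using the same toy data as above. The last part shows Series.mask as an alternative to step C: it sets values to NaN wherever the condition is True, so the difference and the exclusion fit into a single expression.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"gdp": [0, 1, 2, 3, 4], "year": [2000, 2001, 2002, 2003, 2004]})

# A + B: change from the previous row, stored in 'growth'
df["growth"] = df["gdp"].diff()

# C: blank out years before 2003 with one .loc assignment (no chained indexing)
df.loc[df["year"] < 2003, "growth"] = np.nan
print(df)

# Alternative for A to C in one step: mask() puts NaN where the condition holds
df["growth_alt"] = df["gdp"].diff().mask(df["year"] < 2003)
print(df["growth"].equals(df["growth_alt"]))  # True
```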