I have a list of data frames `dataframes`, a list of names `keeplist`, and a dict `HydroCap`.
I am trying to loop through the columns of each data frame that are named in `keeplist` and, inside that column loop, apply a `where` function that replaces a value with the dictionary value for that column's key whenever it is greater than the dictionary value. The problem is that I run into `TypeError: '>=' not supported between instances of 'str' and 'int'`, and I am not sure how to solve it.
```python
keeplist = ['BOUND','GCOUL','CHIEF','ROCKY','WANAP','PRIRA','LGRAN','LMONU','ICEHA','MCNAR','DALLE']
HydroCap = {'BOUND':55000,'GCOUL':280000,'CHIEF':219000,'ROCKY':220000,'WANAP':161000,'PRIRA':162000,'LGRAN':130000,'LMONU':130000,'ICEHA':106000,'MCNAR':232000,'DALLE':375000}

for i in dataframes:
    for c in i[keeplist]:
        c = np.where(c >= HydroCap[c], HydroCap[c], c)
```
Any push in the right direction would be greatly appreciated. I think the issue is that it expects an index value such as `HydroCap[1]` in place of `HydroCap[c]`, but that is just a hunch.
First 7 columns of `dataframes[0]`:

| | Week | Month | Day | Year | BOUND | GCOUL | CHIEF |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 8 | 5 | 1979 | 44999.896673 | 161241.036388 | 166497.578098 |
| 1 | 2 | 8 | 12 | 1979 | 15309.259762 | 58219.122747 | 63413.204052 |
| 2 | 3 | 8 | 19 | 1979 | 15316.965781 | 56072.024363 | 60606.956215 |
| 3 | 4 | 8 | 26 | 1979 | 14371.269016 | 58574.003087 | 63311.569888 |
Answer:
```python
import pandas as pd
import numpy as np

# Since I don't have all of the dataframes, I just use the sample you shared
df = pd.read_csv('dataframe.tsv', sep="\t")

# Note: I've lowered some caps so you can see values actually being replaced
keeplist = ['BOUND','GCOUL','CHIEF']
HydroCap = {'BOUND':5500,'GCOUL':280000,'CHIEF':21900}

# The inside of the loop has been changed to accomplish the actual goal.
# There are now two loop variables: col and c
#   col is the column name
#   c is a single element of that column at a time
# The code works one column at a time,
# using a list comprehension to cycle over each element
# and replacing the full column with the new values at once
for col in df[keeplist]:
    df[col] = [np.where(c >= HydroCap[col], HydroCap[col], c) for c in df[col]]
```
Which produces the following `df`:

| | Week | Month | Day | Year | BOUND | GCOUL | CHIEF |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 8 | 5 | 1979 | 5500.0 | 161241.036388 | 21900.0 |
| 1 | 2 | 8 | 12 | 1979 | 5500.0 | 58219.122747 | 21900.0 |
| 2 | 3 | 8 | 19 | 1979 | 5500.0 | 56072.024363 | 21900.0 |
| 3 | 4 | 8 | 26 | 1979 | 5500.0 | 58574.003087 | 21900.0 |
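The list comprehension above calls `np.where` once per element. Because `np.where` is vectorized, it can also take a whole column in one call, and wrapping the column loop in a loop over the list covers every frame, which is what the question originally asked. This is a minimal sketch under some assumptions: the two-frame `dataframes` list is made up by splitting the question's sample rows, only the capped columns are included, and the caps are the lowered ones used above.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question (only the capped columns)
sample = pd.DataFrame({
    'BOUND': [44999.896673, 15309.259762, 15316.965781, 14371.269016],
    'GCOUL': [161241.036388, 58219.122747, 56072.024363, 58574.003087],
    'CHIEF': [166497.578098, 63413.204052, 60606.956215, 63311.569888],
})

# Assumption: split the sample into two frames to stand in for the real list
dataframes = [sample.iloc[:2].copy(), sample.iloc[2:].copy()]

# Lowered caps (as in the answer) so that some values are actually replaced
keeplist = ['BOUND', 'GCOUL', 'CHIEF']
HydroCap = {'BOUND': 5500, 'GCOUL': 280000, 'CHIEF': 21900}

for df in dataframes:
    for col in keeplist:
        cap = HydroCap[col]
        # Compare the whole column to the cap in one call; np.where returns
        # a new array, and assigning it back replaces the column in this frame
        df[col] = np.where(df[col] >= cap, cap, df[col])

for df in dataframes:
    print(df)
```

Because `df[col] = ...` modifies each DataFrame object in place, the frames inside `dataframes` are updated without rebuilding the list. If you prefer a pandas method, `df[col] = df[col].clip(upper=cap)` should give the same result for these float columns.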
In order to replace elements in a dataframe, you need to either work a whole column at a time or reassign values to individual cells specified by row and column coordinates. Iterating over a dataframe yields its column labels, so in your original loop `c` was a column name such as `'BOUND'`, and `c >= HydroCap[c]` compared a `str` with an `int`, which is what raised the `TypeError`. And even if `c` had held the cell values you had in mind, reassigning the variable `c` would not alter anything in the dataframe.
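To make both points concrete, here is a small sketch using two rows of the `BOUND` and `CHIEF` columns from the question's sample, with the lowered caps from the answer (both choices are just for illustration). It shows that iterating over a dataframe yields column labels, reproduces the `TypeError`, and then applies the caps cell by cell with `DataFrame.at`.

```python
import pandas as pd

df = pd.DataFrame({'BOUND': [44999.896673, 15309.259762],
                   'CHIEF': [166497.578098, 63413.204052]})
HydroCap = {'BOUND': 5500, 'CHIEF': 21900}

# Iterating over a DataFrame yields its column labels, not its values,
# so in the original loop c was a str such as 'BOUND'
labels = [c for c in df]

# Comparing that label with the int cap reproduces the original error
error_message = None
try:
    labels[0] >= HydroCap[labels[0]]
except TypeError as exc:
    error_message = str(exc)

# Cell-by-cell alternative: write each capped value back to its
# (row, column) coordinates with .at
for col in HydroCap:
    for row in df.index:
        if df.at[row, col] >= HydroCap[col]:
            df.at[row, col] = HydroCap[col]

print(labels)
print(error_message)
print(df)
```

The cell-by-cell loop is only there to illustrate the second option; on real data the column-at-a-time approach will generally be much faster.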