Issue with TypeError when looping through columns in a list of data frames

I have a list of data frames (dataframes), a list of column names (keeplist), and a dict (HydroCap).

I am trying to loop through the columns of each data frame that are named in keeplist, using a where function inside the column loop to replace any value that exceeds the dictionary value for that column's key with the dictionary value. I run into a TypeError: '>=' not supported between instances of 'str' and 'int', and I am not sure how to solve it.

keeplist = ['BOUND','GCOUL','CHIEF','ROCKY','WANAP','PRIRA','LGRAN','LMONU','ICEHA','MCNAR','DALLE']

HydroCap = {'BOUND':55000,'GCOUL':280000,'CHIEF':219000,'ROCKY':220000,'WANAP':161000,'PRIRA':162000,'LGRAN':130000,'LMONU':130000,'ICEHA':106000,'MCNAR':232000,'DALLE':375000}

for i in dataframes:
  for c in i[keeplist]:
    c = np.where(c >= HydroCap[c], HydroCap[c], c)

Any push in the right direction would be greatly appreciated. I think the issue is that it expects an index value such as HydroCap[1] in place of HydroCap[c], but that is just a hunch.

First 7 columns of dataframes[0]:

      Week  Month  Day  Year         BOUND          GCOUL          CHIEF  \
0        1      8    5  1979  44999.896673  161241.036388  166497.578098   
1        2      8   12  1979  15309.259762   58219.122747   63413.204052   
2        3      8   19  1979  15316.965781   56072.024363   60606.956215   
3        4      8   26  1979  14371.269016   58574.003087   63311.569888

CodePudding user response：

import pandas as pd
import numpy as np

# Since I don't have all of the dataframes, I just use the sample you shared
df = pd.read_csv('dataframe.tsv', sep = "\t")

# Note, I've changed some values so you can see something actually happens
keeplist = ['BOUND','GCOUL','CHIEF']
HydroCap = {'BOUND':5500,'GCOUL':280000,'CHIEF':21900}

# The inside of the loop has been changed to accomplish the actual goal
# First, there are now two variables inside the loop: col, and c
# col is the column
# c represents a single element in that column at a time

# The code operates over a column at a time,
# using a list comprehension to cycle over each element
# and replace the full column with the new values at once
for col in df[keeplist]:
    df[col] = [np.where(c >= HydroCap[col], HydroCap[col], c) for c in df[col]]

Which produces:

df

	Week	Month	Day	Year	BOUND	GCOUL	CHIEF
0	1	8	5	1979	5500.0	161241.036388	21900.0
1	2	8	12	1979	5500.0	58219.122747	21900.0
2	3	8	19	1979	5500.0	56072.024363	21900.0
3	4	8	26	1979	5500.0	58574.003087	21900.0
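
As a side note on where the TypeError comes from: iterating over a DataFrame yields the column labels, not the columns or their values, so in the original loop c is a string such as 'BOUND'. Here is a minimal sketch of that behaviour. The two-row frame and its numbers are made up for illustration and are not the real data:

```python
import pandas as pd

# Hypothetical two-row frame standing in for one of the real data frames
df = pd.DataFrame({'BOUND': [44999.9, 60000.0], 'GCOUL': [161241.0, 300000.0]})
HydroCap = {'BOUND': 55000, 'GCOUL': 280000}

# Iterating over a DataFrame gives the column labels (strings)
labels = [c for c in df[['BOUND', 'GCOUL']]]
print(labels)

# Comparing a label (str) with a cap (int) is what raises the TypeError
try:
    'BOUND' >= HydroCap['BOUND']
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

So HydroCap[c] itself is fine, since c is a valid key. It is the comparison c >= HydroCap[c] that fails.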

To replace elements in a dataframe, you need to either assign a whole column at a time or reassign individual cells specified by row and column coordinates. In your original code, c was the column name (a string) rather than the cell values you had in mind, which is why comparing it with an int failed. Even if c had held the cell values, reassigning c inside the loop would not alter anything in the dataframe.
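
If it helps, the "whole column at a time" approach can also be written without a list comprehension. The sketch below uses Series.clip with an upper bound instead of np.where, which is a swapped-in alternative rather than the code above, and applies it to every frame in a list. The two small frames are hypothetical stand-ins for your dataframes list, and only two of the keys are used to keep it short:

```python
import pandas as pd

keeplist = ['BOUND', 'GCOUL']
HydroCap = {'BOUND': 55000, 'GCOUL': 280000}

# Hypothetical stand-ins for the real list of data frames
dataframes = [
    pd.DataFrame({'Week': [1, 2],
                  'BOUND': [44999.9, 60000.0],
                  'GCOUL': [161241.0, 300000.0]}),
    pd.DataFrame({'Week': [1, 2],
                  'BOUND': [70000.0, 100.0],
                  'GCOUL': [1.0, 280001.0]}),
]

for frame in dataframes:
    for col in keeplist:
        # Cap each value at the column's limit; smaller values are unchanged
        frame[col] = frame[col].clip(upper=HydroCap[col])

print(dataframes[0])
print(dataframes[1])
```

Assigning to frame[col] modifies the frame stored in the list, so no reassignment of the list itself is needed. Columns not in keeplist (Week here) are left untouched.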