How to set the last column of a pandas dataframe as the sum of certain columns?-CodePudding

For reasons I don't understand, pandas will not read the value of the formula in the last column of my Excel.

Column W in the Excel is just the sum of columns D to V for each row.

I need to save this Excel as a .csv to import into MySQL. So I use pandas.

print('Now converting to csv .... ')
newname = target.split('.')
savename = newname[0]   '.csv'
#read_file = pd.read_excel (pathTarget   target)
df = pd.read_excel (pathTarget   target, sheet_name=clas)
#read_file.to_csv (pathTarget   savename, index = None, header=True)
df.to_csv(pathTarget   savename, index=False)

But I don't get anything in the last column, so I thought, modify the dataframe then save it as csv.

I can easily get the sum of the columns I want:

new_val = df.index
for x in new_val:
    print(x)
    ys = df.iloc[x]['Week1':'Week19'].sum()
    print('your total score is', ys)

I can easily set the whole column 'your_score' to 1 value, like 606, as in the thread in the link above.

df['your_score'] = ys

But I can't run a loop and assign each row sum to the last column 'your_score'

df.iloc[0]['your_score'] = ys

I get the error:

Warning (from warnings module): File "<pyshell#26>", line 1 SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

I tried various things, but in the end I just used Pandas to write the Excel as csv, then used Python's csv module to do what I want.

How can I put each row sum

ys = df.iloc[x]['Week1':'Week19'].sum()

in the last column at that iloc location in Pandas??

CodePudding user response：

One of the advantages of pandas is that you can often use vectorized operations instead of loops. Thus in your case it's possible to sum over a 2-dimensional slice of the dataframe like this:

df['your_score'] = df.loc[:, 'Week1':'Week19'].sum(axis=1)

The loc operator allows indexing and slicing by labels. : selects all the rows, so we get a sub-dataframe. The df.sum() method is based on the NumPy function of the same name, so you can select a dimension to sum over with the axis argument.