Home > OS >  How to set the last column of a pandas dataframe as the sum of certain columns?
How to set the last column of a pandas dataframe as the sum of certain columns?

Time:10-17

I was reading this thread here

For reasons I don't understand, pandas will not read the value of the formula in the last column of my Excel.

Column W in the Excel is just the sum of columns D to V for each row.

I need to save this Excel as a .csv to import into MySQL. So I use pandas.

print('Now converting to csv .... ')
newname = target.split('.')
savename = newname[0]   '.csv'
#read_file = pd.read_excel (pathTarget   target)
df = pd.read_excel (pathTarget   target, sheet_name=clas)
#read_file.to_csv (pathTarget   savename, index = None, header=True)
df.to_csv(pathTarget   savename, index=False)

But I don't get anything in the last column, so I thought, modify the dataframe then save it as csv.

I can easily get the sum of the columns I want:

new_val = df.index
for x in new_val:
    print(x)
    ys = df.iloc[x]['Week1':'Week19'].sum()
    print('your total score is', ys)

I can easily set the whole column 'your_score' to 1 value, like 606, as in the thread in the link above.

df['your_score'] = ys

But I can't run a loop and assign each row sum to the last column 'your_score'

df.iloc[0]['your_score'] = ys

I get the error:

Warning (from warnings module): File "<pyshell#26>", line 1 SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

I tried various things, but in the end I just used Pandas to write the Excel as csv, then used Python's csv module to do what I want.

How can I put each row sum

ys = df.iloc[x]['Week1':'Week19'].sum()

in the last column at that iloc location in Pandas??

CodePudding user response:

One of the advantages of pandas is that you can often use vectorized operations instead of loops. Thus in your case it's possible to sum over a 2-dimensional slice of the dataframe like this:

df['your_score'] = df.loc[:, 'Week1':'Week19'].sum(axis=1)

The loc operator allows indexing and slicing by labels. : selects all the rows, so we get a sub-dataframe. The df.sum() method is based on the NumPy function of the same name, so you can select a dimension to sum over with the axis argument.

  • Related