I want to create a new column named total which adds all the year columns (everything in these columns are integers). I want to do it dynamically because as each year passes there will be a new column (for example 2024).
How can I do this in Python using method chaining and the assign operator?
id name 2018 2019 2020 2021 2022 type
1 John 0 1 0 0 2 A
2 Bill 1 5 4 0 0 B
3 Tom 0 0 2 0 5 B
4 Mary 0 1 1 0 0 A
Expected Output:
id name 2018 2019 2020 2021 2022 type total
1 John 0 1 0 0 2 A 3
2 Bill 1 5 4 0 0 B 10
3 Tom 0 0 2 0 5 B 7
4 Mary 0 1 1 0 0 A 2
I have this solution but I don't like it, is there a more eloquent way of writing this code?
Temporary Solution:
(
df
.assign(Total = lambda x: x['2018'] x['2019'] x['2020'] x['2021'] x['2022'])
)
CodePudding user response:
Assuming that the only columns with int values will be the year columns, you can just do df['Total'] = df.sum(axis = 1)
:
columns = ['name', '2018', '2019', '2020', '2021', '2022', 'type']
values = [['John',0,1,0,0,2,'A'], ['Bill',1,5,4,0,0,'B'], ['Tom',0,0,2,0,5,'B'], ['Mary',0,1,1,0,0,'A']]
df = pd.DataFrame(values, columns = columns)
df['Total'] = df.sum(axis = 1)
df
Output:
name 2018 2019 2020 2021 2022 type Total
0 John 0 1 0 0 2 A 3
1 Bill 1 5 4 0 0 B 10
2 Tom 0 0 2 0 5 B 7
3 Mary 0 1 1 0 0 A 2
CodePudding user response:
Use DataFrame.loc
for select columns between 2018
and 2022
:
df = df.assign(Total = df.loc[:, '2018':'2022'].sum(axis=1))
print (df)
id name 2018 2019 2020 2021 2022 type Total
0 1 John 0 1 0 0 2 A 3
1 2 Bill 1 5 4 0 0 B 10
2 3 Tom 0 0 2 0 5 B 7
3 4 Mary 0 1 1 0 0 A 2
Another idea is filter columns names by mask - here convert to numeric and test if column names are larger like 2000
:
df = df.assign(Total=df.loc[:, pd.to_numeric(df.columns,errors='coerce')>2000].sum(axis=1))
print (df)
id name 2018 2019 2020 2021 2022 type Total
0 1 John 0 1 0 0 2 A 3
1 2 Bill 1 5 4 0 0 B 10
2 3 Tom 0 0 2 0 5 B 7
3 4 Mary 0 1 1 0 0 A 2
CodePudding user response:
You could use assign
with filter
with a generic regex for 4-digits year:
out = df.assign(Total=lambda d: d.filter(regex='^\d{4}$').sum(axis=1))
Or with drop
and the columns to ignore:
out = df.assign(Total=lambda d: d.drop(columns=['id', 'name', 'type', 'Total'],
errors='ignore').sum(axis=1))
Output:
id name 2018 2019 2020 2021 2022 type Total
0 1 John 0 1 0 0 2 A 3
1 2 Bill 1 5 4 0 0 B 10
2 3 Tom 0 0 2 0 5 B 7
3 4 Mary 0 1 1 0 0 A 2
Sorting columns and changing type of all integer columns:
out = (df.assign(Total=lambda d: d.filter(regex='^\d{4}$').sum(axis=1)
.sort_index(key=lambda x: x.astype(str).str.fullmatch('\d{4}'), axis=1)
.pipe(lambda d: d.astype({c: float for c in out.select_dtypes('number')}))
)
Output:
id name type total 2018 2019 2020 2021 2022
0 1.0 John A 3.0 0.0 1.0 0.0 0.0 2.0
1 2.0 Bill B 10.0 1.0 5.0 4.0 0.0 0.0
2 3.0 Tom B 7.0 0.0 0.0 2.0 0.0 5.0
3 4.0 Mary A 2.0 0.0 1.0 1.0 0.0 0.0