I need to add a column to calculate the total amount of immigrants in Canada by AreaName
in df_canada
:
df_canada = pd.read_csv('https://raw.githubusercontent.com/iikotelnikov/datasets/main/canada_immigration.csv', sep=';')
df_canada
Firsty, I added an additional row to calculate the total amount of immigrants in Canada by year.
# Here we add cell for sum of immigrants in Canada by year
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as ani
import datetime as dt
%matplotlib inline
df_canada.loc[197] = {'Type': 'Sum of immigrants in Canada by year'}
df_canada.loc[197, 10:] = df_canada[df_canada['Type'] != 'Sum of immigrants in Canada by year'].iloc[:, 10:].sum()
df_canada
Secondly, I need to calculate the total number of immigrants in Canada by AreaName
.
# Here we add cell for sum of immigrants in Canada by Area
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as ani
import datetime as dt
%matplotlib inline
df_canada.loc[198] = {'Type': 'Sum of immigrants in Canada by Area'}
df_canada.loc[198, 10:] = df_canada[df_canada['Type'] != 'Sum of immigrants in Canada by year'].iloc[:, 10:].sum()
But It is not suitable for me.
I have no idea what my next step is.
Could you advise me on how to calculate the total number of immigrants in Canada by area and create the column with this amount?
CodePudding user response:
You need to do this !
- Import the data in the dataframe
- As some of the values are rows rather in single column, you need to use melt to convert rows into column
- Apply groupby with area name and year and sum for each row
- Load the output to file.
Code begins here ->
df_canada = pd.read_csv('https://raw.githubusercontent.com/iikotelnikov/datasets/main/canada_immigration.csv', sep=';')
agg_df_candaa = df_canada.melt(id_vars=['Type', 'Coverage', 'OdName', 'AREA', 'AreaName', 'REG', 'RegName',
'DEV', 'DevName'], var_name='year', value_name="value")
result_df = agg_df_candaa.groupby(['AreaName','year'])['value'].sum()
result_df.to_csv('Your_location/result_df.csv')