Create panel data in Python using loop

Time:04-14

I am trying to create a panel data frame in Python, e.g. for 5 countries (A, B, C, D, E) each with 3 years of data (2000, 2001, 2002).

import numpy as np
import pandas as pd

df = {'id': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
      'country': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
      'year': [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002]
        }
df = pd.DataFrame(df)
df

To extend this to bigger datasets, I am trying to build the same result with the loop below, but it does not give me the desired data frame.

n_country = 5 # number of countries
n_year = 3 # number of years of data for each country
columns = ("id", "country", "year")
n_rows = n_country*n_year
data = pd.DataFrame(np.empty(shape = (n_rows, 3)), columns = columns)
# set country numbers which will identify each country, create country id ranging from 1 to 5
country_id = range(1, 1 + n_country)
list(country_id)
# create year from 2000 to 2002
year = range(2000, 2000 + n_year)
list(year)
# create dictionary that maps from country id to country name
country_name = dict(zip(country_id, ['A', 'B', 'C', 'D', 'E']))
country_name
# loop starts here
i = 0
for id in country_id:
    for country in ["A", "B", "C", "D", "E"]:
        for year in [2000, 2001, 2002]:
            data.loc[i, "id"] = id
            data.loc[i, "year"] = year
            data.loc[i, "country"] = country_name[id]
            i =+ 1

The resulting data frame is not what is intended.

I would very much appreciate it if any user could point out the mistake in the loop above.

Thank you!

CodePudding user response:

I would use itertools.product on the years and countries, then use cat.codes to assign a numeric id to each country.

from itertools import product
import pandas as pd

start_year = 2000
end_year = 2003

countries = ['A','B','C','D','E']


df = pd.DataFrame(list(product(range(start_year, end_year + 1), countries)), columns=['year', 'country'])
df['id'] = df.country.astype('category').cat.codes + 1
print(df)

Output

    year country  id
0   2000       A   1
1   2000       B   2
2   2000       C   3
3   2000       D   4
4   2000       E   5
5   2001       A   1
6   2001       B   2
7   2001       C   3
8   2001       D   4
9   2001       E   5
10  2002       A   1
11  2002       B   2
12  2002       C   3
13  2002       D   4
14  2002       E   5
15  2003       A   1
16  2003       B   2
17  2003       C   3
18  2003       D   4
19  2003       E   5
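
If you would rather have the rows grouped by country (as your loop tries to do) and only the three years from the question, one possible variant is pd.MultiIndex.from_product. This is only a sketch: the names panel and index are my own, and pd.factorize is swapped in for cat.codes so that ids follow the order in which countries appear.

```python
import pandas as pd

countries = ["A", "B", "C", "D", "E"]
years = range(2000, 2003)  # 2000, 2001, 2002

# Cartesian product of country x year; the first level (country) varies slowest
index = pd.MultiIndex.from_product([countries, years], names=["country", "year"])
panel = index.to_frame(index=False)

# Number the countries 1..5 in order of first appearance
panel.insert(0, "id", pd.factorize(panel["country"])[0] + 1)
print(panel)
```

Swapping the order of the two lists passed to from_product gives the year-major ordering shown in the output above instead.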

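As for your current loop, zip the ids and countries together so that each (id, country) pair is reused across the inner year loop, instead of nesting a separate loop over countries. The counter also needs to be incremented with i += 1, not i =+ 1, which just assigns +1 to i on every iteration.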

import numpy as np
import pandas as pd

n_country = 5 # number of countries
n_year = 3 # number of years of data for each country
columns = ("id", "country", "year")
n_rows = n_country*n_year
data = pd.DataFrame(np.empty(shape = (n_rows, 3)), columns = columns)
# set country numbers which will identify each country, create country id ranging from 1 to 5
country_id = range(1, 1 + n_country)
list(country_id)
# create year from 2000 to 2002
year = range(2000, 2000 + n_year)
list(year)
# create dictionary that maps from country id to country name
country_name = dict(zip(country_id, ['A', 'B', 'C', 'D', 'E']))
country_name
# loop starts here
i = 0
for c_id,country in zip(country_id,["A", "B", "C", "D", "E"]):
    print(c_id, country)
    for year in [2000, 2001, 2002]:
        data.loc[i, "id"] = c_id
        data.loc[i, "year"] = year
        data.loc[i, "country"] = country
        i += 1
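
To see the counter problem in isolation, and as a possible alternative to writing into a pre-allocated frame, here is a minimal sketch. The names stuck, counted and rows are hypothetical. Collecting plain dicts and building the frame once also avoids putting strings into the float columns created by np.empty, which I believe can trigger dtype warnings in recent pandas versions.

```python
import pandas as pd

countries = ["A", "B", "C", "D", "E"]
years = [2000, 2001, 2002]

# `i =+ 1` parses as `i = (+1)`: it resets i to 1 instead of incrementing it
i = 0
for _ in range(3):
    i =+ 1
stuck = i

# `i += 1` is the augmented assignment that actually counts up
i = 0
for _ in range(3):
    i += 1
counted = i

print(stuck, counted)

# Build the rows in a list, then construct the data frame once at the end
rows = []
for c_id, country in enumerate(countries, start=1):
    for year in years:
        rows.append({"id": c_id, "country": country, "year": year})

data = pd.DataFrame(rows, columns=["id", "country", "year"])
print(data)
```

With this approach no row counter is needed at all, so the =+ mistake cannot occur.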