Creating several DataFrames from one, each with an added column value

Time:10-31

I have a dataset with statistics by region, and I would like to build several city datasets from it. When creating each of these sets, I would like to add a column with the city name.

That is, from one dataset I would like to get three.

Here is an example. Initial dataset:

df
date         name_region    
2022-01-01   California
2022-01-02   California
2022-01-03   California

I also have a list of cities: city_list = ['Los Angeles', 'San Diego', 'San Jose']

As output, I want 3 datasets (or more, depending on the number of items in the list):

df_city_1
date         name_region    city    
2022-01-01   California     Los Angeles
2022-01-02   California     Los Angeles
2022-01-03   California     Los Angeles

df_city_2
date         name_region    city    
2022-01-01   California     San Diego
2022-01-02   California     San Diego
2022-01-03   California     San Diego

df_city_3
date         name_region    city    
2022-01-01   California     San Jose
2022-01-02   California     San Jose
2022-01-03   California     San Jose

Ideally, each dataset could also be accessed by a key taken from the list:

df_city['Los Angeles']
date         name_region    city    
2022-01-01   California     Los Angeles
2022-01-02   California     Los Angeles
2022-01-03   California     Los Angeles

How can I do that? The only approach I found splits a dataset into several when the original already contains a column with the unique values to split on (in this case, the cities), but that does not suit my case.

Answer:

You can use a dictionary comprehension and add the city column each time with df.assign, which returns a new DataFrame.

import pandas as pd

data = {'date': {0: '2022-01-01', 1: '2022-01-02', 2: '2022-01-03'},
        'name_region': {0: 'California', 1: 'California', 2: 'California'}}
df = pd.DataFrame(data)

city_list = ['Los Angeles', 'San Diego', 'San Jose']

# build df_city as a dict: one DataFrame per city, keyed by the city name
df_city = {city: df.assign(city=city) for city in city_list}

# access each DataFrame by its key (an element of city_list)
print(df_city['Los Angeles'])

Output:

         date name_region         city
0  2022-01-01  California  Los Angeles
1  2022-01-02  California  Los Angeles
2  2022-01-03  California  Los Angeles
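Because df.assign returns a new DataFrame instead of modifying df in place, each value in the dictionary is an independent frame and the original df keeps only its two columns. A minimal check of that behaviour, assuming the same small example data as the question:

```python
import pandas as pd

# Same shape of data as the question: one region, three dates
df = pd.DataFrame({
    'date': ['2022-01-01', '2022-01-02', '2022-01-03'],
    'name_region': ['California'] * 3,
})
city_list = ['Los Angeles', 'San Diego', 'San Jose']

df_city = {city: df.assign(city=city) for city in city_list}

# The original frame is untouched: assign() built new frames
print(list(df.columns))

# Each frame carries only its own city name
for city, frame in df_city.items():
    print(city, frame['city'].unique().tolist())

# Editing one dict entry does not leak into the others or into df
df_city['San Diego'].loc[0, 'name_region'] = 'CA'
print(df.loc[0, 'name_region'])
```

This means the per-city frames can be modified or saved separately without side effects on each other.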

Answer:

Another possible solution:

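dfs = []
for city in city_list:
    # assign() returns a new DataFrame with the extra column
    dfs.append(df.assign(city=city))

# pair each city name with its DataFrame
cities = dict(zip(city_list, dfs))
print(cities['Los Angeles'])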

Output:

         date name_region         city
0  2022-01-01  California  Los Angeles
1  2022-01-02  California  Los Angeles
2  2022-01-03  California  Los Angeles
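If you prefer the split-by-column approach mentioned in the question, you can create the city column first and then split on it. The sketch below uses a different technique from the two answers: a cross join followed by groupby. It assumes pandas 1.2 or later, where merge supports how='cross':

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2022-01-01', '2022-01-02', '2022-01-03'],
    'name_region': ['California'] * 3,
})
city_list = ['Los Angeles', 'San Diego', 'San Jose']

# Cross join: every row of df is paired with every city
cities = pd.DataFrame({'city': city_list})
combined = df.merge(cities, how='cross')

# Split on the new column; sort=False keeps groups in first-appearance order
df_city = {
    city: group.reset_index(drop=True)
    for city, group in combined.groupby('city', sort=False)
}

print(df_city['Los Angeles'])
```

This route first builds one combined frame of len(df) × len(city_list) rows, so the dictionary comprehension with assign is simpler when only the per-city frames are needed.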