I have a dataset with statistics by region. I would like to build several city-level datasets from it, adding a column with the name of the city to each one as it is created.
That is, from one dataset I would like to get three.
Here is an example. Initial dataset:
df
date name_region
2022-01-01 California
2022-01-02 California
2022-01-03 California
Next, I have a list with cities: city_list = ['Los Angeles', 'San Diego', 'San Jose']
As an output, I want to have 3 datasets (or more, depending on the number of items in the list):
df_city_1
date name_region city
2022-01-01 California Los Angeles
2022-01-02 California Los Angeles
2022-01-03 California Los Angeles
df_city_2
date name_region city
2022-01-01 California San Diego
2022-01-02 California San Diego
2022-01-03 California San Diego
df_city_3
date name_region city
2022-01-01 California San Jose
2022-01-02 California San Jose
2022-01-03 California San Jose
It would be ideal if, at the same time, the data set could be accessed by a key determined by an element in the list:
df_city['Los Angeles']
date name_region city
2022-01-01 California Los Angeles
2022-01-02 California Los Angeles
2022-01-03 California Los Angeles
How can I do that? I have only found ways to split a dataset into several when the original already contains a column with the unique values (in this case, the cities), but that does not fit my case.
CodePudding user response:
You can use a dictionary comprehension and add the column city each time using df.assign.
import pandas as pd
data = {'date': {0: '2022-01-01', 1: '2022-01-02', 2: '2022-01-02'},
'name_region': {0: 'California', 1: 'California', 2: 'California'}}
df = pd.DataFrame(data)
city_list = ['Los Angeles', 'San Diego', 'San Jose']
# "df_city" as a `dict`
df_city = {city: df.assign(city=city) for city in city_list}
# accessing each `df` by key (i.e. a `list` element)
print(df_city['Los Angeles'])
date name_region city
0 2022-01-01 California Los Angeles
1 2022-01-02 California Los Angeles
2 2022-01-02 California Los Angeles
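As a quick check of why this works, here is a minimal sketch, assuming the three dates from the question's df. df.assign returns a new DataFrame rather than modifying df in place, so the original stays without a city column and each dictionary entry can be edited independently. The value "CA" is only an illustrative change.

```python
import pandas as pd

# Sample data, assumed to match the question's initial dataset
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-02", "2022-01-03"],
    "name_region": ["California"] * 3,
})
city_list = ["Los Angeles", "San Diego", "San Jose"]

# One new DataFrame per city, keyed by the city name
df_city = {city: df.assign(city=city) for city in city_list}

# Change a value in one entry only (illustrative edit)
df_city["San Diego"].loc[0, "name_region"] = "CA"

# The original and the other entries keep their values
print(list(df.columns))
print(df_city["Los Angeles"].loc[0, "name_region"])
```

Because every entry is its own object, this scales to any number of items in city_list without extra bookkeeping.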
CodePudding user response:
Another possible solution:
dfs = []
for city in city_list:
    # df.assign returns a new DataFrame with the extra column
    dfs.append(df.assign(city=city))
# pair each city name with its DataFrame
cities = dict(zip(city_list, dfs))
cities['Los Angeles']
Output:
date name_region city
0 2022-01-01 California Los Angeles
1 2022-01-02 California Los Angeles
2 2022-01-02 California Los Angeles
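If the per-city frames ever need to be stacked back into one table, one possible follow-up is pd.concat. This is only a sketch and not part of the answers above. It assumes the question's three dates and rebuilds the cities dictionary so the snippet runs on its own.

```python
import pandas as pd

# Sample data, assumed to match the question's initial dataset
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-02", "2022-01-03"],
    "name_region": ["California"] * 3,
})
city_list = ["Los Angeles", "San Diego", "San Jose"]

# Same structure as in the answer: city name -> DataFrame
cities = {city: df.assign(city=city) for city in city_list}

# Stack all city frames into one table with a fresh 0..n-1 index
combined = pd.concat(list(cities.values()), ignore_index=True)
print(combined.shape)
```

The city column added by df.assign is what keeps the rows distinguishable after stacking.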