Create pandas DataFrame with iterative values-CodePudding

I want to create a pandas DataFrame with five columns: date, house type, neighborhood, realtor and zip code.

A date range of August 1, 2023 to September 1, 2023
There are 4 house types
6 realtors
Only one neighborhood
Only one zipcode So each the date needs to be repeated 16 times for each house type, in this only neighborhood, for each realtor and for one unique zip code. So there would be 16 rows per day.

I can add the consecutive dates to the date column, however, I have trouble in adding the rest of the data, which I think should be nested in for loops. So far, I have these lines:

import pandas as pd
from datetime import date,timedelta
delta = timedelta(days=1)
start = date(2023,8,1)
end=date(2023,9,1)
loop_date = start

df = pd.DataFrame(columns={"Date", "House type", "Neighborhood", "Realtor", "Zipcode"})
while loop_date<=end:
    df = df.append({"Date":loop_date},ignore_index=True)
    loop_date =delta

The intended output for each day would be the following:

Date	House type	Neighborhood	Realtor	Zipcode
8/1/23	1	a	Alex	10018
8/1/23	2	a	Alex	10018
8/1/23	3	a	Alex	10018
8/1/23	4	a	Alex	10018
8/1/23	1	a	Maggie	10018
8/1/23	2	a	Maggie	10018
8/1/23	3	a	Maggie	10018
8/1/23	4	a	Maggie	10018
8/1/23	1	a	Ginger	10018
8/1/23	2	a	Ginger	10018
8/1/23	3	a	Ginger	10018
8/1/23	4	a	Ginger	10018
8/1/23	1	a	Pat	10018
8/1/23	2	a	Pat	10018
8/1/23	3	a	Pat	10018
8/1/23	4	a	Pat	10018

Any help would be much appreciated, thank you!

CodePudding user response：

It sounds like a Cartesian product. I think product in itertools can help:

from itertools import product
list(product(['a', 'b', 'c'], ['Alex', 'Maggie', 'Ginger', 'Pat'], range(3)))

[('a', 'Alex', 0),
 ('a', 'Alex', 1),
 ('a', 'Alex', 2),
 ('a', 'Maggie', 0),
 ('a', 'Maggie', 1),
 ('a', 'Maggie', 2),
 ('a', 'Ginger', 0),
 ('a', 'Ginger', 1),
 ('a', 'Ginger', 2),
 ('a', 'Pat', 0),
 ('a', 'Pat', 1),
 ('a', 'Pat', 2),
 ('b', 'Alex', 0),
 ('b', 'Alex', 1),
 ('b', 'Alex', 2),
 ('b', 'Maggie', 0),
...
]

CodePudding user response：

You may create your dataframe using two nested loops:

First while loop iterates over the dates (you already got that one covered)
Second for loop iterates over a list of realtors

You can simply create temporary dataframes df_temp containing the date, realtor and house types (read from a numpy array), append them to a list and finally concatenate this list of dataframes into your output dataframe df:

import numpy as np
import pandas as pd
from datetime import date,timedelta
delta = timedelta(days=1)
start = date(2023,8,1)
end=date(2023,9,1)
loop_date = start
realtors = ['Alex', 'Maggie', 'Ginger', 'Pat']

list_of_dataframes = []

while loop_date<=end:
    for realtor in realtors:
        df_temp = pd.DataFrame(columns={"Date", "House type", "Realtor"})
        df_temp['House type'] = pd.Series(np.arange(1,5))
        df_temp['Date'] = loop_date
        df_temp['Realtor'] = realtor
        list_of_dataframes.append(df_temp)
        loop_date =delta
df = pd.concat(list_of_dataframes)

Once this is done, the only thing left to do is to populate columns Neighborhood and Zipcode!

CodePudding user response：

try this:

import pandas as pd
from itertools import product


house_types = [1,2,3,4]
realtors = [*'ABCDEF']
neighborhood = ['a']
zip_code = ['10018']
daily_data = product(house_types, realtors, neighborhood, zip_code)
dates = pd.date_range('2023-8-1', '2023-9-1')
data = product(dates, daily_data)
idx, data = zip(*data)
cols = ["House_type", "Realtor", "Neighborhood", "Zipcode"]
df = pd.DataFrame(data, index=idx, columns=cols).rename_axis('Date')
print(df)
>>>
           House_type   Realtor    Neighborhood  Zipcode
Date                
2023-08-01  1           A               a        10018
2023-08-01  1           B               a        10018
2023-08-01  1           C               a        10018
2023-08-01  1           D               a        10018
2023-08-01  1           E               a        10018
   ...     ...         ...             ...        ...
2023-09-01  4           B               a        10018
2023-09-01  4           C               a        10018
2023-09-01  4           D               a        10018
2023-09-01  4           E               a        10018
2023-09-01  4           F               a        10018