Creating a DataFrame with same value in col A and different value in col B

Time:08-03

I'm pretty new to Python and I would like to create a DataFrame from scratch that will look basically like this:

| date          | time_gap       |
| ------------- | -------------- |
| 2018/12/05    | 0              |
| 2018/12/05    | 1              |
| 2018/12/05    | ...            |
| 2018/12/05    | 97             |
| 2018/12/06    | 0              |
| 2018/12/06    | ...            |
| 2018/12/06    | 97             |
| 2018/12/07    | 0              |
| 2018/12/07    | ...            |
| 2018/12/07    | 97             |
...

So the first column would repeat each date 98 times, while the second column runs from 0 to 97 for each date. I've tried this:

import pandas as pd
import datetime as dt

start_date = dt.datetime.strptime("2018/12/05", "%Y/%m/%d")
end_date = dt.datetime.strptime("2018/12/08", "%Y/%m/%d")
df_dates = pd.date_range(start_date, end_date).to_list()

dict_generated = {}
for date in df_dates:
    for n in range(0, 98):
        dict_generated["date"] = date
        dict_generated["time_gap"] = [x for x in range(0,98)]

df_from_dict = pd.DataFrame.from_dict(dict_generated)

But it only generated rows for the last date in my list of dates. Besides, I've been told that a nested for loop could be pretty slow (should the list of dates be quite long). Is there a quicker and more Pythonic way to do this?

Thank you for your help!

CodePudding user response:

Try using pd.date_range to create the dates and np.arange to create the values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.date_range("2018/12/05", "2018/12/08").repeat(98),
                   'time_gap': np.resize(np.arange(0, 98), 98*4)})
# the date range extends four days so we do 98*4 in the resize

          date  time_gap
0   2018-12-05         0
1   2018-12-05         1
2   2018-12-05         2
3   2018-12-05         3
4   2018-12-05         4
..         ...       ...
387 2018-12-08        93
388 2018-12-08        94
389 2018-12-08        95
390 2018-12-08        96
391 2018-12-08        97

Here is an updated solution that wraps the same idea in a function:

def create_df(start: str, end: str, repeat: int) -> pd.DataFrame:
    dates = pd.date_range(start, end)  # create the date range
    df = pd.DataFrame({
        'date': dates.repeat(repeat),  # repeat each date `repeat` times
        'time_gap': np.resize(np.arange(0, repeat), repeat * len(dates)),  # cycle 0..repeat-1 once per date
    })
    return df

create_df('2022-01-01', '2022-01-06', 123)

          date  time_gap
0   2022-01-01         0
1   2022-01-01         1
2   2022-01-01         2
3   2022-01-01         3
4   2022-01-01         4
..         ...       ...
733 2022-01-06       118
734 2022-01-06       119
735 2022-01-06       120
736 2022-01-06       121
737 2022-01-06       122

We use resize because np.arange(0, n) only creates a single array from 0 to n-1. When np.resize is asked for a larger size, it fills the new array with repeated copies of the original, so we get 0 to n-1 repeated once per date.
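
To make the difference between repeat and resize concrete, here is a minimal sketch with toy sizes (2 dates and 3 gaps, chosen only for illustration and not taken from the question). np.tile is included as an alternative that should give the same cycling; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Toy sizes for illustration only: 2 dates, 3 gaps per date.
dates = pd.date_range("2018-12-05", periods=2)
gaps = np.arange(0, 3)  # array([0, 1, 2])

# Index.repeat repeats each element in place: d1, d1, d1, d2, d2, d2
repeated_dates = dates.repeat(len(gaps))

# np.resize to a larger size fills with repeated copies: 0, 1, 2, 0, 1, 2
cycled_gaps = np.resize(gaps, len(gaps) * len(dates))

# np.tile states the same intent more directly (alternative, not from the answer)
tiled_gaps = np.tile(gaps, len(dates))

small = pd.DataFrame({"date": repeated_dates, "time_gap": cycled_gaps})
print(small)
```

The two columns line up because the dates are repeated element by element while the gaps are repeated as a whole block.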

CodePudding user response:

Try this:

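import datetime as dt
import pandas as pd

start_date = dt.datetime.strptime("2018/12/05", "%Y/%m/%d")
end_date = dt.datetime.strptime("2018/12/08", "%Y/%m/%d")
# name the columns so the merged result has 'date' and 'time_gap'
df_dates = pd.DataFrame({'date': pd.date_range(start_date, end_date)})
time_gaps = pd.DataFrame({'time_gap': range(0, 98)})
# cross join: pairs every date with every time_gap (needs pandas 1.2 or later)
df = pd.merge(df_dates, time_gaps, how='cross')

print(df)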
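
For context, how='cross' builds the Cartesian product of the two frames, so every row on the left is paired with every row on the right. The sketch below uses toy sizes (2 dates, 3 gaps) that are assumptions for illustration; pd.MultiIndex.from_product is shown as an alternative way to get the same pairs and is not part of the answer above.

```python
import pandas as pd

# Toy frames for illustration only: 2 dates and 3 gaps.
dates = pd.DataFrame({"date": pd.date_range("2018-12-05", periods=2)})
gaps = pd.DataFrame({"time_gap": range(3)})

# Cross join: 2 * 3 = 6 rows, the gaps cycle within each date.
crossed = pd.merge(dates, gaps, how="cross")

# Alternative (swapped in here): build the same pairs as a MultiIndex,
# then turn the index levels into ordinary columns.
product = pd.MultiIndex.from_product(
    [dates["date"], gaps["time_gap"]], names=["date", "time_gap"]
).to_frame(index=False)

print(crossed)
print(product)
```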

CodePudding user response:

This solution is like It_is_Chris's, but without using NumPy:

import datetime as dt
import pandas as pd

start_date = dt.datetime.strptime("2018/12/05", "%Y/%m/%d")
dates = pd.DataFrame({'date': pd.date_range(start_date, periods=4).strftime("%Y/%m/%d").repeat(98),
                      'time_gap': list(range(0, 98)) * 4})