Split dataframe by date column

Time:09-18

I have a dataframe that I would like to split into multiple dataframes based on the value in my Date column. Ideally, I would like to split it by decade. Do I need to use the np.array_split method, or is there a method that does not require numpy?

My Dataframe looks like a larger version of this:

    Date    Name
0   1746-06-02  Borcke (#p1)
1   1746-09-02  Jordan (#p31)
2   1747-06-02  Sa Majesté (#p32)
3   1752-01-26  Maupertuis (#p4)
4   1755-06-02  Jordan (#p31)

And so I would ideally want in this scenario two data frames like these:

    Date    Name
0   1746-06-02  Borcke (#p1)
1   1746-09-02  Jordan (#p31)
2   1747-06-02  Sa Majesté (#p32)

    Date        Name
0   1752-01-26  Maupertuis (#p4)
1   1755-06-02  Jordan (#p31)

CodePudding user response:

Building on mozway's answer for computing the decades:

d = {
    "Date": [
        "1746-06-02",
        "1746-09-02",
        "1747-06-02",
        "1752-01-26",
        "1755-06-02",
    ],
    "Name": [
        "Borcke (#p1)",
        "Jordan (#p31)",
        "Sa Majesté (#p32)",
        "Maupertuis (#p4)",
        "Jordan (#p31)",
    ],
}
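import pandas as pd

df = pd.DataFrame(d)

# Extract the four-digit year from the start of each date string.
df["years"] = df["Date"].str.extract(r"(^\d{4})", expand=False).astype(int)
# Floor each year to the start of its decade, e.g. 1746 -> 1740.
df["decades"] = df["years"] // 10 * 10
# Build one DataFrame per decade, in ascending decade order.
dfs = [g for _, g in df.groupby("decades")]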
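
Note that each group keeps its original row labels and any helper columns, so the second frame would start at index 3 rather than 0 as in the desired output. A minimal sketch, assuming the sample data from the question, that groups by an external Series (the variable name `decade` is only illustrative) and renumbers each piece:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["1746-06-02", "1746-09-02", "1747-06-02", "1752-01-26", "1755-06-02"],
        "Name": [
            "Borcke (#p1)",
            "Jordan (#p31)",
            "Sa Majesté (#p32)",
            "Maupertuis (#p4)",
            "Jordan (#p31)",
        ],
    }
)

# Decade of each row: the first four characters as the year,
# floored to a multiple of 10 (assumes ISO-style date strings).
decade = df["Date"].str[:4].astype(int) // 10 * 10

# Grouping by a separate Series adds no helper column to df;
# reset_index(drop=True) renumbers each piece from 0.
dfs = [g.reset_index(drop=True) for _, g in df.groupby(decade)]

for part in dfs:
    print(part, end="\n\n")
```

Whether to reset the index depends on whether the original row positions still matter downstream.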

CodePudding user response:

Use groupby to generate a list of DataFrames:

dfs = [g for _, g in df.groupby(df['Date'].str.extract(r'(^\d{3})', expand=False))]
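
The three-digit prefix works as a grouping key because the first three digits of a four-digit year identify the decade. A quick check on the sample dates, assuming every value is a string that starts with a four-digit year:

```python
import pandas as pd

dates = pd.Series(
    ["1746-06-02", "1746-09-02", "1747-06-02", "1752-01-26", "1755-06-02"]
)

# The leading three digits of the year serve as the decade key.
keys = dates.str.extract(r"(^\d{3})", expand=False)
print(keys.tolist())  # ['174', '174', '174', '175', '175']
```

The keys here are strings, so this variant does no date validation.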

Or, validating the dates:

dfs = [g for _,g in df.groupby(pd.to_datetime(df['Date']).dt.year//10)] 

If you prefer a dictionary for indexing by decade:

dfs = dict(list(df.groupby(pd.to_datetime(df['Date']).dt.year//10*10)))
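
With the dictionary form, each decade's frame can be looked up by the decade's starting year. A small sketch on the question's sample data (the variable name `decade` is illustrative, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["1746-06-02", "1746-09-02", "1747-06-02", "1752-01-26", "1755-06-02"],
        "Name": [
            "Borcke (#p1)",
            "Jordan (#p31)",
            "Sa Majesté (#p32)",
            "Maupertuis (#p4)",
            "Jordan (#p31)",
        ],
    }
)

# Parse the dates (raises on malformed values), then floor the year to its decade.
decade = pd.to_datetime(df["Date"]).dt.year // 10 * 10

# Map each decade's starting year to the rows that fall in it.
dfs = dict(list(df.groupby(decade)))

print([int(k) for k in dfs])  # decades present in the data
print(dfs[1750])              # rows dated 1750-1759
```

Looking up a decade that has no rows raises a KeyError, so `dfs.get(1760)` may be the safer access pattern when a decade might be missing.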

NB. I initially missed that you wanted decades, not years. I updated the answer. The logic remains unchanged.
