I have a dataframe that I would like to split into multiple dataframes based on the values in my Date column. Ideally, I would like to split it by decade. Do I need to use the np.array_split method, or is there a method that does not require numpy?
My dataframe looks like a larger version of this:
         Date               Name
0  1746-06-02       Borcke (#p1)
1  1746-09-02      Jordan (#p31)
2  1747-06-02  Sa Majesté (#p32)
3  1752-01-26   Maupertuis (#p4)
4  1755-06-02      Jordan (#p31)
So in this scenario I would ideally want two dataframes like these:
         Date               Name
0  1746-06-02       Borcke (#p1)
1  1746-09-02      Jordan (#p31)
2  1747-06-02  Sa Majesté (#p32)

         Date              Name
0  1752-01-26  Maupertuis (#p4)
1  1755-06-02     Jordan (#p31)
CodePudding user response:
Building on mozway's answer to get the decades:
d = {
"Date": [
"1746-06-02",
"1746-09-02",
"1747-06-02",
"1752-01-26",
"1755-06-02",
],
"Name": [
"Borcke (#p1)",
"Jordan (#p31)",
"Sa Majesté (#p32)",
"Maupertuis (#p4)",
"Jordan (#p31)",
],
}
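import math
import pandas as pd

df = pd.DataFrame(d)
# Extract the four-digit year from the start of each date string.
df["years"] = df['Date'].str.extract(r'(^\d{4})', expand=False).astype(int)
# Round each year down to the start of its decade, e.g. 1746 -> 1740.
df["decades"] = (df["years"] / 10).apply(math.floor) * 10
# Build one DataFrame per decade, in ascending decade order.
dfs = [g for _, g in df.groupby(df['decades'])]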
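As a rough sketch of a variant, the decade can also be computed with integer floor division instead of math.floor, and each piece can be re-indexed from 0 to match the output shown in the question. The variable names `decade` and `part` are only illustrative, and the sample data is copied from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1746-06-02", "1746-09-02", "1747-06-02", "1752-01-26", "1755-06-02"],
    "Name": ["Borcke (#p1)", "Jordan (#p31)", "Sa Majesté (#p32)",
             "Maupertuis (#p4)", "Jordan (#p31)"],
})

# Take the first four characters as the year, then floor to the decade (1746 -> 1740).
decade = df["Date"].str[:4].astype(int) // 10 * 10

# One DataFrame per decade, each re-indexed from 0.
dfs = [g.reset_index(drop=True) for _, g in df.groupby(decade)]

for part in dfs:
    print(part, end="\n\n")
```

Grouping by a separate Series this way leaves the original columns untouched, so no helper columns such as "years" or "decades" end up in the resulting frames.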
CodePudding user response:
Use groupby to generate a list of DataFrames, grouping on the first three digits of the year (which identify the decade):
dfs = [g for _,g in df.groupby(df['Date'].str.extract(r'(^\d{3})', expand=False))]
Or, to validate the dates by parsing them first:
dfs = [g for _,g in df.groupby(pd.to_datetime(df['Date']).dt.year//10)]
If you prefer a dictionary for indexing by decade:
dfs = dict(list(df.groupby(pd.to_datetime(df['Date']).dt.year//10*10)))
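A minimal sketch of how the dictionary variant could be used, with the sample data copied from the question. The name `by_decade` is only illustrative, and the keys are converted to plain Python ints purely for readability:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1746-06-02", "1746-09-02", "1747-06-02", "1752-01-26", "1755-06-02"],
    "Name": ["Borcke (#p1)", "Jordan (#p31)", "Sa Majesté (#p32)",
             "Maupertuis (#p4)", "Jordan (#p31)"],
})

# Parse the dates (this raises on malformed values), then floor the year to its decade.
decade = pd.to_datetime(df["Date"]).dt.year // 10 * 10

# Map each decade (e.g. 1740) to the rows that fall in it.
by_decade = {int(key): g for key, g in df.groupby(decade)}

print(sorted(by_decade))   # the decades present in the data
print(by_decade[1750])     # the rows from the 1750s
```

One caveat, assuming pandas' default nanosecond timestamp resolution: pd.to_datetime cannot represent dates earlier than late 1677, so for older historical data the string-based year extraction is likely the safer choice.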
NB. I initially missed that you wanted decades, not years. I updated the answer. The logic remains unchanged.