I wish to create new dataframes, with the logic in which they are split is based on year.
Data
df
id type Q1 22 Q2 22 Q3 22 Q1 23 Q2 23 Q3 23
aa hi 0.2 0.8 0.3 1.1 2.1 0.4
aa ok 0.2 0.7 0.3 0.5 1.0 1.7
aa hello 2.0 0.1 0.0 0.1 0.1 0.1
Desired
df1
id type Q1 22 Q2 22 Q3 22
aa hi 0.2 0.8 0.3
aa ok 0.2 0.7 0.3
aa hello 2.0 0.1 0.0
df2
id type Q1 23 Q2 23 Q3 23
aa hi 1.1 2.1 0.4
aa ok 0.5 1.0 1.7
aa hello 0.1 0.1 0.1
Doing
# sort the dataframe
df.sort_values(by='year', axis=1, inplace=True)
# set the index to be this and don't drop
df.set_index(keys=['year], drop=False,inplace=True)
# get a list of names
new=df['year'].unique().tolist()
#perform a lookup on a 'view' of the dataframe
new2023 = df.loc[df.name=='2023']
I am still researching, any suggestion is helpful.
CodePudding user response:
Hard way to do it.
from itertools import groupby
import pandas as pd
df =pd.read_csv(r"Book1.csv")
lst = [*df]
lst.remove('ty')
lst.remove('id')
print(lst)
splits=([list(v) for _, v in groupby(lst, key=lambda x: x[x.find(' ') 2 ])])
print(splits)
for n, val in enumerate(splits):
globals()["df%d"%n] = val
print("df%d"%n)
#val.remove('ty')
val.insert(0, "ty")
val.insert(1, "id")
ns='df' str(val)
ns=df[val].copy()
print(ns)
output#
df0
ty id Q1 22 Q2 22 Q3 22
0 hi aa 0.2 0.8 0.9
1 ok aa 0.2 0.8 0.9
2 Fuck aa 0.2 0.8 0.9
3 shit aa 0.2 0.8 0.9
df1
ty id Q1 23 Q2 23 Q3 23
0 hi aa 0.3 0.2 0.6
1 ok aa 0.3 0.2 0.6
2 Fuck aa 0.3 0.2 0.6
3 shit aa 0.3 0.2 0.6
CodePudding user response:
here is one way to do it
# filter column where column name don't ends with 3, gives us year 22
df_new22=df.filter(regex='[^3]$')
df_new22
id type Q1 22 Q2 22 Q3 22
0 aa hi 0.2 0.8 0.3
1 aa ok 0.2 0.7 0.3
2 aa hello 2.0 0.1 0.0
# filter columns that don't ends in 2, gives us year 23
df_new23=df.filter(regex='[^2]$')
df_new23
id type Q1 23 Q2 23 Q3 23
0 aa hi 1.1 2.1 0.4
1 aa ok 0.5 1.0 1.7
2 aa hello 0.1 0.1 0.1