Create new dataframe by splitting on year pandas-CodePudding

I wish to create new dataframes, with the logic in which they are split is based on year.

Data

id  type    Q1 22   Q2 22   Q3 22   Q1 23   Q2 23   Q3 23    
aa  hi      0.2     0.8     0.3     1.1     2.1     0.4  
aa  ok      0.2     0.7     0.3     0.5     1.0     1.7  
aa  hello   2.0     0.1     0.0     0.1     0.1     0.1

Desired

df1

id  type    Q1 22   Q2 22   Q3 22                
aa  hi      0.2     0.8     0.3              
aa  ok      0.2     0.7     0.3              
aa  hello   2.0     0.1     0.0

df2

id  type    Q1 23   Q2 23   Q3 23               
aa  hi      1.1     2.1     0.4             
aa  ok      0.5     1.0     1.7             
aa  hello   0.1     0.1     0.1

Doing

# sort the dataframe
df.sort_values(by='year', axis=1, inplace=True)

# set the index to be this and don't drop
df.set_index(keys=['year], drop=False,inplace=True)

# get a list of names
new=df['year'].unique().tolist()

#perform a lookup on a 'view' of the dataframe
new2023 = df.loc[df.name=='2023']

I am still researching, any suggestion is helpful.

CodePudding user response：

Hard way to do it.

from itertools import groupby
import pandas as pd
df =pd.read_csv(r"Book1.csv")


lst = [*df]
lst.remove('ty')
lst.remove('id')
print(lst)


splits=([list(v) for _, v in groupby(lst, key=lambda x: x[x.find(' ') 2 ])])

print(splits)

for n, val in enumerate(splits):
    globals()["df%d"%n] = val
    print("df%d"%n)
    
    #val.remove('ty')
    val.insert(0, "ty")
    val.insert(1, "id")
    
    ns='df' str(val)
    
    ns=df[val].copy()
    
    print(ns)

output#

df0

     ty  id  Q1 22  Q2 22  Q3 22
0    hi  aa    0.2    0.8    0.9
1    ok  aa    0.2    0.8    0.9
2  Fuck  aa    0.2    0.8    0.9
3  shit  aa    0.2    0.8    0.9

df1

     ty  id  Q1 23  Q2 23   Q3 23
0    hi  aa    0.3     0.2    0.6
1    ok  aa    0.3     0.2    0.6
2  Fuck  aa    0.3     0.2    0.6
3  shit  aa    0.3     0.2    0.6

CodePudding user response：

here is one way to do it

# filter column where column name don't ends with 3, gives us year 22
df_new22=df.filter(regex='[^3]$')
df_new22

    id  type    Q1 22   Q2 22   Q3 22
0   aa  hi        0.2     0.8     0.3
1   aa  ok        0.2     0.7     0.3
2   aa  hello     2.0     0.1     0.0

# filter columns that don't ends in 2, gives us year 23
df_new23=df.filter(regex='[^2]$')
df_new23

    id  type    Q1 23   Q2 23   Q3 23
0   aa  hi        1.1     2.1     0.4
1   aa  ok        0.5     1.0     1.7
2   aa  hello     0.1     0.1     0.1