Home > database >  Python pandas wide data - finding earliest value across time series fields
Python pandas wide data - finding earliest value across time series fields

Time:01-28

I am working with a data frame that is written in wide format. Each book has a number of sales, but some quarters have null values because the book was not released before that quarter.

enter image description here

import pandas as pd

data = {'Book Title': ['A Court of Thorns and Roses', 'Where the Crawdads Sing', 'Bad Blood', 'Atomic Habits'],
    'Metric': ['Book Sales','Book Sales','Book Sales','Book Sales'],
   'Q1 2022': [100000,0,0,0],
   'Q2 2022': [50000,75000,0,35000],
   'Q3 2022': [25000,150000,20000,45000],
   'Q4 2022': [25000,20000,10000,65000]}

df1 = pd.DataFrame(data)

What I want to do is find the first two available quarters for each title, and create a new dataframe with those values in two columns, like so:

enter image description here

I've had a hard time finding what function would do this - can you point me in the right direction? Thank you!

CodePudding user response:

Some version of melt or stack would probably the the easiest way to go.

import pandas as pd
import numpy as np
data = {'Book Title': ['A Court of Thorns and Roses', 'Where the Crawdads Sing', 'Bad Blood', 'Atomic Habits'],
    'Metric': ['Book Sales','Book Sales','Book Sales','Book Sales'],
   'Q1 2022': [100000,np.nan,np.nan,np.nan],
   'Q2 2022': [50000,75000,np.nan,35000],
   'Q3 2022': [25000,150000,20000,45000],
   'Q4 2022': [25000,20000,10000,65000]}

key_cols = ['Book Title','Metric']
new_cols = ['First Quarter','Second Quarter']
df1[new_cols]  = (df1.set_index(key_cols)
                     .stack()
                     .groupby(level=0)
                     .head(2)
                     .values
                     .astype(int)
                     .reshape(len(df1),-1)
                 )

df1 = df1[key_cols new_cols]

Output

                    Book Title      Metric  First Quarter  Second Quarter
0  A Court of Thorns and Roses  Book Sales         100000           50000
1      Where the Crawdads Sing  Book Sales          75000          150000
2                    Bad Blood  Book Sales          20000           10000
3                Atomic Habits  Book Sales          35000           45000
  • Related