Home > Net >  create new dataframes based on original dataframes columns
create new dataframes based on original dataframes columns

Time:03-05

I have a pandas DataFrame with 10 columns, some of which are dtypes of object. df looks something like this:

 date          ID       Cat        Chan       Num
2021-04-02     abc     Chrome     Direct       8
2021-04-02     xyz     Safari     Search       5
2021-04-02     pqr     Firefox    Direct       6
2021-04-02     dcf     Chrome       NA         1
2021-04-03     abc     Safari     Search       3
2021-04-03     xyz     Safari     Search       4
2021-04-03     pqr     NA         Direct       7
2021-04-03     dcf     Chrome     Direct       5

what I need is to write a loop that first tracks only object dtypes in this df and and then create a new DataFrame for each column. These new dfs should have columns heads for the values in the original df and rows to return count of each value per date. Something like this:

first new df: df_cat:
date           Chrome       Safari       Firefox      NA
2021-04-02       2             1            1          0
2021-04-03       1             2            0          1

Second new df: df_chan:
date           Direct       Search         NA
2021-04-02       2             1            1          
2021-04-03       2             2            0

Since column "Num" in original is int dtypes, I don't not want a new df for this column.

I tried something like the below but it gives an error:

AttributeError: 'DataFrame' object has no attribute 'item':

mylist = list(df.select_dtypes(include=['object']).columns)   
for item in mylist:
    category_item = pd.get_dummies(df.item)
    category_item.head()

Could someone please help me? I am not that good with Pandas, and it sounds like a big one for me to fix.

CodePudding user response:

You could set_index with "date"; then use get_dummies groupby sum on each of the object dtype columns to get the desired outcome.

df0 = df0.set_index('date').fillna('NA')
out = {col: pd.get_dummies(df0[col]).groupby(level=0).sum() for col in df0.select_dtypes(include=[object]).columns}

Output:

ID:
             abc  dcf  pqr  xyz
date                          
2021-04-02    1    1    1    1
2021-04-03    1    1    1    1 

Cat:
             Chrome  Firefox  NA  Safari
date                                   
2021-04-02       2        1   0       1
2021-04-03       1        0   1       2 

Chan:
             Direct  NA  Search
date                          
2021-04-02       2   1       1
2021-04-03       2   0       2 

I used a dictionary here to map column names to the new DataFrames. You could store them in a list too.

  • Related