Merge value_counts of different pandas dataframes

Time:10-13

I have several pandas dataframes. For each one I compute the value_counts of a column, and I want to collect all the results in a single dataframe.

df_AB = pd.read_pickle('df_AB.pkl')
df_AC = pd.read_pickle('df_AC.pkl')
df_AD = pd.read_pickle('df_AD.pkl')
df_AE = pd.read_pickle('df_AE.pkl')
df_AF = pd.read_pickle('df_AF.pkl')
df_AG = pd.read_pickle('df_AG.pkl')

The format of the above dataframes is as below (Example: df_AB):

df_AB:
id   is_valid
121  True
122  False
123  True

For every dataframe, I need to get the value_counts of the is_valid column and store the results in df_result. I tried the code below, but it doesn't work as expected.

df_AB_VC = df_AB['is_valid'].value_counts() 
df_AB_VC['group'] = "AB"
df_AC_VC = df_AC['is_valid'].value_counts()
df_AC_VC['group'] = "AC"

Result dataframe (df_result):

Group   is_valid_True_Count    is_valid_False_Count
AB        2                      1
AC   
AD
 .
 .
 .

Any leads would be appreciated

CodePudding user response:

I think you just need to work on the dataframes a bit more systematically:

groups = ['AB', 'AC', 'AD',...]

out = pd.DataFrame({
    g: pd.read_pickle(f'df_{g}.pkl')['is_valid'].value_counts()
    for g in groups
}).T
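To see what this produces without the pickle files, here is a minimal sketch on in-memory data. The `frames` dict and its sample rows are made-up stand-ins for the loaded dataframes; the final renaming step is one possible way to get the column names asked for in the question, not something the snippet above does by itself.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_pickle('df_AB.pkl'), pd.read_pickle('df_AC.pkl'), ...
frames = {
    "AB": pd.DataFrame({"id": [121, 122, 123], "is_valid": [True, False, True]}),
    "AC": pd.DataFrame({"id": [124, 125, 126], "is_valid": [True, True, False]}),
}

# One value_counts Series per group; transposing puts one group on each row
out = pd.DataFrame({g: df["is_valid"].value_counts() for g, df in frames.items()}).T

# Rename the True/False columns to the names requested in the question
out = out.rename(columns={True: "is_valid_True_Count", False: "is_valid_False_Count"})
out = out[["is_valid_True_Count", "is_valid_False_Count"]].rename_axis("Group")

print(out)
```

Each row is one group and each column is the count of one is_valid value, which matches the layout of df_result in the question.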

CodePudding user response:

Do not use a separate variable for each dataframe; that makes your code much more complicated. Use a container instead:

files = ['df_AB.pkl', 'df_AC.pkl', 'df_AD.pkl', 'df_AE.pkl', 'df_AF.pkl']

# key each dataframe by the XX part of "df_XX.pkl"; adapt the slice to your real file names
dataframes = {f[3:5]: pd.read_pickle(f) for f in files}

# compute counts
counts = (pd.DataFrame({k: d['is_valid'].value_counts()
                        for k,d in dataframes.items()})
            .T.add_prefix('is_valid_').add_suffix('_Count')
          )

example output:

    is_valid_True_Count  is_valid_False_Count
AB                    2                     1
AC                    2                     1

CodePudding user response:

Use pathlib to extract the group name from each file name, collect the counts in a dictionary, then concatenate all the entries:

import pandas as pd
import pathlib

data = {}
for pkl in pathlib.Path().glob('df_*.pkl'):
    group = pkl.stem.split('_')[1]
    df = pd.read_pickle(pkl)
    data[group] = df['is_valid'].value_counts() \
                                .add_prefix('is_valid_') \
                                .add_suffix('_Count')
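# a group that lacks one of the values gets NaN after concat; fill with 0 to keep integer counts
df = pd.concat(data, axis=1).T.fillna(0).astype(int)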
>>> df
    is_valid_True_Count  is_valid_False_Count
AD                    2                     1
AB                    4                     2
AC                    0                     3
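As an alternative to counting per file, the frames could also be stacked into one dataframe and counted in a single pass with pd.crosstab. This is a different technique from the answers above, shown only as a sketch: the `frames` dict and its sample rows are hypothetical stand-ins for the pickled dataframes, and the `Group` column is added here for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the pickled dataframes
frames = {
    "AB": pd.DataFrame({"id": [121, 122, 123], "is_valid": [True, False, True]}),
    "AC": pd.DataFrame({"id": [124, 125, 126], "is_valid": [False, False, False]}),
}

# Tag every row with its group name and stack everything into one dataframe
stacked = pd.concat(
    [df.assign(Group=g) for g, df in frames.items()],
    ignore_index=True,
)

# Cross-tabulate group against is_valid; combinations that never occur count as 0
counts = pd.crosstab(stacked["Group"], stacked["is_valid"])
counts = counts.add_prefix("is_valid_").add_suffix("_Count")
counts = counts[["is_valid_True_Count", "is_valid_False_Count"]]

print(counts)
```

Because crosstab counts every group/value combination, a group with no True rows (AC in this sample) shows 0 rather than a missing value.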