Home > Blockchain >  Using pandas to extract all unique values across all columns in excel file
Using pandas to extract all unique values across all columns in excel file

Time:05-18

I am fairly new to pandas/python. I’m trying to ingest an excel file and output an excel file that contains all unique values in each column.

I am easily able to print the values using:

for col in df:
    print(df[col].unique())

My question is: how can I get these results to be sent to an excel file and not just print them. Thank you in advance.

CodePudding user response:

Use dictionary comprehension to create your new unique frame and then create a new file. Dictionaries can easily be used to create a DataFrame as long as the arrays are the same length. To make sure the arrays are the same length when using unique we just convert the arrays to a pd.Series

You do not have an expected output so I assume you just want all the unique values in one file with nans for missing values.

pd.DataFrame({col: pd.Series(df[col].unique()) 
              for col in df.columns}).to_excel('some_file_name.xlsx', index=False)

Here is a working example

# sample df
import pandas as pd
df = pd.DataFrame([[1,2,3,1,2],
                   list('abccd'),
                   [1]*5]).T

   0  1  2
0  1  a  1
1  2  b  1
2  3  c  1
3  1  c  1
4  2  d  1

output

pd.DataFrame({col: pd.Series(df[col].unique()) 
              for col in df.columns})

     0  1    2
0    1  a    1
1    2  b  NaN
2    3  c  NaN
3  NaN  d  NaN
  • Related