I am fairly new to pandas/python. I’m trying to ingest an excel file and output an excel file that contains all unique values in each column.
I am easily able to print the values using:
for col in df:
print(df[col].unique())
My question is: how can I get these results to be sent to an excel file and not just print them. Thank you in advance.
CodePudding user response:
Use dictionary comprehension to create your new unique frame and then create a new file. Dictionaries can easily be used to create a DataFrame as long as the arrays are the same length. To make sure the arrays are the same length when using unique
we just convert the arrays to a pd.Series
You do not have an expected output so I assume you just want all the unique values in one file with nans
for missing values.
pd.DataFrame({col: pd.Series(df[col].unique())
for col in df.columns}).to_excel('some_file_name.xlsx', index=False)
Here is a working example
# sample df
import pandas as pd
df = pd.DataFrame([[1,2,3,1,2],
list('abccd'),
[1]*5]).T
0 1 2
0 1 a 1
1 2 b 1
2 3 c 1
3 1 c 1
4 2 d 1
output
pd.DataFrame({col: pd.Series(df[col].unique())
for col in df.columns})
0 1 2
0 1 a 1
1 2 b NaN
2 3 c NaN
3 NaN d NaN