Home > Enterprise >  Convert Spreadsheet to Data Frame Syntax in Python
Convert Spreadsheet to Data Frame Syntax in Python

Time:06-07

Out of the box (and broad) question here: I have a spreadsheet that I'd like to use as an example in a question to the SO community, but I have a feeling most SO helpers would not like to deal with spreadsheets as data input/examples.

Instead, I'd like to provide an already-transformed data frame for you; however, I am not sure how to do this.

I'd like to generate code that will take spreadsheet columns and turn them into a single data frame syntax code that I can easily share here.

Example of how the data frame syntax would be:

# Example of data frame 'syntax code'

import pandas as pd
df = {'A' : ['B','C','D'], '1': [2,3,4]}
df = pd.DataFrame(df)

etc...

In essence, every column of the spreadsheet would be it's own respective column in the said pandas data frame. Any blanks would correspond to np.nan.

The overall idea here is to import two spreadsheets as data frames and make comparisons between them for differences.

Simply put, how can I turn spreadsheet values into something like {'A' : ['B','C','D'], '1': [2,3,4]} with code?

CodePudding user response:

Nice question, and thank you for caring about the community in this way! You're right that it makes it a lot easier to help with problems with we have to do as little work to get set up as possible. :)

Unless your spreadsheet has complex formatting/coloring, or complex columns or that sort of stuff, it should be as simple to convert your spreadsheet to a portable datastructure for pd.DataFrame as

pd.read_excel('path/to/file.xlsx').head(5).to_dict()

Note the .head(5) - that selects only the first 5 rows, because if your spreadsheet is large, having many rows will likely make everything harder if you don't need to give us those extra rows.

  • Related