Out of the box (and broad) question here: I have a spreadsheet that I'd like to use as an example in a question to the SO community, but I have a feeling most SO helpers would not like to deal with spreadsheets as data input/examples.
Instead, I'd like to provide an already-transformed data frame for you; however, I am not sure how to do this.
I'd like to generate code that will take spreadsheet columns and turn them into a single data frame syntax code that I can easily share here.
Example of how the data frame syntax would be:
# Example of data frame 'syntax code'
import pandas as pd
df = {'A' : ['B','C','D'], '1': [2,3,4]}
df = pd.DataFrame(df)
etc...
In essence, every column of the spreadsheet would be it's own respective column in the said pandas data frame. Any blanks would correspond to np.nan
.
The overall idea here is to import two spreadsheets as data frames and make comparisons between them for differences.
Simply put, how can I turn spreadsheet values into something like {'A' : ['B','C','D'], '1': [2,3,4]}
with code?
CodePudding user response:
Nice question, and thank you for caring about the community in this way! You're right that it makes it a lot easier to help with problems with we have to do as little work to get set up as possible. :)
Unless your spreadsheet has complex formatting/coloring, or complex columns or that sort of stuff, it should be as simple to convert your spreadsheet to a portable datastructure for pd.DataFrame
as
pd.read_excel('path/to/file.xlsx').head(5).to_dict()
Note the .head(5)
- that selects only the first 5 rows, because if your spreadsheet is large, having many rows will likely make everything harder if you don't need to give us those extra rows.