How to Pythonically add new cells to a Jupyter Notebook

Time:03-20

I have a lot of different files that I'm trying to load into pandas in a Pythonic way, and I also want each one to end up in its own cell so the notebook is easy to follow. I have 36 different variables, but to keep things simple I'll show an example with three dataframes.


Let's say I'm loading these CSV files into dataframes, each in its own automatically generated cell.

file_list = ['df1.csv', 'df2.csv', 'df3.csv']
name_list = ['df1', 'df2', 'df3']

I could easily create three different cells and type:

df1 = pd.read_csv('df1.csv')

But there are dozens of different CSVs, and I want to do similar things to each of them, such as deleting columns, so there has to be an easier way.

I've tried something like this:

var_list = []

for file, name in zip(file_list, name_list):
    var_name = name
    var_file = pd.read_csv(file)
    var_list.append((file, name, var_file))

print(var_list)

But this all occurs in the same cell.

I looked at the IPython docs, since I believe that is the relevant package, but I couldn't find anything. I appreciate your help.

CodePudding user response:

From what I understand, you need to load the contents of several .csv files into several pandas dataframes, and you want to run a repeatable process on each of them. You're not sure they will all load correctly, but you still want to get the most out of them, so you want to run each process in its own Jupyter cell.

Like ddejohn, I'm not sure that's the best option, but I think it's a cool question anyway. The following code generates several cells, each with the same structure but different variables (in my example, I simply sort the loaded dataframe by age). It is based on "How to programmatically create several new cells in a Jupyter notebook page", which should get the credit if it is indeed what you were looking for:

from IPython.core.getipython import get_ipython
import pandas as pd

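def create_new_cell(contents):
    # Queue a 'set_next_input' payload; the notebook front end turns it into a new cell.
    shell = get_ipython()
    payload = dict(
        source='set_next_input',
        text=contents,
        replace=False,  # add a new cell instead of overwriting the current one
    )
    # single=False keeps earlier payloads with the same source instead of replacing them.
    shell.payload_manager.write_payload(payload, single=False)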

def get_df(file_name, df_name):
    # Source text for one cell: load the CSV, sort it by age, then display it.
    content = (
        "{df} = pd.read_csv('{file}', names=['Name', 'Age', 'Height'])\n"
        "{df}.sort_values(by='Age', inplace=True)\n"
        "{df}"
    ).format(df=df_name, file=file_name)
    create_new_cell(content)

file_list = ['filename_1.csv', 'filename_2.csv']
name_list = ['df1', 'df2']
for file, name in zip(file_list, name_list):
    get_df(file, name)
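
If you want to check the generated text before any cells are created, one option is to build the cell sources as plain strings first and inspect them. The sketch below is only an illustration under a few assumptions: it reuses the three column names from the example above, and CELL_TEMPLATE, build_cell_source and build_all_cells are hypothetical names of my own, not part of IPython or pandas.

```python
from string import Template

# Hypothetical template for one generated cell; the column names
# ('Name', 'Age', 'Height') are taken from the example above.
CELL_TEMPLATE = Template(
    "$df = pd.read_csv('$file', names=['Name', 'Age', 'Height'])\n"
    "$df.sort_values(by='Age', inplace=True)\n"
    "$df"
)


def build_cell_source(file_name, df_name):
    """Return the source text for one cell without touching the notebook."""
    # Reject names that could not be used as a Python variable.
    if not df_name.isidentifier():
        raise ValueError(f"{df_name!r} is not a valid Python variable name")
    return CELL_TEMPLATE.substitute(df=df_name, file=file_name)


def build_all_cells(file_list, name_list):
    """Return one cell source per (file, name) pair."""
    # zip() would silently drop extras, so check the lengths explicitly.
    if len(file_list) != len(name_list):
        raise ValueError("file_list and name_list must have the same length")
    return [build_cell_source(f, n) for f, n in zip(file_list, name_list)]


cells = build_all_cells(['filename_1.csv', 'filename_2.csv'], ['df1', 'df2'])
for source in cells:
    print(source)
    print('---')
```

Inside the notebook, each string in cells can then be passed to create_new_cell(...) once the text looks right.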