Writing pandas/numpy statements in python functions

I am working with multiple data sets that share the same attributes (column names) in a Jupyter notebook. It is tiresome to rerun the same commands for every data set to get the same result. Is there a way to automate the process and run it on different data sets? For example, I run the following commands for one data set in the notebook:

import pandas as pd

data = pd.read_csv(r"/d/newfolder/test.csv", low_memory=False)
data.head()

list(data.columns)

data_new=data.sort_values(by='column_name')

Now I want to wrap all of these commands in one function and run it on different data sets in the notebook.

Can anyone help me with possible ways to do this? Thanks in advance.

CodePudding user response:

IIUC, your issue is that print(df) inside a function doesn't render as nicely as leaving df as the last line of a Jupyter cell.

You can get the same rich output anywhere in a cell (as long as your Jupyter installation is up to date) by calling display.

Modifying your code:

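import pandas as pd
from IPython.display import display  # explicit import; recent Jupyter also provides display as a builtin


def process_data(file):
    data = pd.read_csv(file, low_memory=False)
    display(data.head())       # rich table, same rendering as a cell's last line
    display(data.columns)
    data_new = data.sort_values(by='column_name')
    display(data_new.head())
    return data_new            # return the sorted frame so it can be reused later


data_new = process_data(r"/d/newfolder/test.csv")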

This will output data.head(), data.columns, and data_new.head() from a single cell.
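
If you want to run the same steps over several files in one go, one option is to loop over the paths and collect the results. This is only a rough sketch, assuming every file is a CSV: the name process_many and the sort_column parameter are made up for illustration and are not from the question. It uses print instead of display so it also runs outside a notebook.

```python
from pathlib import Path

import pandas as pd


def process_many(files, sort_column):
    """Read each CSV, sort it by sort_column, and return the frames keyed by file name.

    process_many and sort_column are hypothetical names used for illustration.
    """
    results = {}
    for file in files:
        data = pd.read_csv(file, low_memory=False)
        if sort_column not in data.columns:
            # Skip files that lack the column instead of failing with a KeyError.
            print(f"{file}: no column named {sort_column!r}, skipped")
            continue
        results[Path(file).name] = data.sort_values(by=sort_column)
    return results
```

You would call it with a list of paths, for example process_many([r"/d/newfolder/test.csv"], "column_name"), and then look up each sorted frame by its file name. Keeping the column name as a parameter means the same function still works if another data set needs to be sorted by a different column.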