Home > Software design >  Creating Multi Indexed Dataframe with Its Name - Pandas/Python
Creating Multi Indexed Dataframe with Its Name - Pandas/Python

Time:11-30

Assuming I have this dataframe saved in my directory:

import numpy as np
import pandas as pd

col1 = pd.Series(np.linspace(1, 10, 20))
col2 = pd.Series(np.linspace(11, 20, 20))
data = np.array([col1, col2]).T
df = pd.DataFrame(data, columns = ["col1", "col2"])
df.to_csv("test.csv", index = False)

What I would like to do is to read this file and the name of the file as a column on top of the other columns to get something like this:

example

How can I do this?

CodePudding user response:

Use MultiIndex.from_product:

file = 'test.csv'

df = pd.read_csv(file)
name = file.split('.')[0].capitalize()

df.columns= pd.MultiIndex.from_product([[name],df.columns])
print (df.head())
       Test           
       col1       col2
0  1.000000  11.000000
1  1.473684  11.473684
2  1.947368  11.947368
3  2.421053  12.421053
4  2.894737  12.894737

CodePudding user response:

Use pathlib to extract the file name using .stem and pd.concat to create a multi level column:

import pathlib

filename = pathlib.Path('path/to/test.csv')

df = pd.concat({filename.stem.capitalize(): pd.read_csv(filename)}, axis=1)
print(df)

# Output:
         Test           
         col1       col2
0    1.000000  11.000000
1    1.473684  11.473684
2    1.947368  11.947368
3    2.421053  12.421053
4    2.894737  12.894737
5    3.368421  13.368421
6    3.842105  13.842105
7    4.315789  14.315789
8    4.789474  14.789474
9    5.263158  15.263158
10   5.736842  15.736842
11   6.210526  16.210526
12   6.684211  16.684211
13   7.157895  17.157895
14   7.631579  17.631579
15   8.105263  18.105263
16   8.578947  18.578947
17   9.052632  19.052632
18   9.526316  19.526316
19  10.000000  20.000000
  • Related