Doesn't the option dtype (pandas) add the names of the columns directly from the keys of this d

Time:04-18

When loading data into a pandas DataFrame and specifying dtypes in advance with a dictionary, why are the keys of this dictionary not used directly as the column names?

df = pd.read_csv('path/filename',
                 sep='\t', 
                 index_col=False,
                 dtype={'col1':str,
                        'col2':int,
                         ... })
print(df.columns)
Index(["aa",11])

There are many columns, and I don't want to repeat a long list of column names.

I want to name the columns in the order of the keys in my dictionary. Is this possible?

CodePudding user response:

You can use the walrus operator := to give a name (such as the_dtype) to the dict you pass as the dtype argument and reuse it to set the names argument by calling the_dtype.keys():

import pandas as pd
df = pd.read_csv('./foo_dtype_test.txt',
                 sep='\t', 
                 index_col=False,
                 dtype=(the_dtype := {'col1': str,
                                      'col2': int}),
                 names=the_dtype.keys())
print(df)

The walrus operator is available from Python 3.8 onward; for earlier versions you can do this instead:

the_dtype = {'col1': str,
             'col2': int}
df = pd.read_csv('./foo_dtype_test.txt',
                 sep='\t',
                 index_col=False,
                 dtype=the_dtype,
                 names=the_dtype.keys())

Input file foo_dtype_test.txt contains:

1   2
3   4
5   6

Output:

  col1  col2
0    1     2
1    3     4
2    5     6
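
For anyone who wants to try this without creating a file, here is a minimal sketch of the same idea. It assumes an in-memory io.StringIO buffer as a stand-in for foo_dtype_test.txt (that substitution is mine, not part of the answer above) and passes list(the_dtype) rather than the_dtype.keys(), which yields the same names in the same order:

```python
import io

import pandas as pd

# Assumption: an in-memory, tab-separated stand-in for foo_dtype_test.txt
# (three rows, no header row).
data = io.StringIO("1\t2\n3\t4\n5\t6\n")

# One dict drives both the dtypes and the column names.
the_dtype = {"col1": str, "col2": int}

df = pd.read_csv(
    data,
    sep="\t",
    header=None,            # the data has no header row
    index_col=False,
    dtype=the_dtype,
    names=list(the_dtype),  # keys in insertion order: col1, col2
)

print(df.columns.tolist())
print(df)
```

The dtype dict only maps column names to types; it is the names argument that actually assigns the names, which is why the same dict has to be passed twice.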

CodePudding user response:

You can use an OrderedDict and extract its keys for the names argument (the OrderedDict is useful because the key-value pairs are guaranteed to come out in the order you expect):

import pandas as pd
from collections import OrderedDict

x = OrderedDict({'col1': str, 'col2': int})

df = pd.read_csv('path/filename',
                 sep='\t',
                 index_col=False,
                 dtype=x,
                 names=x.keys())
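
As a side note, since Python 3.7 a plain dict also preserves insertion order, so on a recent interpreter the OrderedDict is likely optional here. A small check, using a hypothetical three-column dtype mapping (col3 is made up for illustration):

```python
from collections import OrderedDict

# Hypothetical dtype mapping; 'col3' is invented for this example.
plain = {"col1": str, "col2": int, "col3": float}
ordered = OrderedDict(plain)

# Iterating a dict yields its keys in insertion order (Python 3.7+).
plain_names = list(plain)
ordered_names = list(ordered)

print(plain_names)
print(plain_names == ordered_names)
```

On Python 3.6 and earlier the language does not guarantee that order for plain dicts, so OrderedDict remains the safer choice there.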

