When loading data into a pandas DataFrame and specifying dtypes in advance with a dictionary, why are the keys of this dictionary not used directly as the names of the columns?
df = pd.read_csv('path/filename',
                 sep='\t',
                 index_col=False,
                 dtype={'col1': str,
                        'col2': int,
                        ...})
print(df.columns)
Index(["aa", 11])
There are many columns and I don't want to repeat a long list of column names. I want the columns to be named in the order of the keys of my dictionary. Is this possible?
CodePudding user response:
You can use the walrus operator := to give a name (such as the_dtype) to the dict you pass as the dtype argument, and reuse it to set the names argument by calling the_dtype.keys():
import pandas as pd
df = pd.read_csv('./foo_dtype_test.txt',
                 sep='\t',
                 index_col=False,
                 dtype=(the_dtype := {'col1': str,
                                      'col2': int}),
                 names=the_dtype.keys())
print(df)
The walrus operator is available in Python 3.8+; for earlier versions you can do this instead:
the_dtype = {'col1': str,
             'col2': int}
df = pd.read_csv('./foo_dtype_test.txt',
                 sep='\t',
                 index_col=False,
                 dtype=the_dtype,
                 names=the_dtype.keys())
Input file foo_dtype_test.txt contains:
1 2
3 4
5 6
Output:
col1 col2
0 1 2
1 3 4
2 5 6
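One caveat worth noting: passing names= makes pandas treat the file as having no header row, so if your file does start with a header line you can add header=0 to discard it while names= supplies the new labels. A minimal sketch using an in-memory file (the file contents here are hypothetical, not from the question):

```python
import io

import pandas as pd

the_dtype = {'col1': str, 'col2': int}

# Hypothetical file whose first line is an old header we want to replace
data = "old1\told2\n1\t2\n3\t4\n"

# header=0 marks the first line as a header; names= then overrides it,
# so the old header line is dropped instead of being read as data
df = pd.read_csv(io.StringIO(data),
                 sep='\t',
                 header=0,
                 dtype=the_dtype,
                 names=the_dtype.keys())
print(df)
```

Without header=0, the line "old1\told2" would be parsed as a data row and the int conversion for col2 would fail.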
CodePudding user response:
You can use an OrderedDict and extract the keys for the names argument (the OrderedDict makes the ordering guarantee explicit, although plain dicts also preserve insertion order as of Python 3.7):
import pandas as pd
from collections import OrderedDict
x = OrderedDict({'col1': str, 'col2': int})
df = pd.read_csv('path/filename',
                 sep='\t',
                 index_col=False,
                 dtype=x,
                 names=x.keys())
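Since plain dicts preserve insertion order in Python 3.7+, the same approach works without OrderedDict. A runnable sketch using an in-memory file in place of the question's path (the file contents are hypothetical):

```python
import io

import pandas as pd

# A plain dict: in Python 3.7+ its keys iterate in insertion order,
# so they can serve directly as the column names
x = {'col1': str, 'col2': int}

data = "1\t2\n3\t4\n5\t6\n"
df = pd.read_csv(io.StringIO(data),
                 sep='\t',
                 index_col=False,
                 dtype=x,
                 names=x.keys())
print(df.dtypes)
```

Here col1 is read as strings (dtype object) and col2 as integers, matching the dict values, and the columns appear in the dict's key order.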