Pandas: converting the names of data types to different values using map()

Time:02-17

I have a Pandas dataframe df:

import pandas as pd

foo = {
    'Code' : [200, 101, 308, 393],
    'City' : ['New York', 'Los Angeles', 'Miami', 'Houston'],
    'State' : ['New York', 'California', 'Florida', 'Texas'],
    'Country' : ['United States', 'United States', 'United States', 'United States'],
    'Sales' : [100, 200, 300, 400]
}

df = pd.DataFrame(foo)
df

    Code    City         State      Country         Sales
0   200     New York     New York   United States   100
1   101     Los Angeles  California United States   200
2   308     Miami        Florida    United States   300
3   393     Houston      Texas      United States   400

To get the data types, I call:

df.dtypes
    
Code         int64
City         object
State        object
Country      object
Sales        int64
dtype: object

I would like to convert the names of these data types to different names so that they can be used in a database schema. To do so, I use the following:

new_types = df.dtypes.map({'int64': 'int', 'object': 'text', 'float64': 'int'})

This returns:

new_types

Code       NaN
City       NaN
State      NaN
Country    NaN
Sales      NaN
dtype: object

What is causing the NaN values when converting using this approach? Is there a more elegant way to do this conversion?

Thanks!

CodePudding user response:

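df.dtypes returns a Series where each value is a numpy.dtype object, not a string. The keys in your dict are strings, so map() finds no match for any of the values and returns NaN. To get the dtype names as strings and map them, cast them to strings with .astype: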

dt = df.dtypes

# Confirm the type of these values (.iloc gives positional access)
print(type(dt.iloc[0]))

# Result (the exact repr depends on the NumPy version):
# <class 'numpy.dtype[int64]'>

new_types = dt.astype(str).map({'int64': 'int', 
                                'object': 'text', 
                                'float64': 'int'})

print(new_types)

# Result:
# Code        int
# City       text
# State      text
# Country    text
# Sales       int
# dtype: object
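
To see why the original call found no matches, here is a minimal check, assuming a small two-column frame rather than the full df from the question. The dtypes are set explicitly so the result does not depend on pandas defaults, and the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

# Small stand-in frame; dtypes are pinned explicitly (an assumption for this sketch)
df = pd.DataFrame({
    'Code': pd.Series([200, 101], dtype='int64'),
    'City': pd.Series(['New York', 'Miami'], dtype=object),
})
mapping = {'int64': 'int', 'object': 'text', 'float64': 'int'}

first = df.dtypes.iloc[0]

is_string = isinstance(first, str)      # the value is not a str...
is_dtype = isinstance(first, np.dtype)  # ...it is a numpy.dtype object

# Looking up the dtype object itself misses the string key
direct_lookup = mapping.get(first)

# Looking up its name (a plain string) hits the key
name_lookup = mapping.get(first.name)

print(is_string, is_dtype, direct_lookup, name_lookup)
```

The lookup by first.name succeeds because .name is an ordinary string, which is the same thing .astype(str) produces for every value at once.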

CodePudding user response:

I solved this by casting the types to str (which I should have done to begin with!):

types = df.dtypes.astype('str')

new_types = types.map({'int64': 'int', 'object': 'text', 'float64': 'int'})

Code        int
City       text
State      text
Country    text
Sales       int
dtype: object

If there is a more elegant way to do this, I'm all ears. Thanks!

CodePudding user response:

You can use the name attribute of each dtype:

d = {'int64': 'int', 'object': 'text', 'float64': 'int'}
df.dtypes.map(lambda x: d.get(x.name))
Out[62]: 
Code        int
City       text
State      text
Country    text
Sales       int
dtype: object
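
If the frame may contain dtypes that are not in the dict (bool, datetime64, int32 and so on), one possible variation is to classify each dtype by kind with the helpers in pandas.api.types instead of matching exact names. This is only a sketch: to_schema_type is a hypothetical helper, and the labels 'boolean', 'real' and 'timestamp' are assumptions that do not come from the question, so swap in whatever your database expects:

```python
import pandas as pd
from pandas.api import types as ptypes

def to_schema_type(dtype):
    """Hypothetical helper: map a column dtype to a schema type label."""
    if ptypes.is_bool_dtype(dtype):
        return 'boolean'
    if ptypes.is_integer_dtype(dtype):
        return 'int'
    if ptypes.is_float_dtype(dtype):
        return 'real'
    if ptypes.is_datetime64_any_dtype(dtype):
        return 'timestamp'
    # Fall back to text for object/string and anything unrecognised
    return 'text'

# Stand-in frame with explicit dtypes (an assumption for this sketch)
df = pd.DataFrame({
    'Code': pd.Series([200, 101], dtype='int64'),
    'Price': pd.Series([1.5, 2.5], dtype='float64'),
    'City': pd.Series(['New York', 'Miami'], dtype=object),
    'Active': pd.Series([True, False], dtype='bool'),
})

schema_types = df.dtypes.map(to_schema_type)
print(schema_types)
```

Because every branch returns a label, no column ends up as NaN or None, which the dict-based versions produce for any dtype that is missing from the dict.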