Home > database >  Making all the values in a column one media (NY --> New York)
Making all the values in a column one media (NY --> New York)

Time:09-14

Imagine I have a dataset that is like so:

         cust_id       cust_state  prod_title            total_sales    trans_timestamp
0        1001015       NY          All Veggie Yummies    72.99           2021-03-30 10:06:05.108653 
1        1001019       NJ          Ball and String       18.95           2021-03-30 10:07:01.746945 
2        1022098       NY          Cat Cave              28.45           2021-03-30 10:10:41.387170    
3        1022157       PA          Chewie Dental         24.95           2021-03-30 10:20:11.571311 
4        1022189       Illinois    Chomp-a Plush         60.99           2021-03-30 10:33:11.289467 
5        1002664       Kansas      Feline Fix Mix        65.99           2021-03-30 10:37:55.446798      
6        1002666       Florida     Fetch Blaster         9.95            2021-03-30 10:39:36.488829 
7        1002175       Hawaii      Foozy Mouse           45.99           2021-03-30 10:43:21.490817
8        1002666       Florida     Kitty Climber         35.99           2021-03-30 10:51:31.510563      
9        1022189       Illinois    Purr Mix              32.99           2021-03-30 11:01:50.082440 
10       1011924       Indiana     Fetch Blaster         19.90           2021-03-30 11:11:15.944726 
11       1022236       AK          Purr Mix              98.97           2021-03-30 11:15:39.390134     
12       1022189       IL          Cat Cave              56.90           2021-03-30 11:21:10.167505 
13       1002137       Wisconsin   Purrfect Puree        54.95           2021-03-30 11:27:51.133318 
14       1002159       KY          Foozy Mouse           91.98           2021-03-30 11:29:00.292890 
15       1002175       Hawaii      Reddy Beddy           21.95           2021-03-30 11:30:22.928818      
16       1002108       Maine       Cat Cave              85.83           2021-03-30 11:34:35.776578 
17       1002186       MI          Scratchy Post         48.95           2021-03-30 10:39:28.604007 
18       1002175       HI          Snack-em Fish         15.99           2021-03-30 10:39:40.349882      
19       1002261       CO          Snoozer Essentails    99.95           2021-03-30 10:51:31.510563 
20       1002666       Florida     Scratchy Post         48.95           2021-03-30 11:01:28.024109 
21       1002159       Kentucky    Purrfect Puree        219.80          2021-03-30 11:11:15.944726     
22       1002680       SC          Chewie Dental         49.90           2021-03-30 11:15:39.390134 
23       1002678       Texas       Reddy Beddy           65.85           2021-03-30 11:21:10.167505 
24       1013769       OK          The New Bone          71.96           2021-03-30 11:27:17.147159      
25       1013859       KY          Reddy Beddy           109.75          2021-03-30 11:28:48.669564 

Fix the inconsistencies in the column cust_state.

So for this one I created a dictionary for all the abbreviations to the actual state names, then I applied it as a map to the actual cust_state column. However when I did that it changed all the abbreviations to NaN values. How do I change all the states values to one style? I'll post my code beneath.

state_to_abr = dict({'AL':'Alabama','AK':'Alaska','AZ':'Arizona','AR':'Arkansas',
'CA':'California','CO':'Colorado','CT':'Connecticut',
'DE':'Delaware',
'FL':'Florida',
'GA':'Georgia',
'HI':'Hawaii',
'ID':'Idaho','IL':'Illinois','IN':'Indiana','IA':'Iowa',
'KS':'Kansas','KY':'Kentucky',
'LA':'Louisiana',
'ME':'Maine','MD':'Maryland','MA':'Massachusetts','MI':'Michigan','MN':'Minnesota','MS':'Mississippi','MO':'Missouri','MT':'Montana',
'NE':'Nebraska','NV':'Nevada','NH':'New Hampshire','NJ':'New Jersey','NM':'New Mexico','NY':'New York','NC':'North Carolina','ND':'North Dakota',
'OH':'Ohio','OK':'Oklahoma','OR':'Oregon',
'PA':'Pennsylvania',
'RI':'Rhode Island',
'South Carolina':'SC',
'TN':'Tennessee','TX':'Texas',
'UT':'Utah',
'VT':'Vermont','VI':'Virgin Islands',
'WA':'Washington','WV':'West Virginia','WI':'Wisconsin','WY':'Wyoming'})
df_cleaned['cust_state'] = df_cleaned['cust_state'].map(state_to_abr)
df_cleaned

CodePudding user response:

This will let you map to the dictionary you wrote (I had to change the order of SC though)

state_to_abr = dict({'AL':'Alabama','AK':'Alaska','AZ':'Arizona','AR':'Arkansas',
'CA':'California','CO':'Colorado','CT':'Connecticut',
'DE':'Delaware',
'FL':'Florida',
'GA':'Georgia',
'HI':'Hawaii',
'ID':'Idaho','IL':'Illinois','IN':'Indiana','IA':'Iowa',
'KS':'Kansas','KY':'Kentucky',
'LA':'Louisiana',
'ME':'Maine','MD':'Maryland','MA':'Massachusetts','MI':'Michigan','MN':'Minnesota','MS':'Mississippi','MO':'Missouri','MT':'Montana',
'NE':'Nebraska','NV':'Nevada','NH':'New Hampshire','NJ':'New Jersey','NM':'New Mexico','NY':'New York','NC':'North Carolina','ND':'North Dakota',
'OH':'Ohio','OK':'Oklahoma','OR':'Oregon',
'PA':'Pennsylvania',
'RI':'Rhode Island',
'SC':'South Carolina',
'TN':'Tennessee','TX':'Texas',
'UT':'Utah',
'VT':'Vermont','VI':'Virgin Islands',
'WA':'Washington','WV':'West Virginia','WI':'Wisconsin','WY':'Wyoming'})

df_states = pd.DataFrame.from_dict(state_to_abr,orient='index', columns = ['State_Name'])

df['cust_state'] = np.where(df['cust_state'].str.len() <= 2, df['cust_state'].map(df_states['State_Name']), df['cust_state']) 
df

CodePudding user response:

you seemed to have a spaces in the abbreviation, so stripping the spaces will yield the result. secondly, since the cust_states has both abbreviations and full name, you may want to map when the mapping doesn't result in a null values, hence the use of the mask

df['cust_state']=df['cust_state'].mask(
    df['cust_state'].str.strip().map(state_to_abr).notna(), 
    df['cust_state'].str.strip().map(state_to_abr))

df

OR

check the length, when its two, do a map, else leave value as is

df['cust_state']=df['cust_state'].mask(
    df['cust_state'].str.strip().str.len()==2, 
    df['cust_state'].str.strip().map(state_to_abr))
df
cust_id     cust_state  prod_title  total_sales     trans_timestamp
0   1001015     New York    All Veggie Yummies  72.99   2021-03-30 10:06:05.108653
1   1001019     New Jersey  Ball and String     18.95   2021-03-30 10:07:01.746945
2   1022098     New York    Cat Cave    28.45   2021-03-30 10:10:41.387170
3   1022157     Pennsylvania    Chewie Dental   24.95   2021-03-30 10:20:11.571311
4   1022189     Illinois    Chomp-a Plush   60.99   2021-03-30 10:33:11.289467
5   1002664     Kansas  Feline Fix Mix  65.99   2021-03-30 10:37:55.446798
6   1002666     Florida     Fetch Blaster   9.95    2021-03-30 10:39:36.488829
7   1002175     Hawaii  Foozy Mouse     45.99   2021-03-30 10:43:21.490817
8   1002666     Florida     Kitty Climber   35.99   2021-03-30 10:51:31.510563
9   1022189     Illinois    Purr Mix    32.99   2021-03-30 11:01:50.082440
10  1011924     Indiana     Fetch Blaster   19.90   2021-03-30 11:11:15.944726
11  1022236     Alaska  Purr Mix    98.97   2021-03-30 11:15:39.390134
12  1022189     Illinois    Cat Cave    56.90   2021-03-30 11:21:10.167505
13  1002137     Wisconsin   Purrfect Puree  54.95   2021-03-30 11:27:51.133318
14  1002159     Kentucky    Foozy Mouse     91.98   2021-03-30 11:29:00.292890
15  1002175     Hawaii  Reddy Beddy     21.95   2021-03-30 11:30:22.928818
16  1002108     Maine   Cat Cave    85.83   2021-03-30 11:34:35.776578
17  1002186     Michigan    Scratchy Post   48.95   2021-03-30 10:39:28.604007
18  1002175     Hawaii  Snack-em Fish   15.99   2021-03-30 10:39:40.349882
19  1002261     Colorado    Snoozer Essentails  99.95   2021-03-30 10:51:31.510563
20  1002666     Florida     Scratchy Post   48.95   2021-03-30 11:01:28.024109
21  1002159     Kentucky    Purrfect Puree  219.80  2021-03-30 11:11:15.944726
22  1002680     SC  Chewie Dental   49.90   2021-03-30 11:15:39.390134
23  1002678     Texas   Reddy Beddy     65.85   2021-03-30 11:21:10.167505
24  1013769     Oklahoma    The New Bone    71.96   2021-03-30 11:27:17.147159
25  1013859     Kentucky    Reddy Beddy     109.75  2021-03-30 11:28:48.669564
  • Related