Replace values in columns of dataframe based on dictionary not working [duplicate]

Time:09-22

You can read the exact problem below, but this is essentially what I'm trying to do:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

newVals = {'A0': 0,
           'A1': 1,
           'A2': 2,
           'A3': 3}
for key, value in newVals.items():
    df1['A'].replace({key, value})

When I run this, the DataFrame is unchanged.
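The two problems in the loop above can be reproduced in isolation. In `df1['A'].replace({key, value})`, the braces build a set such as `{'A0', 0}` rather than a dict, and `Series.replace` returns a new Series instead of modifying `df1`, so its result has to be assigned back. A minimal sketch, using a trimmed copy of `df1` with only columns A and B; passing the whole dictionary at once also makes the loop unnecessary:

```python
import pandas as pd

# Trimmed copy of the example frame (columns C and D omitted)
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
newVals = {'A0': 0, 'A1': 1, 'A2': 2, 'A3': 3}

# {key, value} builds a set, not a key -> value mapping
print(type({'A0', 0}))  # <class 'set'>

# replace() returns a new Series; without assignment df1 is untouched
df1['A'].replace(newVals)
print(df1['A'].tolist())  # ['A0', 'A1', 'A2', 'A3']

# Pass the full dict once and assign the result back to the column
df1['A'] = df1['A'].replace(newVals)
print(df1['A'].tolist())  # [0, 1, 2, 3]
```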

Initial Post:

OK, so the data I am analyzing describes workplace accidents from OSHA (osha_accident_injury.csv). Each row is a particular person who was injured in an accident, and each column is a characteristic of the person or of the accident itself. Each characteristic is encoded as an integer that has a corresponding string value, and I want to replace each integer with its string definition. The mappings of numbers to strings are listed in osha_accident_lookup.csv. The mappings of accident codes to columns can be found in osha_accident_dictionary.csv, but I entered them manually into a dictionary.

However, some integers map to several different strings, so the meaning also depends on the accident_code in osha_accident_lookup.csv. Because of this, I build a list holding one dictionary (integer to string value) for each accident code. But when I try to replace the values in each column using its dictionary, I get back the original DataFrame instead of one with string values. Can anyone see what I am doing wrong?

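import pandas as pd

# load the injury records and the code lookup table
# (assumes comma-separated files in the working directory)
osha_accident_injury = pd.read_csv('osha_accident_injury.csv')
osha_accident_lookup = pd.read_csv('osha_accident_lookup.csv')

# create list of all distinct accident codes
code_list = []
for index in osha_accident_lookup.index:
    if osha_accident_lookup['accident_code'][index] not in code_list:
        code_list.append(osha_accident_lookup['accident_code'][index])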

# remove values not found in actual data
code_list.remove('PTYP')
code_list.remove('COST')
code_list.remove('ENDU')

# create list of dictionaries, s.t. each item maps accident number to accident value
# there is a unique map for each unique accident code
mapList = []
for code in code_list:
    temp_df = pd.DataFrame(osha_accident_lookup[osha_accident_lookup['accident_code'] == code])
    temp_map = dict(zip(temp_df['accident_number'], temp_df['accident_value']))
    mapList.append(temp_map)

# create dictionary that maps code from osha_accident_lookup to column name in osha_accident_injury.csv
code_to_column = {"OCC": "occ_code", "CAUS": "fat_cause", "DEGR": "degree_of_inj",
                  "OPER": "const_op_cause", "EN": "evn_factor", "FT": "event_type",
                  "HU": "hum_factor", "IN": "nature_of_inj", "BD": "part_of_body",
                  "SO": "src_of_injury", "TASK": "task_assigned"}

# replace numbers in injury data with string values of what the #'s represent
iterator = 0
for item in mapList:
    code = code_list[iterator]
    col_name = code_to_column[code]
    for key, value in item.items():
        osha_accident_injury[col_name].replace({key: value})
    iterator += 1
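The OSHA loop has the same underlying problem as the small example: `osha_accident_injury[col_name].replace({key: value})` computes a new Series and then discards it. One possible fix, sketched here, pairs each code with its dictionary using `zip` (which also removes the manual `iterator`) and passes a nested dictionary of the form `{column: {number: string}}` to `DataFrame.replace` once, assigning the result back. The lists and the frame below are small hypothetical stand-ins for `code_list`, `mapList`, `code_to_column` and the injury data, using only the OPER and SO codes.

```python
import pandas as pd

# Hypothetical stand-ins for the structures built in the question's code
code_list = ['OPER', 'SO']
mapList = [{1: 'Backfilling and compacting', 2: 'Bituminous concrete placement'},
           {1: 'AIRCRAFT', 2: 'AIR PRESSURE'}]
code_to_column = {'OPER': 'const_op_cause', 'SO': 'src_of_injury'}
osha_accident_injury = pd.DataFrame({'const_op_cause': [1.0, 2.0, 1.0],
                                     'src_of_injury': [2.0, 1.0, 2.0]})

# Nested dict {column: {old: new}} so each column gets its own mapping,
# even though both codes use the number 1 for different things
replacements = {code_to_column[code]: mapping
                for code, mapping in zip(code_list, mapList)}

# replace() returns a new DataFrame; assign it back
osha_accident_injury = osha_accident_injury.replace(replacements)
print(osha_accident_injury)
```

In this sketch the float codes in the injury columns (1.0, 2.0) match the integer keys. If the real accident_number values were read as strings instead, they would need converting to numbers before the keys could match.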

osha_accident_injury.csv (first 10 rows):

FIELD1 summary_nr rel_insp_nr age sex nature_of_inj part_of_body src_of_injury event_type evn_factor hum_factor occ_code degree_of_inj task_assigned hazsub const_op const_op_cause fat_cause fall_distance fall_ht injury_line_nr load_dt
0 18 10006732 0 10.0 12.0 15.0 13.0 18.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 1 2017-03-20 01:00:11 EDT
1 26 159996 0 21.0 19.0 42.0 5.0 13.0 9.0 0.0 1.0 1.0 0.0 0.0 0.0 1 2017-03-20 01:00:11 EDT
2 34 10013225 0 21.0 4.0 19.0 8.0 18.0 1.0 0.0 1.0 1.0 0270 0.0 0.0 0.0 1 2017-03-20 01:00:11 EDT
3 42 10014439 0 1.0 10.0 24.0 2.0 3.0 1.0 0.0 2.0 2.0 0.0 0.0 0.0 1 2017-03-20 01:00:11 EDT
4 59 19523588 0 5.0 4.0 16.0 10.0 9.0 1.0 0.0 2.0 1.0 0.0 0.0 0.0 1 2017-03-20 01:00:11 EDT
5 59 19523588 0 21.0 5.0 16.0 8.0 9.0 14.0 0.0 2.0 2.0 0.0 0.0 0.0 2 2017-03-20 01:00:11 EDT
6 59 19523588 0 21.0 5.0 16.0 6.0 9.0 14.0 0.0 2.0 2.0 0.0 0.0 0.0 3 2017-03-20 01:00:11 EDT
7 59 19523588 0 21.0 5.0 16.0 8.0 9.0 14.0 0.0 2.0 2.0 0.0 0.0 0.0 4 2017-03-20 01:00:11 EDT
8 59 19523588 0 21.0 5.0 16.0 8.0 9.0 14.0 0.0 2.0 2.0 0.0 0.0 0.0 5 2017-03-20 01:00:11 EDT
9 59 19523588 0 21.0 5.0 16.0 8.0 9.0 14.0 0.0 2.0 2.0 0.0 0.0 0.0 6 2017-03-20 01:00:11 EDT

osha_accident_lookup.csv (first 10 rows):

accident_code,accident_number,accident_value,accident_letter,load_date
OPER,1,Backfilling and compacting,,2018-11-09 20:56:02 EST
OPER,2,Bituminous concrete placement,,2018-11-09 20:56:02 EST
OPER,3,"Construction of playing fields, tennis courts",,2018-11-09 20:56:02 EST
SO,1,AIRCRAFT,,2018-11-09 20:56:02 EST
SO,2,AIR PRESSURE,,2018-11-09 20:56:02 EST
SO,3,ANIMAL/INS/REPT/ETC.,,2018-11-09 20:56:02 EST
OCC,757,"Separating, filtering & clarifying mach. operators",,2018-11-09 20:56:02 EST
OCC,758,Compressing and compacting machine operators,,2018-11-09 20:56:02 EST
OCC,759,Painting and paint spraying machine operators,,2018-11-09 20:56:02 EST
OCC,763,"Roasting and baking machine operators, food",,2018-11-09 20:56:02 EST
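Because the same accident_number (for example 1) means different things under OPER and SO, the lookup has to be split by accident_code before it can be used. A sketch, assuming the lookup file has been read into a DataFrame with the column names shown above: a single `groupby` builds one `{accident_number: accident_value}` dictionary per code, which can stand in for both the distinct-code loop and the per-code filtering. The name `maps_by_code` is illustrative only.

```python
import pandas as pd

# The sample rows from osha_accident_lookup.csv shown above
osha_accident_lookup = pd.DataFrame({
    'accident_code': ['OPER', 'OPER', 'OPER', 'SO', 'SO', 'SO',
                      'OCC', 'OCC', 'OCC', 'OCC'],
    'accident_number': [1, 2, 3, 1, 2, 3, 757, 758, 759, 763],
    'accident_value': ['Backfilling and compacting',
                       'Bituminous concrete placement',
                       'Construction of playing fields, tennis courts',
                       'AIRCRAFT', 'AIR PRESSURE', 'ANIMAL/INS/REPT/ETC.',
                       'Separating, filtering & clarifying mach. operators',
                       'Compressing and compacting machine operators',
                       'Painting and paint spraying machine operators',
                       'Roasting and baking machine operators, food'],
})

# One {accident_number: accident_value} dict per accident_code,
# built in a single groupby pass instead of filtering once per code
maps_by_code = {
    code: dict(zip(group['accident_number'], group['accident_value']))
    for code, group in osha_accident_lookup.groupby('accident_code')
}

print(maps_by_code['OPER'][1])  # Backfilling and compacting
print(maps_by_code['SO'][1])    # AIRCRAFT
```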

osha_data_dictionary.csv (first 10 rows):

table_name column_name attribute_name definition column_datatype display_name
osha_accident nonbuild_ht Non Building Height Construction - height in feet when not a building Numeric, Length=4 Height for Non-Building
osha_accident project_type Project Type Construction - project type (code table PTYP) Alphanumeric, Length:1 Project Type
osha_accident event_date Event Date Date of accident (yyyymmdd) Numeric, Length=8 Event Date
osha_accident event_keyword Event Keyword Contains comma separated keywords entered by ERG during the review process. Alphanumeric, Length:200 Event Keyword
osha_accident report_id Report ID Identifies the OSHA federal or state reporting jurisdiction Numeric, Length=7 Reporting ID
osha_accident event_desc Event Description Short description of event Alphanumeric, Length:60 Event Description
osha_accident load_dt Load Date Timestamp The date the load was completed. date No Label
osha_accident summary_nr Summary NR Identifies the accident OSHA-170 form Numeric, Length=9 Summary NR
osha_accident fatality Fatality X=Fatality is associated with accident Alphanumeric, Length:1 Fatality

Answer:

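Based on your example, try `map` with the whole dictionary. Note that the result has to be assigned back to the column: `replace` and `map` both return a new Series rather than modifying df1 in place.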

df1['A'] = df1['A'].map(newVals)
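One difference worth knowing when choosing between the two: `map` converts values that are missing from the dictionary to NaN, while `replace` leaves them untouched. A small illustration, with a hypothetical value 'A4' that has no entry in newVals:

```python
import pandas as pd

s = pd.Series(['A0', 'A1', 'A4'])
newVals = {'A0': 0, 'A1': 1, 'A2': 2, 'A3': 3}

# map() looks every value up in the dict; 'A4' has no key, so it becomes NaN
print(s.map(newVals).tolist())      # [0.0, 1.0, nan]

# replace() only changes values that appear as keys and leaves 'A4' as-is
print(s.replace(newVals).tolist())  # [0, 1, 'A4']
```

When the lookup is incomplete (for instance after codes such as PTYP are dropped), `replace` keeps the original code visible instead of silently turning it into NaN.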