Elegant Way to Swap Doubles With Strings In Pandas Dataframe?-CodePudding

I currently have a data frame with a column titled "Job Code". Job Code can contain one of six hundred different Int values.

Each Int corresponds to a string value that I currently have listed in a notepad, like this:

119 student
120 construction worker
121 baker
122 teacher

I was wondering, is there an elegant way to replace the numbers in the Job Code column with the corresponding strings from my notepad file?

I am currently using the below method, but I am sure there has to be a more optimal solution than typing all 600 codes in jobStrings by hand.

jobStrings = {
    119: "student",
    120: "construction worker",
    121: "baker",
    122: "teacher",
}

df["jobCode"].replace(jobStrings, inplace=True)

CodePudding user response：

Read your text file as CSV and use it to map/replace:

maps = pd.read_csv('textfile.txt', sep='(?<=\d)\s', names=['code', 'name']).set_index('code')['name']

df["jobCode"] = df["jobCode"].map(maps)

NB. This uses space as separator if preceded by a digit. Please let me know if you have digits in the string to find an alternative

Content of maps:

code
119                student
120    construction worker
121                  baker
122                teacher
Name: name, dtype: object

CodePudding user response：

Here's a quick (untested) solution:

with open("your_file.txt") as f:
    text = f.read().strip()
    lines = re.split(r'[\r\n] ', text)
    lines = dict([re.split(r'\s ', l) for l in lines])
    new_dict = {}
    for key in lines.keys():
        new_dict[int(key)] = lines[key]

df["jobCode"].replace(new_dict, inplace=True)