I currently have a data frame with a column titled "Job Code". Job Code can contain one of six hundred different Int values.
Each Int corresponds to a string value that I currently have listed in a notepad, like this:
119 student
120 construction worker
121 baker
122 teacher
I was wondering, is there an elegant way to replace the numbers in the Job Code column with the corresponding strings from my notepad file?
I am currently using the below method, but I am sure there has to be a more optimal solution than typing all 600 codes in jobStrings by hand.
jobStrings = {
119: "student",
120: "construction worker",
121: "baker",
122: "teacher",
}
df["jobCode"].replace(jobStrings, inplace=True)
CodePudding user response:
Read your text file as CSV and use it to map/replace:
maps = pd.read_csv('textfile.txt', sep='(?<=\d)\s', names=['code', 'name']).set_index('code')['name']
df["jobCode"] = df["jobCode"].map(maps)
NB. This uses space as separator if preceded by a digit. Please let me know if you have digits in the string to find an alternative
Content of maps:
code
119 student
120 construction worker
121 baker
122 teacher
Name: name, dtype: object
CodePudding user response:
Here's a quick (untested) solution:
with open("your_file.txt") as f:
text = f.read().strip()
lines = re.split(r'[\r\n] ', text)
lines = dict([re.split(r'\s ', l) for l in lines])
new_dict = {}
for key in lines.keys():
new_dict[int(key)] = lines[key]
df["jobCode"].replace(new_dict, inplace=True)