I have data that looks like this:
ID File
1 this_file_whatever.ext1
2 this_whatever.ext2
3 this_is_ok_pooh.ext3
I am trying to get the extension from each value in File and, based on that extension, put the matching key from a dict in a new column.
def create_filegroups(row):
    filegroup_dict = {
        'GroupA': 'ext1',
        'GroupB': 'ext2',
        'GroupC': 'ext3'
    }
    if '.' in row['Name']:
        test = row['Name'].split(".",1)[1]
        return test

DF = build_df()
DF['COL3'] = DF.apply(create_filegroups(row), axis=1)
print(DF)
I can't figure out what I am doing wrong. The dict compare I can do when I get there, but I can't seem to apply a function to the cells.
CodePudding user response:
I believe you need pandas.Series.map after extracting the file extension from the File column.
Try this (with filegroup_dict defined outside the function so it is in scope):
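df['COL3'] = (
    df['File']
    # capture the text after the dot as the extension
    .str.extract(r'\w+\.(\w+)', expand=False)
    # invert the dict so each extension maps to its group name
    .map({ext: group for group, ext in filegroup_dict.items()})
)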
# Output :
print(df)
   ID                     File    COL3
0   1  this_file_whatever.ext1  GroupA
1   2       this_whatever.ext2  GroupB
2   3     this_is_ok_pooh.ext3  GroupC
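As for why the original attempt did not work: `DF.apply(create_filegroups(row), axis=1)` calls the function straight away (with a `row` that is not defined at that point) instead of handing the function itself to `apply`, and the function reads `row['Name']` while the column shown is `File`. If you would rather keep the `apply` approach, here is a minimal sketch. It assumes the sample frame from the question (built inline here, since `build_df()` is not shown) and uses a helper dict name, `ext_to_group`, that I made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question; build_df() is not shown there,
# so the frame is constructed inline (an assumption).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "File": [
        "this_file_whatever.ext1",
        "this_whatever.ext2",
        "this_is_ok_pooh.ext3",
    ],
})

filegroup_dict = {"GroupA": "ext1", "GroupB": "ext2", "GroupC": "ext3"}

# Hypothetical helper: invert the dict once so lookups go
# from extension to group name.
ext_to_group = {ext: group for group, ext in filegroup_dict.items()}


def create_filegroups(row):
    """Return the group for the row's file extension, or None if nothing matches."""
    name = row["File"]
    if "." not in name:
        return None
    ext = name.rsplit(".", 1)[1]  # text after the last dot
    return ext_to_group.get(ext)


# Pass the function itself (no parentheses, no argument);
# with axis=1, apply calls it once per row.
df["COL3"] = df.apply(create_filegroups, axis=1)
print(df)
```

On larger frames the `.str.extract(...).map(...)` version is likely to be faster, because a row-wise `apply` runs a Python function call for every row.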