I have a problem copying the values of a row in one dataframe into the matching row of another dataframe. The values in the columns I use to match the rows are not identical.
Here is an example:
import pandas as pd

dfzwei = pd.DataFrame([
    {"name": 'web', "b": 10, "c": 2, "d": 21},
    {"name": ' app', "b": 77, "c": 4, "d": 12},
    {"name": 'user', "b": 56, "c": 20, "d": 40},
    {"name": 'code', "b": 44, "c": 8, "d": 70},
    {"name": 'this', "b": 44, "c": 8, "d": 70},
    {"name": 'well', "b": 44, "c": 8, "d": 70}
])
# Raw strings keep the backslashes literal ('\u' and '\a' are escape sequences otherwise)
df = pd.DataFrame([
    {"file": r'bin\main\src\user.java', "b": 10, "c": 0, "d": 99},
    {"file": r'bin\main\src\web.java', "b": 12, "c": 0, "d": 80},
    {"file": r'bin\main\src\code.java', "b": 16, "c": 1, "d": 90},
    {"file": r'bin\main\src\app.cs', "b": 18, "c": 10, "d": 33}
])
I want to copy the values of dfzwei.b into df.b (and likewise for c and d). I tried it like this:
for line, row in enumerate(df.itertuples(), 1):
    for line2, row2 in enumerate(dfzwei.itertuples(), 1):
        if row2.name in row.file:
            df.at[row.Index, 'b'] = dfzwei.at[row2.Index, 'b']
            df.at[row.Index, 'c'] = dfzwei.at[row2.Index, 'c']
            df.at[row.Index, 'd'] = dfzwei.at[row2.Index, 'd']
I need to make sure that a row from dfzwei really has a counterpart in df; I can't be sure of that. The indexes of the two dataframes are not the same either.
That's why I use the check "if row2.name in row.file".
When I run this on big dataframes, the cell values get mixed up seemingly at random and only some are right. I would be very glad to have a solution for this. Thank you very much for any hints.
EDIT
My mistake was to assume that each name occurs only once in the df.file column. I was iterating over the file paths (file) in df and trying to match them with the class names (name) in dfzwei. The issue was that there were very similar class names. For example, in df:
df = pd.DataFrame([
    {"file": r'bin\main\src\userw6.java', "b": 10, "c": 0, "d": 99},
    {"file": r'bin\main\src\webapp.py', "b": 12, "c": 0, "d": 80},
    {"file": r'bin\main\src\code.cs', "b": 16, "c": 1, "d": 90},
    {"file": r'bin\main\src\app.java', "b": 18, "c": 10, "d": 33}
])
And dfzwei had, for example, these class names:
dfzwei = pd.DataFrame([
    {"name": 'web', "b": 10, "c": 2, "d": 21},
    {"name": ' app', "b": 77, "c": 4, "d": 12},
    {"name": 'user', "b": 56, "c": 20, "d": 40},
    {"name": 'w6', "b": 44, "c": 8, "d": 70},
    {"name": 'code7', "b": 44, "c": 8, "d": 70},
    {"name": 'well', "b": 44, "c": 8, "d": 70}
])
So I was matching several names against the same file path in df with
if row2.name in row.file:
My solution therefore lies in making sure that the right name is matched to the right file path. How do I extract the part of file between the last path separator and the '.', so that I can compare it with dfzwei.name?
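To illustrate the collision and one possible way to extract the bare file name, here is a minimal sketch. It assumes Windows-style paths as in the example; the `names` list, the `hits` and `exact` dictionaries, and the `stem` column are made up for illustration and are not part of the original data.

```python
from pathlib import PureWindowsPath

import pandas as pd

df = pd.DataFrame([
    {"file": r"bin\main\src\userw6.java", "b": 10},
    {"file": r"bin\main\src\webapp.py", "b": 12},
])
names = ["web", "user", "w6"]  # hypothetical subset of dfzwei.name

# Substring test: a path can contain several names, or a wrong one.
hits = {path: [n for n in names if n in path] for path in df["file"]}
print(hits)

# PureWindowsPath treats '\' as the separator on any OS;
# .stem is the last path component without its extension.
df["stem"] = df["file"].map(lambda p: PureWindowsPath(p).stem)

# Exact comparison: none of the three names is really one of these files.
exact = {stem: [n for n in names if n == stem] for stem in df["stem"]}
print(exact)
```

With the substring test, the first path picks up both 'user' and 'w6', and whichever comes last in the loop overwrites the earlier assignment. Comparing against the extracted stem avoids that.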
CodePudding user response:
You can create a dict mapping:
df['b'] = df['class'].map(df2.set_index(df2['file'].str.rsplit('/', n=1).str[1])['b'])
print(df)
# Output
   class   b
0    web  12
1    app  18
2   user  10
3   code  16
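The same mapping idea can be sketched in the direction the question asks for, copying dfzwei.b into df.b. This assumes the column names from the question (name in dfzwei, file in df), backslash-separated paths, and unique names in dfzwei; the shortened sample data and the variable names `filename`, `key` and `lookup` are only illustrative.

```python
import pandas as pd

dfzwei = pd.DataFrame([
    {"name": "web", "b": 10},
    {"name": "user", "b": 56},
    {"name": "well", "b": 44},
])
df = pd.DataFrame([
    {"file": r"bin\main\src\user.java", "b": 10},
    {"file": r"bin\main\src\web.java", "b": 12},
    {"file": r"bin\main\src\code.java", "b": 16},
])

# 'bin\main\src\user.java' -> 'user.java' -> 'user'
filename = df["file"].str.rsplit("\\", n=1).str[-1]
key = filename.str.rsplit(".", n=1).str[0]

# Series indexed by name; Series.map looks each key up in it
# (the index must be unique for this to work).
lookup = dfzwei.set_index("name")["b"]

# Rows without a match come back as NaN and keep their old value.
df["b"] = key.map(lookup).fillna(df["b"]).astype(int)
print(df)
```

Because the comparison is on the whole extracted name, a file such as code.java is left untouched when dfzwei has no row named exactly 'code'.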
CodePudding user response:
If anyone comes across this too, my final solution was:
import re

for row in df.itertuples():
    # File name after the last backslash, e.g. 'user.java'
    filename = row.file[row.file.rindex('\\') + 1:]
    # Class name without the '.java' extension; other file types are skipped
    match = re.search(r'(.*)\.java$', filename)
    if match is None:
        continue
    class_name = match.group(1)
    for row2 in dfzwei.itertuples():
        if row2.name == class_name:
            df.at[row.Index, 'b'] = dfzwei.at[row2.Index, 'b']
            df.at[row.Index, 'c'] = dfzwei.at[row2.Index, 'c']
            df.at[row.Index, 'd'] = dfzwei.at[row2.Index, 'd']
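For larger frames, the nested loops can probably be replaced by one left merge on the extracted class name. The following is only a sketch under a few assumptions: only .java files should be matched (as in the loop above), duplicate names in dfzwei are resolved by keeping the first one, and the sample data is a cleaned-up variant of the question's example. The names `key`, `cols` and `merged` are illustrative.

```python
import pandas as pd

dfzwei = pd.DataFrame([
    {"name": "web", "b": 10, "c": 2, "d": 21},
    {"name": "app", "b": 77, "c": 4, "d": 12},
    {"name": "user", "b": 56, "c": 20, "d": 40},
    {"name": "code", "b": 44, "c": 8, "d": 70},
])
df = pd.DataFrame([
    {"file": r"bin\main\src\user.java", "b": 10, "c": 0, "d": 99},
    {"file": r"bin\main\src\web.java", "b": 12, "c": 0, "d": 80},
    {"file": r"bin\main\src\code.java", "b": 16, "c": 1, "d": 90},
    {"file": r"bin\main\src\app.cs", "b": 18, "c": 10, "d": 33},
])
cols = ["b", "c", "d"]

# Class name = text between the last backslash and a trailing '.java';
# paths with another extension yield NaN and therefore never match.
key = df["file"].str.extract(r"([^\\]+)\.java$", expand=False)

# A left merge keeps every row of df in its original order.
merged = df.assign(name=key).merge(
    dfzwei.drop_duplicates("name"), on="name", how="left", suffixes=("", "_new")
)

for col in cols:
    # Take the value from dfzwei where a match exists, else keep the old one.
    # to_numpy() sidesteps index alignment if df does not have a default index.
    df[col] = merged[col + "_new"].fillna(merged[col]).astype(int).to_numpy()

print(df)
```

Here app.cs keeps its original values because it is not a .java file, which mirrors what the loop version does.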