I have two pandas DataFrames (`inscriptions` and `placenames`). In both, one row is one entry, and text and metadata are stored in several different columns. I'm trying to find all inscriptions that contain a placename. To that end, I loop over the placenames (`placenames['name']`) and, for each one, loop over all inscriptions (`inscriptions['text']`), looking for a match.
If a match is found, I would like to copy the entire inscription (the entire row) to a new DataFrame called `matches`, which has the same columns as `inscriptions`, and then add some values from the `placenames` DataFrame to it (for example the matching name or the GPS coordinates).
With the `append` method this was easy:
```python
# DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0
for idx, elem in enumerate(placenames['name']):
    for index, element in enumerate(inscriptions['text']):
        if elem in element:
            matches = matches.append(inscriptions.iloc[index], ignore_index=True)
            row = matches.index[-1]  # label of the row that was just appended
            matches.at[row, 'place'] = placenames['name'].iloc[idx]
            matches.at[row, 'latitude'] = placenames['latitude'].iloc[idx]
```
However, since `DataFrame.append` is deprecated, I would like to avoid it, but I cannot get the same result with `pd.concat()` or anything else. Whenever I try, the row is not added below the existing rows but as an additional column, which leaves a DataFrame full of empty cells that is useless to me.
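To see the symptom in isolation, here is a small reproduction. The frames are hypothetical stand-ins (the `site` column is invented for illustration). As far as I understand pandas' behaviour, when `pd.concat()` receives a mix of DataFrames and a Series, it turns the Series into a one-column DataFrame named after the Series (here `0`), so the row's values end up stacked down a new column instead of forming a new row:

```python
import pandas as pd

# hypothetical stand-ins for the question's frames
inscriptions = pd.DataFrame({'text': ['hello abc', 'bye xyz'], 'site': ['A', 'B']})
matches = pd.DataFrame({'text': ['text foo'], 'place': ['foo'], 'latitude': [11]})

row = inscriptions.iloc[0]          # a Series indexed by the column names
result = pd.concat([matches, row])  # the Series is treated as a column named 0

# result gains an extra column 0 and extra rows labelled 'text' and 'site',
# with NaN in every other cell
print(result)
```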
Does anyone have any idea how I could translate the code above into a non-deprecated variant? Any help is much appreciated, thank you.
Answer:
A single row such as `inscriptions.iloc[index]` is a `pandas.Series`, and `pd.concat()` adds a Series as a column. If you first convert the row to a `pandas.DataFrame` and transpose it, `pd.concat()` adds it as a row.
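A minimal sketch of that conversion, using toy frames assumed for illustration. `Series.to_frame()` gives a one-column DataFrame, and `.T` flips it into a one-row DataFrame whose columns are the original column names, so `pd.concat()` lines it up under the existing columns. The extra fields are set on a copy of the row first, mirroring what the question does after `append`:

```python
import pandas as pd

# toy data standing in for the question's frames
inscriptions = pd.DataFrame({'text': ['hello abc', 'bye xyz']})
matches = pd.DataFrame({'text': ['text foo'], 'place': ['foo'], 'latitude': [11]})

row = inscriptions.iloc[0].copy()  # copy so inscriptions itself is untouched
row['place'] = 'abc'               # add the placename fields to the row
row['latitude'] = 1

one_row = row.to_frame().T  # Series -> 1-column DataFrame -> 1-row DataFrame
matches = pd.concat([matches, one_row], ignore_index=True)
print(matches)
```

`ignore_index=True` renumbers the result 0..n-1, matching the `append(..., ignore_index=True)` call in the question.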
It could look something like this:

```python
matches = pd.concat([matches, pd.DataFrame(inscriptions.iloc[index]).T], ignore_index=True)
```
EDIT:
Minimal working example (with some other changes, e.g. iterating with `iterrows()`):
```python
import pandas as pd

placenames = pd.DataFrame({
    'name': ['abc', 'xyz'],
    'latitude': [1, 2],
})

inscriptions = pd.DataFrame({
    'text': ['hello abc', 'bye xyz'],
})

# existing matches (example data)
matches = pd.DataFrame({
    'text': ['text foo', 'text bar'],
    'place': ['foo', 'bar'],
    'latitude': [11, 12],
})

for idx_placenames, row_placenames in placenames.iterrows():
    for idx_inscriptions, row_inscriptions in inscriptions.iterrows():
        if row_placenames['name'] in row_inscriptions['text']:
            # copy the row instead of modifying the one yielded by iterrows()
            row = row_inscriptions.copy()
            row['place'] = row_placenames['name']
            row['latitude'] = row_placenames['latitude']
            #matches = matches.append(row, ignore_index=True)
            # Series -> DataFrame -> transpose, so concat() adds it as a row
            matches = pd.concat([matches, pd.DataFrame(row).T], ignore_index=True)

print(matches)
```
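Concatenating inside the loop copies the whole `matches` frame on every hit. The pandas `concat` documentation advises against building a DataFrame by adding single rows in a loop and suggests collecting the rows first and building the frame once. A sketch of that variant (not what the answer above does; the toy data is assumed), collecting plain dicts:

```python
import pandas as pd

placenames = pd.DataFrame({'name': ['abc', 'xyz'], 'latitude': [1, 2]})
inscriptions = pd.DataFrame({'text': ['hello abc', 'bye xyz', 'no match']})

rows = []  # one dict per match; the DataFrame is built only once at the end
for _, place in placenames.iterrows():
    for _, inscription in inscriptions.iterrows():
        if place['name'] in inscription['text']:
            record = inscription.to_dict()  # every column of the inscription
            record['place'] = place['name']
            record['latitude'] = place['latitude']
            rows.append(record)

# explicit columns keep the layout even when there are no matches
matches = pd.DataFrame(rows, columns=list(inscriptions.columns) + ['place', 'latitude'])
print(matches)
```

If there is an existing `matches` frame to extend, a single `pd.concat([old_matches, matches], ignore_index=True)` after the loop does that in one step.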