I want to compare two lists, row by row. If 2 rows are equal, add just one of the 2 rows to a new dataframe. But if not, add both rows to the new dataframe.
These are my two lists:
original = data_Unos['text']
13 Speaking to Africa Review he also pointed ou...
17 Through Gawad Kalinga Meloto has proven to b...
21 May you attain Nibbana Sena thank you so muc...
22 Dodgeballs were flying fast and hard at Mornin...
26 Most are from desperately poor Horn of Africa ...
...
3155 The statement signed by Ikonomwan Francis le...
3159 Most of them the homeless have the abili...
3162 In Metro Manila 7 464 families of disabled...
3163 We are working with an aim to build a countr...
3172 Our hearts go out to the hundreds of thousands...
Name: text, Length: 794, dtype: object
And:
backTranslated = backTranslated['text']
backTranslated
0 Talking to Africa Review also noted that most ...
1 Through Gawad Kalinga Meloto has proven to be ...
2 May you reach Nibbana Sena thank you so much f...
3 Dodgeballs were flying fast and hard at Mornin...
4 Most of them are from poor countries in the Ho...
...
789 The declaration signed by Ikonomwan Francis le...
790 Most of them homeless have the ability to work...
791 In Metro Manila 7 464 families of disabled cyc...
792 We are working with the objective of building ...
793 Our hearts are directed to the hundreds of tho...
Name: text, Length: 794, dtype: object
And this is what I'm trying to do:
final = pd.DataFrame()
for i in original:
    for j in backTranslated:
        if(set(i)!=set(j)):
            final = final.append(i,ignore_index=True)
            final = final.append(j,ignore_index=True)
        else:
            final = final.append(i,ignore_index=True)
But the following error appears in this line:
final = final.append(j,ignore_index=True)
TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid
How can I do that?
CodePudding user response:
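The easiest way is to concatenate both of them and remove the duplicates:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
final = pd.concat([data_Unos, backTranslated], ignore_index=True)
final.drop_duplicates(subset=['text'], inplace=True)
Iterating in pandas should be a last resort.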
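Note that drop_duplicates removes a text that repeats anywhere in the combined data, not only at the same row position. If the comparison has to stay strictly row by row, a vectorized sketch could look like the one below. The helper name merge_rows and the sample sentences are made up for illustration, and it assumes both Series have the same length and are meant to be paired by position rather than by index label.

```python
import pandas as pd


def merge_rows(original: pd.Series, back_translated: pd.Series) -> pd.DataFrame:
    """Pair rows by position; keep one copy when equal, both when different."""
    # Drop the original labels (13, 17, ...) so that rows align by position.
    a = original.reset_index(drop=True)
    b = back_translated.reset_index(drop=True)

    # Boolean mask, True where the two texts differ.
    differs = a != b

    # Every original row, plus the back-translated rows that differ.
    both = pd.concat([a, b[differs]])

    # The index still holds the row position; a stable sort puts each
    # back-translated row directly after its original.
    both = both.sort_index(kind="mergesort")
    return both.reset_index(drop=True).to_frame("text")


# Hypothetical sample data standing in for the two 'text' columns.
original = pd.Series(["a b c", "same text", "x y"], index=[13, 17, 21])
back_translated = pd.Series(["a b d", "same text", "x z"])

final = merge_rows(original, back_translated)
print(final)
```

This compares whole strings with !=, which is stricter than the set(i) != set(j) test in the question (that one only compares the sets of characters in each string).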
CodePudding user response:
The pandas.DataFrame.append method has been deprecated since version 1.4.0. The alternative is to use the pandas.concat method.
This is how the pandas.concat method is defined:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
The objs parameter needs to be a sequence of Series or DataFrame objects, not plain strings. So the correct way to do it in your code is:
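final = pd.DataFrame()
# zip pairs the rows by position, so each row is compared only with its counterpart
for i, j in zip(original, backTranslated):
    # wrap the string in a Series, because pd.concat does not accept plain str
    series_i = pd.Series(i)
    if set(i) != set(j):
        series_j = pd.Series(j)
        final = pd.concat((final, series_i, series_j), ignore_index=True)
    else:
        final = pd.concat((final, series_i), ignore_index=True)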
Furthermore, when you concatenate Series along axis=1, you can define the column names via the keys parameter.
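As a small illustration of the keys parameter, the sketch below places the two Series side by side. The sample sentences and the column names are assumptions chosen for the example, not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data for illustration.
original = pd.Series(["a b c", "same text"])
back_translated = pd.Series(["a b d", "same text"])

# With axis=1 each Series becomes a column, and keys supplies the column names.
side_by_side = pd.concat(
    [original, back_translated],
    axis=1,
    keys=["original", "backTranslated"],
)
print(side_by_side.columns.tolist())  # ['original', 'backTranslated']
```

With the default axis=0, keys does not name a column; it adds an outer index level that labels which input each row came from.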