`join` method importing `other` dataframe values as `NaN`

Editing this to reflect additional work:

Situation

I have two pandas dataframes of Twitter Search Tweets API data that share a common key, author_id.

I'm using the join method.

Code is:

dfTW08 = dfTW07.join(dfTW04uf, on='author_id', how='left', lsuffix='', rsuffix='4')

Results

When I run that, everything comes out as expected, except that all the values from the other dataframe (dfTW04uf) come in as NaN, including the values in the other dataframe's author_id column.
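A minimal reproduction on small made-up frames points to the likely cause. Only author_id and the join call come from the question; text and followers are hypothetical columns. With on='author_id', DataFrame.join matches the caller's author_id column against the index of the other frame, not against the other frame's author_id column. If the other frame still has the default 0, 1, 2, ... index, no Twitter ID will match, so every joined column comes back NaN and no error is raised.

```python
import pandas as pd

# Hypothetical stand-ins for dfTW07 (tweets) and dfTW04uf (user fields).
# Only author_id is from the question; the other columns are made up.
dfTW07 = pd.DataFrame({
    "author_id": [101, 102, 101],
    "text": ["a", "b", "c"],
})
dfTW04uf = pd.DataFrame({
    "author_id": [101, 102],
    "followers": [10, 20],
})

# join(on='author_id') looks up dfTW07['author_id'] in dfTW04uf.index,
# which is the default RangeIndex (0, 1), not in dfTW04uf['author_id'].
dfTW08 = dfTW07.join(dfTW04uf, on="author_id", how="left", lsuffix="", rsuffix="4")

# The right-hand author_id becomes author_id4, and it is NaN like every other joined column.
print(dfTW08)
```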

Assessment

I'm not getting any error messages, but I have to think it's something about the datatypes. The other dataframe is a mix of int64, object, bool, and datetime datatypes, so it seems odd that they'd all be unrecognized.
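Because join silently matches against the other frame's index, one way to troubleshoot is to check how many keys are found in that index versus in the column you meant, and then move author_id into the index before joining. This is a sketch on hypothetical frames (left, right, text and followers are illustrative names). It assumes author_id has the same dtype on both sides; if the dtypes differ (for example int64 on one side and string IDs stored as object on the other), convert one side with astype first.

```python
import pandas as pd

# Hypothetical example frames; only author_id comes from the question.
left = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
right = pd.DataFrame({"author_id": [101, 102], "followers": [10, 20]})

# 1. Compare the key's dtype on both sides.
print(left["author_id"].dtype, right["author_id"].dtype)

# 2. join(on=...) matches against right.index, so count matches there...
print(left["author_id"].isin(right.index).sum())          # 0
# ...versus matches against the column you probably intended.
print(left["author_id"].isin(right["author_id"]).sum())   # 2

# 3. Fix: make author_id the index of the right frame, then join.
fixed = left.join(right.set_index("author_id"), on="author_id", how="left")
print(fixed)  # author 103 has no match, so its followers value is NaN
```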

Any suggestions on how to troubleshoot this would be greatly appreciated.

Answer:

I couldn't figure out the NaN issue using join, but I was able to merge the dataframes with this:

callingdf.merge(otherdf, on='author_id', how='left', indicator=True)
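For reference, here is a sketch of what indicator=True adds, on hypothetical frames (calling, other, text and followers are made-up names). It appends a categorical _merge column that says whether each row found a match ("both") or exists only in the calling frame ("left_only").

```python
import pandas as pd

# Hypothetical frames standing in for callingdf and otherdf.
calling = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
other = pd.DataFrame({"author_id": [101, 102], "followers": [10, 20]})

# merge matches the author_id column on both sides; indicator=True adds _merge.
merged = calling.merge(other, on="author_id", how="left", indicator=True)
print(merged)
print(merged["_merge"].value_counts())
```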

Then I used sort_values and drop_duplicates to get the final list I wanted.
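The post doesn't say what produced the duplicates. One common cause is that the right-hand frame has more than one row per author_id, so a left merge repeats each matching tweet. The sketch below assumes that situation, uses hypothetical tweet_id and fetched_at columns, and keeps only the most recent user snapshot per tweet. It is one possible reading of "the final list", not necessarily what the poster did.

```python
import pandas as pd

# Hypothetical data: the user frame has two snapshots for author 101.
tweets = pd.DataFrame({"tweet_id": [1, 2, 3], "author_id": [101, 102, 101]})
users = pd.DataFrame({
    "author_id": [101, 101, 102],
    "followers": [10, 12, 20],
    "fetched_at": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-01-15"]),
})

merged = tweets.merge(users, on="author_id", how="left", indicator=True)
print(len(merged))  # 5 rows for 3 tweets, because author 101 matches twice

# Sort newest snapshot first, keep the first row per tweet_id, then restore order.
final = (
    merged.sort_values("fetched_at", ascending=False)
          .drop_duplicates(subset="tweet_id", keep="first")
          .sort_values("tweet_id")
          .reset_index(drop=True)
)
print(final)
```

Deduplicating the user frame before merging (for example with drop_duplicates("author_id", keep="last") after sorting by fetched_at) avoids the extra rows in the first place. Passing validate="many_to_one" to merge makes pandas raise an error if author_id is not unique on the right.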

Answer:

You can use merge instead of join, since merge has everything join does but with more "power" (anything you can do with join, you can do with merge).
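To illustrate the equivalence on hypothetical frames (left, right, text and followers are made-up names): merging on the author_id column gives the same result as join, but only once the right frame's author_id has been moved into its index with set_index.

```python
import pandas as pd

left = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
right = pd.DataFrame({"author_id": [101, 102], "followers": [10, 20]})

# merge matches the author_id column on both sides.
via_merge = left.merge(right, on="author_id", how="left")

# The equivalent join needs author_id in right's index first.
via_join = left.join(right.set_index("author_id"), on="author_id", how="left")

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(via_merge, via_join)
print(via_merge)
```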

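I am assuming the NaN values come from how join matches keys: with on='author_id', join looks up your author_id column in the index of the other dataframe, not in its author_id column, so none of the IDs match and every joined value is NaN. merge with on='author_id' matches the author_id column on both sides instead. Note that a left merge does not discard non-matches: it keeps every row of the calling dataframe, fills the unmatched rows with NaN, and adds the default _x and _y suffixes to any other column names the two dataframes share.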
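A small sketch of how a left merge treats non-matches and shared column names, using hypothetical frames in which both sides have a created_at column (an assumed name chosen for illustration). Only an inner merge drops the unmatched row.

```python
import pandas as pd

# Hypothetical frames that share a non-key column, created_at.
tweets = pd.DataFrame({
    "author_id": [101, 102, 999],  # 999 has no match in users
    "created_at": ["2022-03-01", "2022-03-02", "2022-03-03"],
})
users = pd.DataFrame({
    "author_id": [101, 102],
    "created_at": ["2010-01-01", "2015-06-30"],
})

# Left merge: all 3 tweet rows are kept; author 999 gets NaN for the user columns,
# and the shared column is split into created_at_x (tweets) and created_at_y (users).
merged = tweets.merge(users, on="author_id", how="left")
print(merged)

# Inner merge: the unmatched row is dropped.
inner = tweets.merge(users, on="author_id", how="inner")
print(len(inner))  # 2
```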