`join` method importing `other` dataframe values as `NaN`

Time:04-19

Editing this to reflect additional work:

Situation

I have two pandas dataframes built from Twitter search tweets API data. They share a common key, author_id.

I'm using the join method.

Code is:

dfTW08 = dfTW07.join(dfTW04uf, on='author_id', how='left', lsuffix='', rsuffix='4')

Results

When I run that, everything comes out as expected, except that all the values from the other dataframe (dfTW04uf) come in as NaN, including the values for the other dataframe's author_id column.

Assessment

I'm not getting any error messages, but I have to think it's something about the datatypes. The other dataframe is a mix of int64, object, bool, and datetime datatypes, so it seems odd they'd all be unrecognized.

Any suggestions on how to troubleshoot this greatly appreciated.

CodePudding user response:

Couldn't figure out the NaN issue using join, but was able to merge the dataframes with this:

callingdf.merge(otherdf, on='author_id', how='left', indicator=True)

Then did sort_values and drop_duplicates to get the final list I wanted.
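As a rough sketch of that merge, sort, and de-duplicate sequence: the frames below are tiny hypothetical stand-ins, and the followers and fetched_at columns are assumptions made up for illustration, not columns from the original data. The choice to keep the most recent row per author is also only one possible rule.

```python
import pandas as pd

# Hypothetical stand-ins for the two tweet dataframes
callingdf = pd.DataFrame({
    "author_id": [1, 2, 3],
    "text": ["a", "b", "c"],
})
otherdf = pd.DataFrame({
    "author_id": [1, 1, 2],  # author 1 appears twice
    "followers": [10, 12, 20],
    "fetched_at": pd.to_datetime(["2022-04-01", "2022-04-10", "2022-04-05"]),
})

# Column-to-column left merge; indicator=True adds a _merge column
# saying whether each row matched ("both") or not ("left_only")
merged = callingdf.merge(otherdf, on="author_id", how="left", indicator=True)

# Keep one row per author: here, the one with the latest fetched_at
final = (
    merged.sort_values("fetched_at")
          .drop_duplicates(subset="author_id", keep="last")
          .sort_values("author_id")
          .reset_index(drop=True)
)
print(final)
```

The _merge column is handy for spotting authors that had no match in the other dataframe before deciding how to de-duplicate.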

CodePudding user response:

You can use merge instead of join, since merge does everything join does but with more "power". (Anything you can do with join you can do with merge.)

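I am assuming the NaN values come from how join matches keys. With on='author_id', join compares the calling dataframe's author_id column against the index of the other dataframe, not against its author_id column. If dfTW04uf still has its default integer index, almost no author_id will find a match, so every column from the other dataframe (including the suffixed author_id4) is filled with NaN. merge with on='author_id' matches column to column, which is why the left merge returns the values you expect.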
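To illustrate the difference between the two calls, here is a minimal sketch with made-up data (the frame names and the followers column are hypothetical). It assumes the other dataframe has the default 0, 1, 2, ... index, which may or may not be true for dfTW04uf.

```python
import pandas as pd

left = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
right = pd.DataFrame({"author_id": [101, 102, 103], "followers": [10, 20, 30]})

# join: on= names a column in the caller, which is matched against the
# INDEX of the other frame. right's index is 0, 1, 2, so nothing matches.
bad = left.join(right, on="author_id", how="left", lsuffix="", rsuffix="4")
print(bad)  # author_id4 and followers are all NaN

# Option 1: make author_id the index of the other frame before joining
fixed = left.join(right.set_index("author_id"), on="author_id", how="left")

# Option 2: merge matches the author_id columns of both frames directly
merged = left.merge(right, on="author_id", how="left")
print(merged)
```

If the real data behaves like bad above, checking dfTW04uf.index is a quicker diagnostic than checking the column datatypes.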
