Editing this to reflect additional work:
Situation
I have 2 pandas dataframes of Twitter search tweets API data that share a common data key, author_id.
I'm using the join method.
Code is:
dfTW08 = dfTW07.join(dfTW04uf, on='author_id', how='left', lsuffix='', rsuffix='4')
Results
When I run that, everything comes out as expected, except that all the values from the other dataframe (dfTW04uf) come in as NaN, including the values for the other dataframe's author_id column.
Assessment
I'm not getting any error messages, but have to think it's something about the datatypes. The other dataframe is a mix of int64, object, bool, and datetime datatypes, so it seems odd they'd all be unrecognized.
Any suggestions on how to troubleshoot this greatly appreciated.
CodePudding user response:
Couldn't figure out the NaN issue using join, but was able to merge the dataframes with this:
callingdf.merge(otherdf, on='author_id', how='left', indicator=True)
Then did sort_values and drop_duplicates to get the final list I wanted.
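For anyone following along, here is a minimal sketch of that merge, sort, and de-duplicate sequence. The sample data, the created_at and username columns, and the choice to keep the most recent row per author are all assumptions for illustration; the original post does not say which columns were sorted on.

```python
import pandas as pd

# Hypothetical data; "created_at" and "username" are assumed column names.
callingdf = pd.DataFrame({
    "author_id": [101, 101, 102],
    "created_at": pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-02"]),
})
otherdf = pd.DataFrame({"author_id": [101], "username": ["ann"]})

# indicator=True adds a "_merge" column showing where each row's key was found
# ("both" or "left_only" for a left merge).
merged = callingdf.merge(otherdf, on="author_id", how="left", indicator=True)

# Keep one row per author: here, the most recent one (an assumed sort key).
final = (
    merged.sort_values("created_at", ascending=False)
    .drop_duplicates(subset="author_id", keep="first")
    .reset_index(drop=True)
)
print(final)
```

The _merge column is handy for spotting rows that found no match in the other dataframe before deciding what to drop.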
CodePudding user response:
You can use merge instead of join, since merge has everything join does but with more "power" (anything you can do with join you can do with merge).
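I am assuming the NaN comes from how join matches rows. With on='author_id', join compares the calling dataframe's author_id column against the index of the other dataframe, not against its author_id column. Unless dfTW04uf is indexed by author_id, nothing matches, so the left join keeps every row of dfTW07 and fills all of the other dataframe's columns (including author_id4) with NaN. When you left join with merge and on='author_id', the author_id columns of both dataframes are compared directly, so the matches are found and no suffix is needed on the key.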
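A likely cause of the all-NaN result is that join with on= looks up the calling dataframe's column in the other dataframe's index rather than in its column of the same name. The sketch below reproduces that behaviour on small made-up dataframes (tweets and users are hypothetical stand-ins for dfTW07 and dfTW04uf, and the text and username columns are assumed).

```python
import pandas as pd

# Hypothetical stand-ins for dfTW07 and dfTW04uf; real column names may differ.
tweets = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
users = pd.DataFrame({"author_id": [101, 103], "username": ["ann", "cy"]})

# join(on=...) looks up tweets["author_id"] in users.index (0 and 1 here),
# so nothing matches and every column taken from users is NaN.
broken = tweets.join(users, on="author_id", how="left", lsuffix="", rsuffix="4")

# Indexing the other dataframe by the key lets join match on author_id.
fixed = tweets.join(users.set_index("author_id"), on="author_id", how="left")

# merge compares the author_id columns of both dataframes directly.
merged = tweets.merge(users, on="author_id", how="left")

print(broken)
print(fixed)
print(merged)
```

If the real dataframes behave like this sketch, either calling set_index('author_id') on the other dataframe before join or switching to merge should bring the values through.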