Please examine the commented text in the code below in order to understand the problem.
import pandas as pd
import numpy as np
primary = pd.DataFrame(
    data=['little', 'mary', 'had', 'a', 'swan'],
    index=pd.DatetimeIndex(['2015-09-25 12:00:00',
                            '2015-09-25 13:00:00',
                            '2015-09-25 14:00:00',
                            '2015-09-25 15:00:00',
                            '2015-09-25 16:00:00']),
    columns=['some_nonsense'])
secondary = pd.DataFrame(
    data=['mommy', np.nan],
    index=pd.DatetimeIndex(['2015-09-25 14:00:00',
                            '2015-09-25 15:00:00']),
    columns=['copy_me'])
# 1. The secondary DataFrame's values have already been computed.
# 2. We want to assign them to the primary DataFrame for the dates they cover
#    (the assignment aligns on the index).
# 3. Once done, we want the integer positions in primary of the dates
#    that secondary does not cover.
# 4. NaN is one of the valid values the secondary DataFrame can take.
primary['copy_me'] = secondary['copy_me']
print(secondary)
print(primary)
# The values have been copied successfully,
# but how do I get the positions of the missing dates?
# The expected result is shown below;
# once I know these positions, I can pass them to my computing function.
missing_indices = np.array([0, 1, 4])
print('needed result: ', missing_indices)
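Point 4 is what rules out the obvious shortcut of looking for NaN in the copied column. A minimal sketch, rebuilding the frames above, of that naive check: it also flags row 3, whose NaN was copied from secondary rather than caused by a missing date (the variable name `naive` is illustrative only).

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames.
primary = pd.DataFrame(
    data=['little', 'mary', 'had', 'a', 'swan'],
    index=pd.DatetimeIndex(['2015-09-25 12:00:00', '2015-09-25 13:00:00',
                            '2015-09-25 14:00:00', '2015-09-25 15:00:00',
                            '2015-09-25 16:00:00']),
    columns=['some_nonsense'])
secondary = pd.DataFrame(
    data=['mommy', np.nan],
    index=pd.DatetimeIndex(['2015-09-25 14:00:00', '2015-09-25 15:00:00']),
    columns=['copy_me'])

# Index-aligned assignment, as in the question.
primary['copy_me'] = secondary['copy_me']

# Naive check: treat every NaN in the copied column as "missing".
naive = np.flatnonzero(primary['copy_me'].isna().to_numpy())
print(naive.tolist())  # [0, 1, 3, 4] rather than the needed [0, 1, 4]
```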
Answer:
If I understand correctly, this might help:
(~primary.index.isin(secondary.index)).nonzero()[0]
Breakdown:
- Find which primary indices are present in secondary (primary.index.isin(secondary.index)).
- Negate that condition (~).
- Find the positions where the value is non-zero, meaning True, using numpy.nonzero (.nonzero()[0]; the [0] is needed because nonzero returns a tuple with one array per dimension).
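The three steps can be traced on the question's data. A minimal sketch that rebuilds the two frames and keeps each intermediate result in its own variable (the names `present` and `missing_mask` are illustrative only):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames (only the indexes matter for this lookup).
primary = pd.DataFrame(
    data=['little', 'mary', 'had', 'a', 'swan'],
    index=pd.DatetimeIndex(['2015-09-25 12:00:00', '2015-09-25 13:00:00',
                            '2015-09-25 14:00:00', '2015-09-25 15:00:00',
                            '2015-09-25 16:00:00']),
    columns=['some_nonsense'])
secondary = pd.DataFrame(
    data=['mommy', np.nan],
    index=pd.DatetimeIndex(['2015-09-25 14:00:00', '2015-09-25 15:00:00']),
    columns=['copy_me'])

# Step 1: boolean NumPy array, True where a primary timestamp also exists in secondary.
present = primary.index.isin(secondary.index)
# Step 2: negate it, so True now marks the timestamps secondary lacks.
missing_mask = ~present
# Step 3: nonzero() returns a tuple with one array per dimension; take the first.
missing_indices = missing_mask.nonzero()[0]

print(present.tolist())          # [False, False, True, True, False]
print(missing_mask.tolist())     # [True, True, False, False, True]
print(missing_indices.tolist())  # [0, 1, 4]
```

Because the mask is built from the index alone, the NaN stored at 15:00 in secondary never enters the calculation.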
Answer:
You can just check whether each entry of primary.index is in secondary.index:
np.flatnonzero(~primary.index.isin(secondary.index))
# array([0, 1, 4]); the integer dtype shown (e.g. int32 or int64) depends on the platform
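The question's end goal is to feed these positions to a computing function. A minimal sketch of that last step, where `compute_copy_me` is a hypothetical placeholder for that function: the positions are turned back into timestamps with `primary.index[...]`, and results are written only into those rows, so the NaN that came from secondary is left untouched.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames and the index-aligned copy.
primary = pd.DataFrame(
    data=['little', 'mary', 'had', 'a', 'swan'],
    index=pd.DatetimeIndex(['2015-09-25 12:00:00', '2015-09-25 13:00:00',
                            '2015-09-25 14:00:00', '2015-09-25 15:00:00',
                            '2015-09-25 16:00:00']),
    columns=['some_nonsense'])
secondary = pd.DataFrame(
    data=['mommy', np.nan],
    index=pd.DatetimeIndex(['2015-09-25 14:00:00', '2015-09-25 15:00:00']),
    columns=['copy_me'])
primary['copy_me'] = secondary['copy_me']


def compute_copy_me(timestamps):
    # Hypothetical stand-in for the question's computing function.
    return ['computed@' + ts.strftime('%H:%M') for ts in timestamps]


missing = np.flatnonzero(~primary.index.isin(secondary.index))
missing_ts = primary.index[missing]  # integer positions -> timestamps
primary.loc[missing_ts, 'copy_me'] = compute_copy_me(missing_ts)
print(primary)
```

Selecting by timestamp with `.loc` avoids looking up the column's integer position; `primary.iloc[missing, primary.columns.get_loc('copy_me')]` would target the same cells.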