Problem: Find the ZIP's that are not repeated in df.ZIP (has to occur no more than once) and df.ST does not have values of '.'.
So I subset the original dataframe and applied Groupby - this still brought few rows that didn't meet the subset criteria(df.ST != '.'). So I created a separate df_us by subsetting with copy() option. Groupby still give the same index.
grouped = df[df.ST != '.'].groupby(['ZIP_CD'],sort=False) # grouping
df_size = pd.DataFrame({'ZIP':grouped.size().index, 'Count':grouped.size().values}) # Forming df around the group
df_count = df_size[df_size.Count==1] #df with Count=1
one_index = df_count.index.tolist() #gathering index
df_one = df.loc[one_index] #final df
df_us = df_data[df.ST != '.'].copy() # tried this too
The last code above still gives some index for values of '.' when I groupby. But df_us does not have any '.' at all. So this result in having same index column as above method - but for '.' values, rest of the row values are empty as df_us does not have them!
groupy is finding those index with '.' values no matter what I did. Any solution?
update:
sample data =
index ST ZIP_CD
123 ca 94025
124 Toronto .
125 ga 30306
126 Italy .
127 ca 94025
So correct answer is
ST ZIP_CD
0 123 ca 94025
CodePudding user response:
# use lot where zip_cd ne .
# and zip_cd is not duplicated
(df.loc[df['ZIP'].ne('.') &
~df['ZIP'].isin(df.loc[df.duplicated(subset='ZIP')]['ZIP'])]
)
index ST ZIP_CD
0 123 ca 94025
2 125 ga 30306