Home > Mobile >  Pandas Groupby having index that are not in the dataframe derived by subseting with Copy()
Pandas Groupby having index that are not in the dataframe derived by subseting with Copy()

Time:10-19

Problem: Find the ZIP's that are not repeated in df.ZIP (has to occur no more than once) and df.ST does not have values of '.'.
So I subset the original dataframe and applied Groupby - this still brought few rows that didn't meet the subset criteria(df.ST != '.'). So I created a separate df_us by subsetting with copy() option. Groupby still give the same index.

grouped = df[df.ST != '.'].groupby(['ZIP_CD'],sort=False) # grouping
df_size = pd.DataFrame({'ZIP':grouped.size().index, 'Count':grouped.size().values}) # Forming df around the group
df_count = df_size[df_size.Count==1] #df with Count=1
one_index = df_count.index.tolist() #gathering index
df_one = df.loc[one_index] #final df

df_us = df_data[df.ST != '.'].copy() # tried this too

The last code above still gives some index for values of '.' when I groupby. But df_us does not have any '.' at all. So this result in having same index column as above method - but for '.' values, rest of the row values are empty as df_us does not have them!

groupy is finding those index with '.' values no matter what I did. Any solution?

update: sample data =
index ST ZIP_CD
123 ca 94025
124 Toronto .
125 ga 30306
126 Italy .
127 ca 94025

So correct answer is

    ST      ZIP_CD 
0   123     ca  94025

CodePudding user response:

# use lot where zip_cd ne .
# and zip_cd is not duplicated

(df.loc[df['ZIP'].ne('.') & 
~df['ZIP'].isin(df.loc[df.duplicated(subset='ZIP')]['ZIP'])]
 )

    index   ST  ZIP_CD
0   123     ca  94025
2   125     ga  30306
  • Related