Home > Mobile >  How to find list data not in Pandas Dataframe
How to find list data not in Pandas Dataframe

Time:08-30

I have below the dataframe in pandas

import pandas as pd

df = pd.DataFrame([
        ['server1', 'NA', 'NA', '2011-03-31'],
        ['server1', '2011-02-22', 'NA', 'NA'],
        ['server1', 'NA', '2011-06-22', 'NA'],
        ['server2', 'NA', 'NA', '2011-12-31'],
        ['server2', 'NA', '2011-02-21', 'NA'],
        ['server3', 'NA', 'NA', '2011-08-29'],
    ], columns=['hostname', 'patch_date1', 'patch_date2', 'patch_date3'])

print(df) 

  hostname patch_date1 patch_date2 patch_date3
0  server1          NA          NA  2011-03-31
1  server1  2011-02-22          NA          NA
2  server1          NA  2011-06-22          NA
3  server2          NA          NA  2011-12-31
4  server2          NA  2011-02-21          NA
5  server3          NA          NA  2011-08-29


I am trying to search isin and not in, isin working what i am looking, but the same unable to get not in from the dataframe based on the list .

list_to_search = ['server1','server2','server3','server4','server5']
df = df[df.hostname.isin(list_to_search)]

print(df)

  hostname patch_date1 patch_date2 patch_date3
0  server1          NA          NA  2011-03-31
1  server1  2011-02-22          NA          NA
2  server1          NA  2011-06-22          NA
3  server2          NA          NA  2011-12-31
4  server2          NA  2011-02-21          NA
5  server3          NA          NA  2011-08-29

Actual below the output i need like below, which mean what are the data are not in the list_to_search that i need to print as dataframe with NA

  hostname patch_date1 patch_date2 patch_date3
0  server4          NA          NA          NA
1  server5          NA          NA          NA

CodePudding user response:

You can do this:

servers_not_found = [server for server in list_to_search if server
                     not in df['hostname'].tolist()]
out = pd.concat([pd.DataFrame({'hostname': servers_not_found}),
                pd.DataFrame(columns=df.columns.drop('hostname'))],
                axis=1)

print(out):

  hostname patch_date1 patch_date2 patch_date3
0  server4         NaN         NaN         NaN
1  server5         NaN         NaN         NaN
  • Related