How to divide datasets in Pandas?

Time:06-06

firstpart = D2.loc[(D2['Age'] == "15") 
                  | (D2['Age'] == "16")
                  | (D2['City'] == "Paris")
                  | (D2['City'] == "London")
                  | (D2['City'] == "Istanbul")
                  | (D2['Health'] == "Ok")
                  ]

This is how I extracted the rows I wanted from the dataset, but I would also like to take the rest of the dataset and save it as a new dataset. Does pandas have a function to do this easily?

CodePudding user response:

Here is my sample code; you can check it:

import pandas as pd

df = pd.DataFrame()
df['Age'] = ["16","17","18"]
df['City'] = ['Paris']*2 + ["London"]
df['Health'] = ['OK']*2 + ['Not Ok']

mask = (df['Age'] == "16") | (df['City'] == "Paris") | (df['Health'] == 'OK')
print(df[mask])
print(df[~mask])

The result is

  Age   City Health
0  16  Paris     OK
1  17  Paris     OK

and

  Age    City  Health
2  18  London  Not Ok
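
If the first part has already been extracted, as in the question, the remainder can also be recovered by dropping its index labels from the original frame. This is only a minimal sketch, assuming the index labels are unique; the sample data mirrors the frame above, and the names first_part and rest are illustrative.

```python
import pandas as pd

# Same sample data as above, built in one step
df = pd.DataFrame({
    "Age": ["16", "17", "18"],
    "City": ["Paris", "Paris", "London"],
    "Health": ["OK", "OK", "Not Ok"],
})

# The part that was already selected
first_part = df.loc[(df["Age"] == "16") | (df["City"] == "Paris") | (df["Health"] == "OK")]

# Everything else: drop the rows whose index labels appear in first_part
# (assumes unique index labels; with duplicates, every row sharing a label is dropped)
rest = df.drop(first_part.index)
print(rest)
```

Keeping the boolean mask around and using ~mask is usually simpler, but this variant works when only the filtered frame is at hand.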

CodePudding user response:

You can split your code into

mask = ((D2['Age'] == "15")
        | (D2['Age'] == "16")
        | (D2['City'] == "Paris")
        | (D2['City'] == "London")
        | (D2['City'] == "Istanbul")
        | (D2['Health'] == "Ok"))
first_part = D2[mask]   # rows matching any condition
rest_data = D2[~mask]   # all remaining rows

Here, I use ~ to invert the boolean mask: True becomes False and vice versa, so D2[~mask] selects every row that D2[mask] does not.
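
The repeated equality checks on the same column could probably be shortened with Series.isin. The following is a rough sketch rather than a drop-in replacement: the contents of D2 here are made-up sample rows, since the real dataset is not shown in the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for the asker's D2
D2 = pd.DataFrame({
    "Age": ["15", "16", "30", "42"],
    "City": ["Rome", "Paris", "Berlin", "Madrid"],
    "Health": ["Bad", "Bad", "Ok", "Bad"],
})

# isin() replaces several == comparisons on one column
mask = (
    D2["Age"].isin(["15", "16"])
    | D2["City"].isin(["Paris", "London", "Istanbul"])
    | (D2["Health"] == "Ok")
)

first_part = D2[mask]
rest_data = D2[~mask].copy()  # copy() so later edits do not touch D2
print(rest_data)
```

To save the remainder as its own dataset, rest_data.to_csv("rest.csv", index=False) should work; the file name is only an example.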
