How to create a subset of data with pandas?

My task is to select the subset of data for a given region from a CSV file. They gave me a hint to use the pandas module, but I don't know which function to use.

Here's my code (the task is to display the DataFrame for any given region):

import os
import pandas as pd
from IPython.display import display

def CreateSubsetPerRegion(df, region):
    # TODO: Extract the per-region subset of the data using the pandas DataFrame (a data 'subset')
    path = os.getcwd()

    df = pd.read_csv(os.path.join(path, '2020.csv'))
    pd.set_option("display.precision", 2)
    #display(df.head(30))
    #region = {df['Region'][152]}
    
    return df, region

display(CreateSubsetPerRegion(df,'East Asia'))
display(CreateSubsetPerRegion(df,'Central and Eastern Europe'))

Here's an example of what the output should look like for the given region, South Asia.

Here's the CSV file: https://github.com/INF1007-2022A/L03-TP-4-l03-7/blob/master/2020.csv

I tried a lot of functions that I found on the internet, but they didn't work.

Answer:

First, you probably want to load your DataFrame only once. Then you can get each per-region DataFrame with a simple boolean mask, like this:

import os
import pandas as pd

path = os.getcwd()
df = pd.read_csv(os.path.join(path, '2020.csv'))

east_asia = df[df["Region"] == "East Asia"]
ce_europe = df[df["Region"] == "Central and Eastern Europe"]

These will give you all the rows of your DataFrame where the column "Region" has the desired value.
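A minimal, self-contained sketch of the same boolean-mask approach, using a small made-up DataFrame in place of 2020.csv. The country names and values are hypothetical placeholders; only the "Region" column and the region names come from the question. It also shows `isin` for selecting several regions at once and `groupby` for building every per-region subset in one pass:

```python
import pandas as pd

# Hypothetical stand-in for 2020.csv; only the "Region" column name is taken from the question.
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D", "Country E"],
    "Region": ["East Asia", "East Asia", "Central and Eastern Europe",
               "Central and Eastern Europe", "South Asia"],
    "Value": [1, 2, 3, 4, 5],
})

# Boolean mask: a Series that is True for rows whose Region equals the requested value.
mask = df["Region"] == "East Asia"
east_asia = df[mask]

# Several regions at once: isin() builds the mask from a list of values.
asia = df[df["Region"].isin(["East Asia", "South Asia"])]

# Every region in one pass: a dict mapping each region name to its subset.
subsets = {region: group for region, group in df.groupby("Region")}

print(east_asia)
print(subsets["South Asia"])
```

Filtering with a mask keeps the original row index, so you can still trace each row back to the full DataFrame; call `.reset_index(drop=True)` on a subset if you prefer a fresh 0-based index.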

Answer:

I can't see your CSV, but you could include a snippet of it as text in your question.

But it should be along the lines of this:

def CreateSubsetPerRegion(df, region):
    return df[df['Region'] == region]

You should check whether this returns an empty DataFrame (which means the region was not found in your data), since with this method the 'region' string must exactly match the value in your CSV file.
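One way to make that check explicit is sketched below. The helper name `subset_by_region` and its `normalize` option are hypothetical, not part of pandas or the original answer; with `normalize=True` it ignores surrounding whitespace and letter case, and it raises an error listing the available regions when nothing matches:

```python
import pandas as pd


def subset_by_region(df: pd.DataFrame, region: str, normalize: bool = False) -> pd.DataFrame:
    """Return the rows of df whose 'Region' matches region (hypothetical helper).

    With normalize=True, surrounding whitespace and letter case are ignored,
    so ' east asia' still matches 'East Asia'.
    """
    column = df["Region"]
    target = region
    if normalize:
        # Compare stripped, case-folded versions of both sides.
        column = column.str.strip().str.casefold()
        target = region.strip().casefold()
    subset = df[column == target]
    if subset.empty:
        # An empty result means no row matched, so report what values do exist.
        known = sorted(df["Region"].dropna().unique())
        raise ValueError(f"Region {region!r} not found; available regions: {known}")
    return subset


# Usage with a small made-up DataFrame:
sample = pd.DataFrame({"Region": ["East Asia", "South Asia", "East Asia"], "Value": [1, 2, 3]})
print(subset_by_region(sample, " east asia", normalize=True))
```

Raising an error instead of returning an empty DataFrame is a design choice; returning the empty frame and checking `.empty` at the call site works just as well.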