Home > other >  Using Python and Pandas for a filtering code
Using Python and Pandas for a filtering code

Time:10-25

You’ll need to bring all your filtering skills together for this task. We’ve provided you a list of companies in the developers variable. Filter df however you choose so that you only get games that meet the following conditions: Sold in all 3 regions (North America, Europe, and Japan) The Japanese sales were greater than the combined sales from North America and Europe The game developer is one of the companies in the developers list There is no column that explicitly says whether a game was sold in each region, but you can infer that a game was not sold in a region if its sales are 0 for that region. Use the cols variable to select only the 'name', 'developer', 'na_sales', 'eu_sales', and 'jp_sales' columns from the filtered DataFrame, and assign the result to a variable called df_filtered. Print the whole DataFrame.

You can use a filter mask or query string for this task. In either case, you need to check if the 'jp_sales' column is greater than the sum of 'na_sales' and 'eu_sales', check if each sales column is greater than 0, and use isin() to check if the 'developer' column contains one of the values in developers. Use [cols] to select only those columns and then print df_filtered.

developer na_sales eu_sales jp_sales critic_score user_score
0 Nintendo 41.36 28.96 3.77 76.0 8.0
1 NaN 29.08 3.58 6.81 NaN NaN
2 Nintendo 15.68 12.76 3.79 82.0 8.3
3 Nintendo 15.61 10.93 3.28 80.0 8.0
4 NaN 11.27 8.89 10.22 NaN NaN

This is my code. Pretty difficult and having difficulty providing a df_filtered variable with a running code.

import pandas as pd

df = pd.read_csv('/datasets/vg_sales.csv')
df['user_score'] = pd.to_numeric(df['user_score'], errors='coerce')

developers = ['SquareSoft', 'Enix Corporation', 'Square Enix']
cols = ['name', 'developer', 'na_sales', 'eu_sales', 'jp_sales']

df_filtered = df([cols ]> 0 | cols['jp_sales'] > sum(cols['eu_sales'] cols['na_sales']) |   df['developer'].isin(developers))
print(df_filtered)

CodePudding user response:

If I understand correctly, it looks like a multi-condition dataframe filtering:

df[
    df["developer"].isin(developers) \
   & (df["jp_sales"] > df["na_sales"]   df["eu_sales"]) \
   & ~df["na_sales"].isnull()
   & ~df["eu_sales"].isnull()
   & ~df["jp_sales"].isnull()
]

It will not return results for sample dataset given in question because the conditions that JP sales should exceed NA and EU sales and developer should be from given list are not met. But it works for proper data:

data=[
("SquareSoft",41.36,28.96,93.77,76.0,8.0),
(np.nan,29.08,3.58,6.81,np.nan,np.nan),
("SquareSoft",15.68,12.76,3.79,82.0,8.3),
("Nintendo",15.61,10.93,30.28,80.0,8.0),
(np.nan,11.27,8.89,10.22,np.nan,np.nan)
]

columns = ["developer","na_sales","eu_sales","jp_sales","critic_score","user_score"]

developers = ['SquareSoft', 'Enix Corporation', 'Square Enix']

df = pd.DataFrame(data=data, columns=columns)

[Out]:
    developer  na_sales  eu_sales  jp_sales  critic_score  user_score
0  SquareSoft     41.36     28.96     93.77          76.0         8.0


CodePudding user response:

Try this:

developers = ['SquareSoft', 'Enix Corporation', 'Square Enix']
cols = ['name', 'developer', 'na_sales', 'eu_sales', 'jp_sales']

cond = (
    # Sold in all 3 regions
    df[["na_sales", "eu_sales", "jp_sales"]].gt(0).all(axis=1)
    # JP sales greater than NA and EU sales combined
    & df["jp_sales"].gt(df["na_sales"]   df["eu_sales"])
    # Developer is in a predefined list
    & df["developer"].isin(developers)
)

if cond.any():
    df_filtered = df.loc[cond, cols]
else:
    print("No match found")
  • Related