Pandas: How to do operations using loc with multiple columns and format the results-CodePudding

Here is the csv file I using.

Goals:

I want to extract the rows that has values of columns "generation_id" or "is_main_series" greater than 0 and assign them a variable "selection" (Just to be clear, I want rows that has at least one of two columns greater than 0.)
Then, I extract the "identifier" columns from the "selection" and assign them a variable "name".
Finally, I would like to format the rows of "selection" for each "name" in this way --> name(generation_id, is_main_series), name1(is_main_series), etc

Error: line 7: ValueError: Cannot index with multidimensional key

I'm struggling to find solution for this error because I don't think this Dataframe is multiindexed.

Here is the code I wrote so far:

import pandas as pd

df = pd.read_csv('abilities.csv')
df = df.fillna(0)

def getPokedex():
    selection = df.loc[df[['generation_id', 'is_main_series']] != 0 ]
    for donor in selection:
        name = selection['identifier']
        name = name.to_string(index=False)
        for types in selection.columns:
            return f"{name}(types)"

print(getPokedex())

Would appreciate any help.

CodePudding user response：

I hope I can help you. I think I can resolve the first at least. You are comparing a list with a number. It doesn't look fine. Did you try something like this?:

// get the boolean values
booleanValues = (df.iloc[:,2] != 0) | (df.iloc[:,3] != 0)
// then get the df filtered
df_filtered = df[booleanValues]

I think the error raised is that. Then I have to think about the remainder of the code but I wanted to give you this quick help.

CodePudding user response：

As I understand your question, you want to print rows in a particular format, for which "generation_id" or "is_main_series" greater than 0. You can try the below code. See comments inline

import pandas as pd    
url="https://raw.githubusercontent.com/veekun/pokedex/master/pokedex/data/csv/abilities.csv"
df=pd.read_csv(url)

#Filter dataframe for generation_id or is_main_series greater than 0
df = df[(df['generation_id']>0) | (df['is_main_series']>0)]

#Format and print the rows.
for index, row in df.iterrows():
    print(row['identifier'],'(', row['generation_id'], ',' , row['is_main_series'] ,')')

Output:

stench ( 3 , 1 )
drizzle ( 3 , 1 )
speed-boost ( 3 , 1 )
battle-armor ( 3 , 1 )
...
shackle ( 5 , 0 )
decoy ( 5 , 0 )
shield ( 5 , 0 )

If you want the output of all columns in one line with comma separated, you can do this. Here the idea is to create a new column which you need to print it later. And then convert the column to list and then join it as a comma separated string.

df['Formatted_column'] = df['identifier']   '('   df['generation_id'].astype(str)   ','   df['is_main_series'].astype(str)  ')'
print(', '.join(df['Formatted_column'].to_list()))

Output of this

stench(3,1), drizzle(3,1), speed-boost(3,1), battle-armor(3,1),  .... decoy(5,0), shield(5,0)