Let's say I have a pandas DataFrame in Python with one row per animal and one column per attribute, where each attribute column is a dummy variable (1 or 0) indicating whether the animal has that attribute. I'm interested in creating lists in both directions, down a column and across a row, based on the dummy variable values. For example, I'd like to:
a) create a list of animals that have hair
b) create a list of all the attributes that a dog has.
Could anyone please assist with how I would do this in Python? Thanks very much!
Name | Hair | Eyes |
---|---|---|
Dog | 1 | 1 |
Fish | 0 | 1 |
CodePudding user response:
You could store the animals in a dictionary, with each animal's name as the key and a list of its 0/1 flags as the value. The first element of each list then holds the 0 or 1 that denotes whether the animal has hair.
animals = { "Dog": [ 1, 1 ], "Fish": [ 0, 1 ] }
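To show how both lists could be read out of that dictionary, here is a minimal sketch. The `attributes` list is my own assumption (the answer only implies that position 0 means Hair), so adjust it to match your real column order.

```python
# Assumed attribute order (not stated in the answer): position 0 is Hair, position 1 is Eyes.
attributes = ["Hair", "Eyes"]
animals = {"Dog": [1, 1], "Fish": [0, 1]}

# (a) animals whose Hair flag is 1
hair_pos = attributes.index("Hair")
animals_with_hair = [name for name, flags in animals.items() if flags[hair_pos] == 1]

# (b) attributes whose flag is 1 for the dog
dog_attributes = [attr for attr, flag in zip(attributes, animals["Dog"]) if flag == 1]

print(animals_with_hair)
print(dog_attributes)
```

This works for a couple of animals, but the flags only have meaning through their position in the list, so a DataFrame is probably easier to maintain once there are many attributes.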
CodePudding user response:
(a)
df[ df['Hair'] == 1 ]['Name'].to_list()
df.loc[ df['Hair'] == 1, 'Name'].to_list()
(b)
For this you may need to transpose the DataFrame (so that rows become columns) and use the animal names as the column names.
After that you can use similar code:
df[ df['Dog'] == 1 ].index.to_list()
Minimal working code
text = '''Name,Hair,Eyes
Dog,1,1
Fish,0,1'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
print(df)
print('---')
print('Hair 1:', df[ df['Hair'] == 1 ]['Name'].to_list())
print('hair 2:', df.loc[ df['Hair'] == 1, 'Name'].to_list())
print('---')
# transpose: rows become columns
#new_df = df.transpose()
new_df = df.T # `.T` is shorthand for `.transpose()`
# use the first row (the animal names) as the column names
new_df.columns = new_df.loc['Name']
# drop the 'Name' row, which is now the header
new_df = new_df[1:]
print(new_df)
print('---')
print('Dog :', new_df[ new_df['Dog'] == 1 ].index.to_list())
print('Fish:', new_df[ new_df['Fish'] == 1 ].index.to_list())
Result:
   Name  Hair  Eyes
0   Dog     1     1
1  Fish     0     1
---
Hair 1: ['Dog']
hair 2: ['Dog']
---
Name Dog Fish
Hair   1    0
Eyes   1    1
---
Dog : ['Hair', 'Eyes']
Fish: ['Eyes']
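As a possible alternative for (b) that avoids the transpose, you could set `Name` as the index and filter a single row instead. This is only a sketch on the same sample data; the variable names (`dog_attributes`, `attributes_by_animal`, and so on) are mine and not part of the answer above.

```python
import io

import pandas as pd

text = '''Name,Hair,Eyes
Dog,1,1
Fish,0,1'''

# Use the animal names as the row index so a row can be looked up by name.
df = pd.read_csv(io.StringIO(text)).set_index('Name')

# (b) take one row as a Series and keep the column labels where the flag is 1
dog = df.loc['Dog']
dog_attributes = dog[dog == 1].index.to_list()

# (a) same idea in the other direction: keep the row labels where Hair is 1
animals_with_hair = df[df['Hair'] == 1].index.to_list()

# every animal at once: map each name to the list of attributes it has
attributes_by_animal = {
    name: row[row == 1].index.to_list() for name, row in df.iterrows()
}

print(dog_attributes)
print(animals_with_hair)
print(attributes_by_animal)
```

With `Name` as the index, both directions use the same pattern (build a boolean mask, then read the labels off `.index`), which may be easier to follow than transposing first.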