I have a google spreadsheet which i managed to load into a pandas dataframe:
Tag1 Tag2 Tag3 Tag4 Tag5 MobileNo
Blue Yellow Green Velvet Red 12345678
Blue Yellow Pink Grey 234556778
Red Yellow Orange Velvet 4456568
Red Yellow Grey Blue 3454655467
Now i am not really familiar with pandas. I would need all MobileNo which have a tag in one of the 5 tag columns within their rows to be written into an array.
Like
tag_red_results = ['12345678', '4456568', '3454655467']
How can i accomplish this?
CodePudding user response:
IIUC, use pandas.DataFrame.loc
with boolean indexing :
# is the MobileNo tagged as "Red" ?
m = df.filter(like="Tag").eq("Red").any(axis=1)
s = df.loc[m, "MobileNo"]
If a list is needed, then use pandas.Series.to_list
:
tag_red_results = s.to_list()
#[12345678, 4456568, 3454655467]
Or, if you need a numpy array, use pandas.Series.to_numpy
:
tag_red_results = s.to_numpy()
#array([ 12345678, 4456568, 3454655467], dtype=int64)
CodePudding user response:
To get a list of all the MobileNo values where at least one of the Tag columns in the row has a value, you can use this code :
import pandas as pd
# Load the data into a pandas dataframe, specifying the column names
df = pd.read_csv('file.csv', names=['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5', 'MobileNo'])
# Create a list of all the MobileNo values where at least one of the Tag columns is not null
tag_red_results = df[df[['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5']].notnull().any(axis=1)]['MobileNo'].tolist()
print(tag_red_results)
CodePudding user response:
You can also use melt
to flatten your tag columns:
>>> df.melt('MobileNo').loc[lambda x: x['value'] == 'Red', 'MobileNo'].tolist()
[4456568, 3454655467, 12345678]