Write Values found by Pandas Dataframe's .loc function into an array-CodePudding

I have a google spreadsheet which i managed to load into a pandas dataframe:

Tag1    Tag2    Tag3    Tag4    Tag5    MobileNo
Blue    Yellow  Green   Velvet  Red     12345678
Blue    Yellow  Pink    Grey            234556778
Red     Yellow  Orange  Velvet          4456568
Red     Yellow  Grey    Blue            3454655467

Now i am not really familiar with pandas. I would need all MobileNo which have a tag in one of the 5 tag columns within their rows to be written into an array.

tag_red_results = ['12345678', '4456568', '3454655467']

How can i accomplish this?

CodePudding user response：

IIUC, use pandas.DataFrame.loc with boolean indexing :

# is the MobileNo tagged as "Red" ?
m = df.filter(like="Tag").eq("Red").any(axis=1)

s = df.loc[m, "MobileNo"]

If a list is needed, then use pandas.Series.to_list :

tag_red_results = s.to_list()
#[12345678, 4456568, 3454655467]

Or, if you need a numpy array, use pandas.Series.to_numpy :

tag_red_results = s.to_numpy()
#array([  12345678,    4456568, 3454655467], dtype=int64)

CodePudding user response：

To get a list of all the MobileNo values where at least one of the Tag columns in the row has a value, you can use this code :

    import pandas as pd

    # Load the data into a pandas dataframe, specifying the column names
    df = pd.read_csv('file.csv', names=['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5', 'MobileNo'])

    # Create a list of all the MobileNo values where at least one of the Tag columns is not null
    tag_red_results = df[df[['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5']].notnull().any(axis=1)]['MobileNo'].tolist()

    print(tag_red_results)

CodePudding user response：

You can also use melt to flatten your tag columns:

>>> df.melt('MobileNo').loc[lambda x: x['value'] == 'Red', 'MobileNo'].tolist()
[4456568, 3454655467, 12345678]