How do I return the column title of a cell, combine it with another value and store it in a new data

I have a dataframe in which, whenever a cell holds a value other than ".", I need Python to return that cell's column title and the campus number. Here is an example of the dataframe

The end result should be a new dataframe or list that contains the column title and the campus number. It does not matter what the value inside the cell is, as long as it is not "."

I tried the following for loop, where df is the original dataframe and df2 is the new dataframe that is supposed to hold the column name and campus name:

for i in df.iterrows():
    if df[i] == ".":
        i = i + 1
    else:
        df2[i] = df[i].value + "" + df.col()
        i = i + 1

CodePudding user response：

Stacking reduces the dimensionality of the problem: every cell becomes one entry in a long Series indexed by (Campus, column title), so you can simply filter the values and read the answer off the index.

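# one entry per cell, indexed by (Campus, column title)
temp = df.set_index('Campus').stack()
# keep the index entries whose cell value is not '.'
result_list = temp.loc[temp != '.'].index.values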
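
As a minimal sketch of this approach, here is the same idea on a made-up frame. The question's real data is not shown, so the course columns (Math, Art) and the campus numbers are assumptions, as are the column names chosen for df2.

```python
import pandas as pd

# Hypothetical sample data; the real column names are not shown in the question.
df = pd.DataFrame({
    "Campus": [101, 102, 103],
    "Math": [".", 5, "."],
    "Art": [3, ".", "."],
})

# Move Campus into the index, then stack the remaining columns into one
# long Series indexed by (Campus, column title).
temp = df.set_index("Campus").stack()

# Keep only the cells whose value is not "."; the index holds the wanted pairs.
pairs = temp.loc[temp != "."].index.tolist()

# Store the pairs in a new dataframe (the column names here are arbitrary).
df2 = pd.DataFrame(pairs, columns=["Campus", "Column"])
print(df2)
```

Filtering with temp != "." rather than relying on missing values keeps the sketch independent of how stack() treats NaN.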

CodePudding user response：

The simplest way I found was to run

df2 = df.where(df != '.')
df2 = df2.dropna()

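The first line copies the whole df but replaces every '.' with NaN. The second line then drops those NaN with dropna(); note that with its default arguments dropna() removes every row that contains at least one NaN, so only rows without any '.' survive.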
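
To see what these two lines do, here is a small sketch on a made-up frame (the Campus, Math and Art columns and their values are assumptions, since the question's data is not shown). It also shows one possible way to keep the individual non-"." cells instead of whole rows, by stacking the masked frame and dropping NaN from the resulting Series.

```python
import pandas as pd

# Hypothetical sample data; the real columns are not shown in the question.
df = pd.DataFrame({
    "Campus": [101, 102, 103],
    "Math": [".", 5, 7],
    "Art": [3, ".", 8],
})

# Replace every "." with NaN, leaving all other values untouched.
masked = df.where(df != ".")

# Default dropna() drops any row that has at least one NaN,
# so only campuses with no "." at all remain.
rows_without_dots = masked.dropna()

# Per-cell alternative: stack to (Campus, column title) and drop the NaN entries.
cells = masked.set_index("Campus").stack().dropna()
pairs = cells.index.tolist()

print(rows_without_dots)
print(pairs)
```

Here rows_without_dots keeps only campus 103, while pairs lists every (campus, column title) combination whose cell is not ".".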

CodePudding user response：

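You could replace the unwanted values with NaN and then stack. This relies on stack() dropping missing values, which has long been its default; if your pandas version keeps them, chain .dropna() after stack().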

df.replace('.', pd.NA).stack().index.to_list()

Example:

# input
   A  B  C
0  .  2  .
1  1  .  .
2  .  3  .

# output
[(0, 'B'), (1, 'A'), (2, 'B')]

To have columns first, use:

df.replace('.', pd.NA).T.stack().index.to_list()
# [('A', 1), ('B', 0), ('B', 2)]

Or, to list the column first while keeping the original top-down row order:

df.replace('.', pd.NA).stack().swaplevel().index.to_list()
# [('B', 0), ('A', 1), ('B', 2)]
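
If the goal is a combined label rather than tuples, a sketch along these lines may help. The frame reproduces the example input above; the "column-row" label format is only an assumption about what "combine" should mean, and the explicit .dropna() is added so the result does not depend on whether stack() itself drops missing values.

```python
import pandas as pd

# Same data as the example input above.
df = pd.DataFrame({
    "A": [".", 1, "."],
    "B": [2, ".", 3],
    "C": [".", ".", "."],
})

# Replace "." with a missing value, stack to (row, column) entries,
# and explicitly drop whatever is still missing.
pairs = df.replace(".", pd.NA).stack().dropna().index.to_list()

# Combine the column title with the row label into one string per cell
# (the "column-row" format is an arbitrary choice for illustration).
labels = [f"{col}-{row}" for row, col in pairs]

print(pairs)
print(labels)
```

With a Campus column set as the index first, the row label in each pair would be the campus number instead of the positional index.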