Home > Software engineering >  Python search for string in whole pandas dataframe and make that first column
Python search for string in whole pandas dataframe and make that first column

Time:10-20

So I have a irregular dataframe with unnamed columns which looks something like this:

Unnamed:0 Unnamed:1 Unnamed:2 Unnamed:3  Unnamed:4
nan       nan        nan      2022-01-01  nan
nan       nan        nan      nan         nan
nan       nan        String   Name        Currency
nan       nan        nan      nan         nan
nan       nan        nan      nan         nan
nan       nan        String   nan         nan
nan       nan        xx       A           CAD
nan       nan        yy       B           USD 
nan       nan        nan      nan         nan

Basically what I want to do is to find in which column and row the 'String' name is and start the dataframe from there, creating:

String Name Currency
String nan  nan
xx     A    CAD
yy     B    USD
nan    nan  nan

My initial thought has been to use locate_row = df.apply(lambda row: row.astype(str).str.contains('String').any(), axis=1) combined with locate_col = df.apply(lambda column: column.astype(str).str.contains('String').any(), axis=0) This gives me series with the rows with the string and column with the string. My main problem is solving this without hardcoding using for eg. iloc[6:, 2:]. Any help to get to the desired dataframe without hardcoding is of great help.

CodePudding user response:

In your example you can drop the columns that are entirely null, then drop rows with any null values. The result is the slice you are looking for. You can then promote the first row to headers.

df = df.dropna(axis=1,how='all').dropna().reset_index(drop=True)
df = df.rename(columns=df.iloc[0]).drop(df.index[0])

Output

  String Name Currency
1     xx    A      CAD
2     yy    B      USD

CodePudding user response:

Hey you would need to iterate over the dataframe and then search for equality in the strings: in this article iteration is described How to iterate over rows in a DataFrame in Pandas

with this you can check for equality if s1 == s2: print('s1 and s2 are equal.')

  • Related