So I have a irregular dataframe with unnamed columns which looks something like this:
Unnamed:0 Unnamed:1 Unnamed:2 Unnamed:3 Unnamed:4
nan nan nan 2022-01-01 nan
nan nan nan nan nan
nan nan String Name Currency
nan nan nan nan nan
nan nan nan nan nan
nan nan String nan nan
nan nan xx A CAD
nan nan yy B USD
nan nan nan nan nan
Basically what I want to do is to find in which column and row the 'String' name is and start the dataframe from there, creating:
String Name Currency
String nan nan
xx A CAD
yy B USD
nan nan nan
My initial thought has been to use
locate_row = df.apply(lambda row: row.astype(str).str.contains('String').any(), axis=1)
combined with
locate_col = df.apply(lambda column: column.astype(str).str.contains('String').any(), axis=0)
This gives me series with the rows with the string and column with the string. My main problem is solving this without hardcoding using for eg. iloc[6:, 2:]
. Any help to get to the desired dataframe without hardcoding is of great help.
CodePudding user response:
In your example you can drop the columns that are entirely null, then drop rows with any null values. The result is the slice you are looking for. You can then promote the first row to headers.
df = df.dropna(axis=1,how='all').dropna().reset_index(drop=True)
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
Output
String Name Currency
1 xx A CAD
2 yy B USD
CodePudding user response:
Hey you would need to iterate over the dataframe and then search for equality in the strings: in this article iteration is described How to iterate over rows in a DataFrame in Pandas
with this you can check for equality if s1 == s2: print('s1 and s2 are equal.')