How to rearrange my dataframe according to column names while searching for specific strings in cells?
My dataframe:
0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
apple pie | banana bread | orange juice | nan | nan |
apple cookies | orange lemonade | nan | nan | nan |
banana muffin | orange ice | berry candy | nan | nan |
berry juice | nan | nan | nan | nan |
I want to arrange the rows according to a list of column names, which look for specific strings of text.
apple | banana | orange | berry | lemon |
---|---|---|---|---|
apple pie | banana bread | orange juice | nan | nan |
apple cookies | nan | orange lemonade | nan | nan |
nan | banana muffin | orange ice | berry candy | nan |
nan | nan | nan | berry juice | nan |
I have tried to create a column/list for each fruit, searching for the right string and adding the cell if it matches, however I do not know how to iterate through the dataframe and assign values. I just get a column of Nan's.
col_names = ['apple', 'banana', 'orange', 'berry', 'lemonade']
apples = np.where(df_fruits.str.contains("apple", case=False, na=False), df_fruits, np.nan)
bananas = np.where(df_fruits.str.contains("banana", case=False, na=False), df_fruits, np.nan)
etc...
Edit: I got the dataframe from a csv-file, so the original data format is in rows of string: "apple pie, banana bread, orange juice, nan, nan" etc.
CodePudding user response:
we can do some re-shaping using .unstack
and .str.extractall
pat = '|'.join(col_names)
s = df.stack()
s1 = s.to_frame('vals').join(
s.str.extractall(f'({pat})').groupby(level=[0,1]).agg(list))
out = s1.explode(0).set_index(0,append=True).reset_index(1,drop=True).unstack(-1)
print(out)
vals
0 apple banana berry lemonade orange
0 apple pie banana bread NaN NaN orange juice
1 apple cookies NaN NaN orange lemonade orange lemonade
2 NaN banana muffin berry candy NaN orange ice
3 NaN NaN berry juice NaN NaN
# if you want to drop the level on the multi index.
out.columns = out.columns.droplevel(None)
0 apple banana berry lemonade orange
0 apple pie banana bread NaN NaN orange juice
1 apple cookies NaN NaN orange lemonade orange lemonade
2 NaN banana muffin berry candy NaN orange ice
3 NaN NaN berry juice NaN NaN
CodePudding user response:
Try this:
list_values=[item for value in df_fruits.values for item in value]
list_series=[]
for col in col_names:
locals()[col "series"]=pd.Series(map(lambda x:x*(col in str(x)),list_values)
list_series.append(eval(col "series"))
the first row is the get all your dataframe colums values into a list next we create a pandas series for every fruit type and append it into a list after we create a new data frame
new_df=pd.concat(list_series,axis=1)