I am hoping this question makes sense. I have a table I extracted from a PDF of chemical names that I am trying to format and I am having issues it looks like this: table
Some of the chemical names are split into multiple rows and I need each name in its won row. I did notice the chemicals whose names are split into multiple rows have an NaN in the first column.
EDIT: after running dt.head(15).to_dict()
{'Unnamed: 0': {6: '1', 7: nan, 8: '2', 9: '3', 10: nan, 11: nan, 12: '4', 13: '5', 14: nan, 15: nan, 16: '6', 17: '7', 18: '8', 19: '9', 20: nan}, 'Phenolics': {6: 'Dihydroquercetin', 7: '7,30-dimethyl ether', 8: 'Artelin', 9: 'Esculin 7-', 10: 'methylether', 11: '(methylesculin)', 12: 'Esculin', 13: 'Scopoletin (7-', 14: 'hydroxy-6-', 15: 'methoxycoumarin)', 16: 'Axillarin', 17: 'Esculetin', 18: 'Isoscopoletin', 19: '6-Beta-D-glucosyl-7-', 20: 'methoxycoumarin'}}
Can anyone help me? Thank you!
CodePudding user response:
A one-line solution
df = df.fillna(method='ffill').groupby('Unnamed: 0')['Phenolics'].apply(' '.join).reset_index()
CodePudding user response:
df["group"] = df["Unnamed: 0"].ffill()
df.groupby("group").agg({"Phenolics": "".join})