I've converted a pdf table into a csv and need to clean up two index columns below.
I'd like to propagate the string ending with "LB" down Column A until a new one is listed.
Trying to create two new columns, from this...:
A B
30,000 LB NaN
foo high
bar low
25,000 LB NaN
baz high
zoo low
...to this.:
A B
30,000 LB high
30,000 LB low
25,000 LB high
25,000 LB low
CodePudding user response:
You can check for column A
ends with LB
by str.endswith()
.
Then use .where()
to mask other row entries not ends with LB
to NaN
.
Finally, forward fill the NaN
values with .ffill()
for column A
To further clean up, we remove the rows where column B
have NaN
values by .dropna()
.
df['A'] = df['A'].where(df['A'].str.endswith('LB')).ffill()
df = df.dropna(subset=['B'])
Result:
print(df)
A B
1 30,000 LB high
2 30,000 LB low
4 25,000 LB high
5 25,000 LB low
CodePudding user response:
It looks like something went wrong with converting PDF?
Anyhow, I would first put NaN in column A, where is not NaN in column B. Then use fillna from pandas to propagate down the values in column A. Finally, dropna since in column B there are still missing values.