Code to drop rows based on a partial string is not working.
Very simple code, and it runs fine but doesn't drop the rows I want.
The original table in the pdf looks like this:
Chemical | Value | Unit | Type |
---|---|---|---|
Fluoride | 0.23 | ug/L | Lab |
Mercury | 0.15 | ug/L | Lab |
Sum of Long Chained Polymers | 0.33 | ||
Partialsum of Short Chained Polymers | 0.40 |
What I did:
import csv
import tabula
dfs = tabula.read _pdf("Test.pdf", pages= 'all')
file = "Test.pdf"
tables = tabula.read_pdf(file, pages=2, stream=True, multiple_tables=True)
table1 = tables[1]
table1.drop('Unit', axis=1, inplace=True)
table1.drop('Type', axis=1, inplace=True)
discard = ['sum','Sum']
table1[~table1.Chemical.str.contains('|'.join(discard))]
print(table1)
table1.to_csv('test.csv')
The results are that it drops the 2 columns I don't want, so that's fine. But it did not delete the rows with the words "sum" or "Sum" in them. Any insights?
CodePudding user response:
You are close. You did drop the rows, but you didn't save the result.
import pandas as pd
example = {'Chemical': ['Fluoride', 'Mercury', 'Sum of Long Chained Polymers',
'Partialsum of Short Chained Polymers'],
'Value': [0.23, 0.15, 0.33, 0.4],
'Unit': ['ug/L', 'ug/L', '', ''],
'Type': ['Lab', 'Lab', '', '']}
table1 = pd.DataFrame(example)
table1.drop('Unit', axis=1, inplace=True)
table1.drop('Type', axis=1, inplace=True)
discard = ['sum','Sum']
table1 = table1[~table1.Chemical.str.contains('|'.join(discard))]
print(table1)
CodePudding user response:
You can use pd.Series.str.contains
with the argument case=False
to ignore case:
Also, it's not law, but often considered poor practice to use inplace=True
... because in part it leads to confusions like the one you're experiencing.
Given df
:
Chemical Value Unit Type
0 Fluoride 0.23 ug/L Lab
1 Mercury 0.15 ug/L Lab
2 Sum of Long Chained Polymers 0.33 NaN NaN
3 Partialsum of Short Chained Polymers 0.40 NaN NaN
Doing:
df = (df.drop(['Unit', 'Type'], axis=1)
.loc[~df.Chemical.str.contains('sum', case=False)])
Output:
Chemical Value
0 Fluoride 0.23
1 Mercury 0.15