I have a CSV file that I opened with pandas, and I need to check whether a field is missing a value. For example, I tried:
import pandas as pd
data = pd.read_csv(r'path')
df = pd.DataFrame(data)
df = df.drop_duplicates()
for row in df.iterrows():
    if len(df['Country']) < 1:
        print(row)
The input file looks like:
Country,"Avg(Mbit/s)Ookla"
Canada,75.18
South Korea,117.95
Netherlands,108.33
Japan,44.05
Norway,134.73
Singapore,67.99
Australia,76.52
Switzerland,82.29
Belgium,58.65
Croatia,86.48
New Zealand,49.49
Austria,56.6
Denmark,105.65
Lithuania,50.13
Czech Republic,44.55
United Arab Emirates,135.35
,41.32
So I need to check whether Country is missing a value and do one thing, or whether a country is missing its average and do something else.
Here is placeholder code for the whole operation:
import pandas as pd
from api_data import APIData
api_data = APIData()
otherfile = api_data.get_content(api_data.get_token({
    connection details ...
}))
otherfile = [row.split(',') for row in data]
otherdf = pd.DataFrame(data)
otherdf.drop(9, inplace=True, axis=1)
pd.set_option("display.max_rows", None, "display.max_columns", None)
data = pd.read_csv(r'path')
df = pd.DataFrame(data)
df = df.drop_duplicates()
for row in df.iterrows():
    if (df[df["Country"].isna()]) == True:
        df.drop(row)
    elif (df[df["Avg(Mbit/s)Ookla"].isna()]) == True:
        for otherrow in otherfile.itterrows():
            if (df[df["Country"]]) == (otherdf[otherdf["Country"]]):
                avgspeed = ( avgspeed (otherdf[otherdf["Country"]]) ) / df.count(how many elements was inserted)
                (df[df["Avg(Mbit/s)Ookla"]]) = avgspeed
Later edit: First 5 rows of the DataFrame:
       Country  Avg(Mbit/s)Ookla
0       Canada             75.18
1  South Korea            117.95
2  Netherlands            108.33
3        Japan             44.05
4       Norway            134.73
Thank you very much!
CodePudding user response:
Searching Google for "pandas index by condition" led me to this post on SO.
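Thus:
df = df[(~df['Country'].isna()) | (~df['Avg(Mbit/s)Ookla'].isna())].copy()
should work. Note that | keeps a row if at least one of the two columns has a value; use & instead to keep only rows where both are present. The code is not tested since no example data is provided.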
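To make the filtering concrete, here is a minimal sketch on a few rows taken from the question's file. The "Iceland" row with no speed is hypothetical, added only so that both kinds of missing value appear; the variable names are illustrative as well.

```python
import io
import pandas as pd

# A few rows from the question's file, including the one with no country,
# plus one hypothetical row ("Iceland") with no speed, added for illustration.
csv_text = '''Country,"Avg(Mbit/s)Ookla"
Canada,75.18
South Korea,117.95
,41.32
Iceland,
'''
df = pd.read_csv(io.StringIO(csv_text))  # empty fields are read as NaN

has_country = df["Country"].notna()
has_speed = df["Avg(Mbit/s)Ookla"].notna()

either_present = df[has_country | has_speed]  # drops only rows missing both
both_present = df[has_country & has_speed]    # drops rows missing either one
with_country = df.dropna(subset=["Country"])  # drops only rows with no country

print(len(either_present), len(both_present), len(with_country))
```

If the goal is to drop rows without a country but keep (and later fill) rows without a speed, the dropna(subset=["Country"]) form is probably the closest match.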
EDIT: Changed the condition from np.nan to the pandas equivalent isna(), following a comment by Timus.
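The earlier version of the condition is not shown here, so the following is only a sketch of why a comparison against np.nan (assuming that is what it did) would not find missing values, while isna() does. The Series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry.
s = pd.Series(["Canada", np.nan, "Japan"])

eq_mask = s == np.nan  # NaN never compares equal to anything, itself included
na_mask = s.isna()     # True exactly where the value is missing

print(eq_mask.tolist())
print(na_mask.tolist())
```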
CodePudding user response:
I found the solution: I merged the needed columns from external tables into my table and then ran:
inspdf['avgmbit'] = inspdf.groupby('sub_region')['avgmbit'].transform(lambda x: x.fillna(x.mean()))
inspdf = my dataframe
avgmbit = the column which was missing data
sub_region = column from external table
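For anyone wanting to try this, here is a small self-contained sketch of the merge followed by a per-group fill. All the data is hypothetical (the countries, speeds, sub_region values and the regions table are made up), and it uses transform, which returns values aligned with the original rows.

```python
import numpy as np
import pandas as pd

# Hypothetical table with two missing speeds.
inspdf = pd.DataFrame({
    "Country": ["Canada", "United States", "Japan", "South Korea"],
    "avgmbit": [75.0, np.nan, 44.0, np.nan],
})
# Hypothetical external table that supplies the grouping column.
regions = pd.DataFrame({
    "Country": ["Canada", "United States", "Japan", "South Korea"],
    "sub_region": ["Northern America", "Northern America",
                   "Eastern Asia", "Eastern Asia"],
})

# Bring sub_region into the main table, then fill each missing speed
# with the mean of the known speeds in the same sub_region.
inspdf = inspdf.merge(regions, on="Country", how="left")
inspdf["avgmbit"] = inspdf.groupby("sub_region")["avgmbit"].transform(
    lambda x: x.fillna(x.mean())
)
print(inspdf)
```

A group in which every speed is missing would stay NaN, since its mean is itself NaN.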
Thank you!