Parsing data using pandas

Time:07-29

I have a CSV file that I opened using pandas, and I need to check whether a field is missing a value. For example, I tried:

import pandas as pd

data = pd.read_csv(r'path')
df = pd.DataFrame(data)
df = df.drop_duplicates()
for row in df.iterrows():
    if len(df['Country']) < 1:
        print (row)

The input file looks like:

Country,"Avg(Mbit/s)Ookla"
Canada,75.18
South Korea,117.95
Netherlands,108.33
Japan,44.05
Norway,134.73
Singapore,67.99
Australia,76.52
Switzerland,82.29
Belgium,58.65
Croatia,86.48
New Zealand,49.49
Austria,56.6
Denmark,105.65
Lithuania,50.13
Czech Republic,44.55
United Arab Emirates,135.35
,41.32

So I need to check whether a row is missing its Country value and do one thing, or whether a row has a Country but is missing the average and do something else.
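For reference, the loop above tests `len(df['Country'])`, which is the length of the whole column rather than of the current row's field, so the condition never becomes true. A minimal sketch of a per-row check follows, assuming empty CSV fields are read as NaN (the `read_csv` default). The three-line input is a shortened stand-in for the real file, and the row with a country but no average is made up for illustration.

```python
import io
import pandas as pd

# Shortened stand-in for the real file. The "Japan," row (no average)
# is a made-up case; the last row mirrors the sample's missing Country.
csv_text = 'Country,"Avg(Mbit/s)Ookla"\nCanada,75.18\nJapan,\n,41.32\n'
df = pd.read_csv(io.StringIO(csv_text))

missing_country = []
missing_avg = []

# iterrows() yields (index, row) pairs, so unpack both.
for idx, row in df.iterrows():
    if pd.isna(row["Country"]):
        missing_country.append(idx)   # "do something" branch
    elif pd.isna(row["Avg(Mbit/s)Ookla"]):
        missing_avg.append(idx)       # "do something else" branch

print(missing_country, missing_avg)
```

Row-by-row loops are slow on large frames, so this is mainly useful for seeing which rows fall into which case.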

Here is rough placeholder code for the whole operation:

import pandas as pd
from api_data import APIData
api_data = APIData()

otherfile = api_data.get_content(api_data.get_token({
connection details ...
}))

otherfile = [row.split(',') for row in otherfile]

otherdf = pd.DataFrame(otherfile)
otherdf.drop(9, inplace=True, axis=1)
pd.set_option("display.max_rows", None, "display.max_columns", None)


data = pd.read_csv(r'path')
df = pd.DataFrame(data)
df = df.drop_duplicates()
for row in df.iterrows():
    if (df[df["Country"].isna()]) == True:
        df.drop(row)
    elif (df[df["Avg(Mbit/s)Ookla"].isna()]) == True:
        for otherrow in otherdf.iterrows():
            if (df[df["Country"]]) == (otherdf[otherdf["Country"]]):
                avgspeed = ( avgspeed + (otherdf[otherdf["Country"]]) ) / df.count(how many elements was inserted)
                (df[df["Avg(Mbit/s)Ookla"]]) = avgspeed

Later edit: the first 5 rows of the DataFrame:

         Country  Avg(Mbit/s)Ookla
0         Canada             75.18
1    South Korea            117.95
2    Netherlands            108.33
3          Japan             44.05
4         Norway            134.73

Thank you very much!

CodePudding user response:

Searching Google for "pandas index by condition" led me to a post on SO.

Thus:

df = df[(~df['Country'].isna()) | (~df['Avg(Mbit/s)Ookla'].isna())].copy()

This should work. The code is untested since no example data was provided.
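To see what the condition above keeps, here is a small sketch on a made-up four-row frame (the values are illustrative, not taken from the real file). It compares the `|` mask with an `&` mask and with `dropna(subset=...)`. Which one fits depends on the goal: the `|` form only removes rows where both fields are missing.

```python
import numpy as np
import pandas as pd

# Made-up frame covering all four combinations of present/missing.
df = pd.DataFrame({
    "Country": ["Canada", None, "Japan", None],
    "Avg(Mbit/s)Ookla": [75.18, 41.32, np.nan, np.nan],
})

has_country = df["Country"].notna()
has_avg = df["Avg(Mbit/s)Ookla"].notna()

# OR: keep a row if at least one field is present
# (drops only the row where both are missing).
either = df[has_country | has_avg].copy()

# AND: keep only rows where both fields are present.
both = df[has_country & has_avg].copy()

# Drop rows without a Country, but keep rows that lack only the average.
named = df.dropna(subset=["Country"])

print(either.index.tolist(), both.index.tolist(), named.index.tolist())
```

If rows without a Country should be removed while rows without an average are kept for later filling, the `dropna(subset=["Country"])` variant is probably the closest match.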

EDIT: Changed the condition from np.nan to the pandas equivalent isna(), following a comment by Timus.

CodePudding user response:

I found the solution. I merged the needed columns from the external tables into my table and then ran:

inspdf['avgmbit'] = inspdf.groupby(['sub_region'])['avgmbit'].apply(lambda x: x.fillna(x.mean()))

inspdf = my dataframe

avgmbit = the column which was missing data

sub_region = column from external table
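As an illustration of the same idea, here is a minimal sketch that fills missing values with the mean of their group. The column names follow the ones above, but the rows, the sub_region labels and the two missing values are made up for the example. It uses `transform("mean")` instead of `apply`. As far as I know, `transform` returns a result aligned to the original index, which avoids index mismatches that `groupby(...).apply(...)` can produce on some pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative data: speeds come from the sample file, but the
# sub_region labels and the two NaN entries are assumptions.
inspdf = pd.DataFrame({
    "Country": ["Norway", "Denmark", "Lithuania", "Japan", "South Korea"],
    "sub_region": ["Northern Europe"] * 3 + ["Eastern Asia"] * 2,
    "avgmbit": [134.73, 105.65, np.nan, 44.05, np.nan],
})

# Per-row mean of the row's group; NaN values are skipped when averaging.
group_mean = inspdf.groupby("sub_region")["avgmbit"].transform("mean")

# Only the missing entries are replaced; existing values stay as they are.
inspdf["avgmbit"] = inspdf["avgmbit"].fillna(group_mean)

print(inspdf)
```

Here the Lithuania row receives the mean of the Norway and Denmark values, and the South Korea row receives the Japan value, since that is the only other entry in its group.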

Thank you!
