I am calling an API which returns a JSON-file. When I parse the JSON file to a pandas dataframe it returns the following table:
Level 1 | Level 2 | Level 3 |
---|---|---|
Total | Category1 | Category 1.2 |
Total | None | None |
Total | Category 2 | None |
Total | Category 2 | Category 2.1 |
Total | Category 3 | None |
Total | Category 3 | Category 3.1 |
As you can see there are three levels of categories. I would like to replace the "None" values in the columns with the value from the column before. The dataframe should be looking like this:
Level 1 | Level 2 | Level 3 |
---|---|---|
Total | Category1 | Category 1.2 |
Total | Total | Total |
Total | Category 2 | Category 2 |
Total | Category 2 | Category 2.1 |
Total | Category 3 | Category 3 |
Total | Category 3 | Category 3.1 |
I already experimented with some loops and with the following code:
for i in range(len(df.columns)):
df.iloc[:,i] = np.where(df.iloc[:,i] == "None",df.iloc[:,i 1],df.iloc[:,i])
But this does not work. How can I achieve this?
Thank you!
CodePudding user response:
Make sure that your None
are actually np.nan
, and then your task will be made simple as forward fill along the columns ffill(axis=1)
will do what you need:
df.replace('None',np.nan,inplace=True)
df_f = df.ffill(axis=1)
print(df_f)
Level 1 Level 2 Level 3
0 Total Category1 Category 1.2
1 Total Total Total
2 Total Category 2 Category 2
3 Total Category 2 Category 2.1
4 Total Category 3 Category 3
5 Total Category 3 Category 3.1
CodePudding user response:
DataFrame.ffill
works on the Python None
object. Pass axis=1
to fill across columns:
filled = df.ffill(axis=1)
print(filled)
Level 1 Level 2 Level 3
0 Total Category1 Category 1.2
1 Total Total Total
2 Total Category 2 Category 2
3 Total Category 2 Category 2.1
4 Total Category 3 Category 3
5 Total Category 3 Category 3.1