I have this dataframe:
index | x | y |
---|---|---|
0 | 0 | 3 |
1 | 0.07 | 4 |
2 | 0.1 | 6 |
3 | 0.13 | 5 |
I want to insert new x values into the x column:
new_x = [0, 0.03, 0.07, 0.1, 0.13, 0.17, 0.2]
so that the dataframe becomes:
index | x | y |
---|---|---|
0 | 0 | 3 |
1 | 0.03 | NaN |
2 | 0.07 | 4 |
3 | 0.1 | 6 |
4 | 0.13 | 5 |
5 | 0.17 | NaN |
6 | 0.2 | NaN |
So for every new_x value that doesn't already exist in column x, the y value should be NaN.
Is it possible to do this in pandas? Thank you.
CodePudding user response:
You can use NumPy's searchsorted.
First create a new_y array that is the same length as new_x and filled with NaN. Then use searchsorted to find the positions in new_y where the old y values belong. This assumes new_x is sorted and contains every value in df.x.
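import numpy as np
import pandas as pd
# start with all NaN, then fill in the y values at the matching x positions
new_y = np.full(len(new_x), np.nan, np.float64)
new_y[np.searchsorted(new_x, df.x)] = df.y
pd.DataFrame({'x': new_x, 'y': new_y})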
      x    y
0  0.00  3.0
1  0.03  NaN
2  0.07  4.0
3  0.10  6.0
4  0.13  5.0
5  0.17  NaN
6  0.20  NaN
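To make the index lookup above easier to follow, here is a minimal self-contained sketch that prints the positions searchsorted returns for the question's data. The names `positions` and `result` and the exact-match check are additions for illustration and are not part of the original answer. The check is there because searchsorted only reports insertion points, so it would not complain on its own if a value from df.x were missing from new_x.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0, 0.07, 0.1, 0.13], "y": [3, 4, 6, 5]})
new_x = np.asarray([0, 0.03, 0.07, 0.1, 0.13, 0.17, 0.2])

# Position of each existing x value inside the sorted new_x array.
positions = np.searchsorted(new_x, df["x"].to_numpy())
print(positions)  # [0 2 3 4]

# Illustrative guard: every old x must appear exactly in new_x,
# otherwise y values would be written to the wrong rows.
if not np.array_equal(new_x[positions], df["x"].to_numpy()):
    raise ValueError("some x values are missing from new_x")

# Start with all NaN and fill in the known y values.
new_y = np.full(len(new_x), np.nan)
new_y[positions] = df["y"].to_numpy()
result = pd.DataFrame({"x": new_x, "y": new_y})
print(result)
```

Because the comparison is on exact float values, this only works when the old and new x values come from the same literals or the same computation.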
CodePudding user response:
This is a straightforward application of the pandas merge function, specifically a left join.
import pandas as pd
x1 = [0, 0.07, 0.1, 0.13]
y1 = [3, 4, 6, 5]
df1 = pd.DataFrame({"x": x1, "y": y1})
print(df1)
x2 = [0, 0.03, 0.07, 0.1, 0.13, 0.17, 0.2]
df2 = pd.DataFrame({"x": x2})
print(df2)
df3 = df2.merge(df1, how="left", on="x")
print(df3)
      x  y
0  0.00  3
1  0.07  4
2  0.10  6
3  0.13  5
      x
0  0.00
1  0.03
2  0.07
3  0.10
4  0.13
5  0.17
6  0.20
      x    y
0  0.00  3.0
1  0.03  NaN
2  0.07  4.0
3  0.10  6.0
4  0.13  5.0
5  0.17  NaN
6  0.20  NaN
CodePudding user response:
You could try the join method. Here is some sample code you could refer to:
import pandas as pd
d1 = {'x': [0, 0.07, 0.1, 0.13], 'y': [3,4,6,5]}
d2 = {'x': [0, 0.03, 0.07, 0.1, 0.13, 0.17, 0.2]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
# both frames are indexed by x, so join matches on the index
df2.set_index('x').join(df1.set_index('x'), how='left').reset_index()
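The answer does not show its result, so here is a small runnable variant of the same idea. It is a sketch, not the answerer's exact code: it keeps x as a regular column in df2 and uses the `on` parameter of join to match that column against the index of df1, which avoids the set_index and reset_index round trip. The name `joined` is made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [0, 0.07, 0.1, 0.13], "y": [3, 4, 6, 5]})
df2 = pd.DataFrame({"x": [0, 0.03, 0.07, 0.1, 0.13, 0.17, 0.2]})

# Left join: keep every row of df2 and look up y in df1 by x.
# Rows of df2 whose x is not in df1 get NaN in the y column.
joined = df2.join(df1.set_index("x"), on="x", how="left")
print(joined)
```

The printed frame should have seven rows, with NaN in y for 0.03, 0.17 and 0.2, which is the layout asked for in the question.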
CodePudding user response:
Try this: compare the values already present in the column with the new ones, build a DataFrame from the values that are missing, and concatenate it to the original DataFrame. Sorting by x afterwards puts the new rows in the right places.
import pandas as pd
import numpy as np
df = pd.DataFrame({'x':[0,0.07,0.1,0.13], 'y':[3,4,6,5]})
new_list = [0, 0.03, 0.07, 0.1, 0.13, 0.17, 0.2]
def diff(new_list, col_list):
    # values in new_list that are not in the column yet
    return sorted(set(new_list) - set(col_list))
new_df = pd.DataFrame({'x':diff(new_list,df['x'].to_list()),'y':np.nan})
# append the missing rows, then order by x
fin_df = pd.concat([df,new_df]).sort_values('x').reset_index(drop=True)
fin_df
      x    y
0  0.00  3.0
1  0.03  NaN
2  0.07  4.0
3  0.10  6.0
4  0.13  5.0
5  0.17  NaN
6  0.20  NaN