Home > front end >  How to avoid getting Nan while adding entries of two different csv columns in python
How to avoid getting Nan while adding entries of two different csv columns in python

Time:10-16

I am adding two different csv columns to make 2D hist plot. I have two different types of data which are dense and collision.

Further, each data includes information of my case study with column type where I have type=0 (big) and type=1(small). csv looks like this(from collision):

TIMESTEP id     type    a      |f|     |v|  
20000   4737     0     9.81  1.31495  4.18007   
40000  11991     1     9.81  4.43794  4.17909   
50000  15725     1     9.81  4.43794  4.17810     
30000   8209     0     9.81  4.43794  4.17810     
15000   3545     0     9.81  1.31495  4.17810   
30000   8269     0     9.81  4.43794  4.17810    
10000   2077     1     9.81  1.31495  4.17712   
20000   5079     0     9.81  1.31495  4.17712   

All data are float types with positive entries.

When I plot a and |f| from both types for type=0 (big) and type=1(small) separetely I don't have any problem. Also plot makes sense. However, plotting a and |f| from small big (i.e sum of each entry from each part) looks weird.

I realized small big gave me almost 90% Nan though original data doesn't have any Nan.

How can I avoid Nan while doing small big to make a perfect hist plot?

My plot looks like this with separete data and small big. enter image description here

I realized data is missing in collision:Small Big. I was expecting to have plot like Dense:Small Big.

My code is here:

from cProfile import label
from matplotlib.colors import LogNorm

df_collision_big = df_collision[df_collision['type'] == 0]
df_collision_small = df_collision[df_collision['type'] == 1]

df_dense_big =   df_dense[df_dense['type'] == 0]
df_dense_small = df_dense[df_dense['type'] == 1]


plt.subplots(figsize=(14, 6))
#make space between subplots
plt.subplots_adjust(wspace=0.5, hspace=0.6)
plt.subplot(231)
plt.hist2d(df_collision_small['a'], df_collision_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Collision: Small')
plt.subplot(232)
plt.hist2d(df_collision_big['a'], df_collision_big['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Collision: Big')
plt.subplot(233)
plt.hist2d(df_collision_big['a']   df_collision_small['a'], df_collision_big['|f|']   df_collision_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Collision: Small   Big')

plt.subplot(234)
plt.hist2d(df_dense_small['a'], df_dense_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Dense: Small')
plt.subplot(235)
plt.hist2d(df_dense_big['a'], df_dense_big['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Dense: Big')
plt.subplot(236)
plt.hist2d(df_dense_big['a']   df_dense_small['a'], df_dense_big['|f|']   df_dense_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Dense: Big   Small')
plt.savefig('hist2d.png', dpi=300)
plt.show()

Printing df_collision['a] gives me this:

175761    9.810009
409899    9.810058
429591    9.810058
358086    9.810009
89079     9.810009
            ...   
243866    9.810058
125778    9.810009
185374    9.810009
496586    9.810058
234942    9.810058
Name: a, Length: 27832, dtype: float64

Most of the values in a are similar.

printing df_collision_big['a'] df_collision_small['a'] gives me this:

0         19.620067
1         19.620067
2         19.620067
3         19.620067
4         19.620067
            ...    
504208          NaN
504209          NaN
504210          NaN
504211          NaN
504212          NaN
Name: a, Length: 18639, dtype: float64

One more thing:

printing len of small and big gives me this:

print(len(df_collision_small['a']))
print(len(df_collision_big['a']))

# Output
13772
14060

Hoping for some suggestions to solve this issue.

Thanks.

CodePudding user response:

You can solve the NaN issue in summation as:

pd.merge(df_collision_small["a"], df_dense_big["a"], how="outer", left_index=True, right_index=True).sum(axis=1)

Explanation

outer join on index results in a dataframe with "a" from both dataframes put together (index/position wise):

df_collision_small = pd.DataFrame({"a" : [1,2]})
>>    a
>> 0  1
>> 1  2

df_dense_big = pd.DataFrame({"a" : [10,20,30,40]})
>>     a
>> 0  10
>> 1  20
>> 2  30
>> 3  40

pd.merge(df_collision_small, df_dense_big, how="outer", left_index=True, right_index=True)
>>    a_x  a_y
>> 0  1.0   10
>> 1  2.0   20
>> 2  NaN   30
>> 3  NaN   40

sum(axis=1) ignores NaN while summing values:

pd.merge(df_collision_small["a"], df_dense_big["a"], how="outer", left_index=True, right_index=True).sum(axis=1)
>> 0    11.0
>> 1    22.0
>> 2    30.0
>> 3    40.0
  • Related