Dividing one dataframe by another in python using pandas with float values

Time:11-17

I have two separate data frames named df1 and df2 as shown below:

    Scaffold  Position  Ref_Allele_Count  Alt_Allele_Count  Coverage_Depth  Alt_Allele_Frequency
0          1        11                 7                51              58              0.879310
1          1        16                20                95             115              0.826087
2          2         9                 9                33              42              0.785714
3          2        12                86                51             137              0.372263
4          2        67                41                98             139              0.705036
5          3         8                 0                 0               0              0.000000
6          4        99                32                26              58              0.448276
7          4       101               100                24             124              0.193548
8          4       115                69                26              95              0.273684
9          5         6                40                57              97              0.587629
10         5        19                53                87             140              0.621429
    Scaffold  Position  Ref_Allele_Count  Alt_Allele_Count  Coverage_Depth  Alt_Allele_Frequency
0          1        11                 7                64              71              0.901408
1          1        16                10                90             100              0.900000
2          2         9                79                86             165              0.521212
3          2        12                12                73              85              0.858824
4          2        67                54                96             150              0.640000
5          3         8                 0                 0               0              0.000000
6          4        99                86                28             114              0.245614
7          4       101                32                25              57              0.438596
8          4       115                97                16             113              0.141593
9          5         6                86                43             129              0.333333
10         5        19                59                27              86              0.313953

I have already found the sums of the Allele_Count and Coverage_Depth columns across df1 and df2, but I need to divide the resulting Alt_Allele_Count by the resulting Coverage_Depth to find the total allele frequency (AF). When I tried to convert the two variables to floats, I got the error message: TypeError: float() argument must be a string or a number, not 'DataFrame'. When I left them as DataFrames and divided them, I got this table:

    Alt_Allele_Count  Coverage_Depth
0                NaN             NaN
1                NaN             NaN
2                NaN             NaN
3                NaN             NaN
4                NaN             NaN
5                NaN             NaN
6                NaN             NaN
7                NaN             NaN
8                NaN             NaN
9                NaN             NaN
10               NaN             NaN

My code so far:

import csv
import pandas as pd
import numpy as np

df1 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_1.csv')
df2 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_2.csv')
print(df1)
print(df2)


Ref_Allele_Count = (df1[['Ref_Allele_Count']] + df2[['Ref_Allele_Count']])
print(Ref_Allele_Count)

Alt_Allele_Count = (df1[['Alt_Allele_Count']] + df2[['Alt_Allele_Count']])
print(Alt_Allele_Count)

Coverage_Depth = (df1[['Coverage_Depth']] + df2[['Coverage_Depth']]).astype(float)
print(Coverage_Depth)

AF = Alt_Allele_Count / Coverage_Depth

print(AF)

CodePudding user response:

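The error stems from the difference between a pandas Series and a DataFrame. A Series is a one-dimensional structure, like a single column, while a DataFrame is a two-dimensional object, like a table. Arithmetic between two Series is matched on the row index only and gives a new Series of values. Arithmetic between two DataFrames is also matched on column labels, so dividing a DataFrame whose only column is Alt_Allele_Count by one whose only column is Coverage_Depth finds no columns in common and returns NaN everywhere. float() fails for a related reason: it expects a single number, not a DataFrame.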

Taking slices of a dataframe can either result in a series or dataframe object depending on how you do it:

df['column_name'] -> Series
df[['column_name', 'column_2']] -> Dataframe
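As a quick check of the two selection styles, here is a minimal sketch. The small frame is a hypothetical stand-in built from two rows of the question's data, not the real CSV files.

```python
import pandas as pd

# Hypothetical two-row frame; values copied from the question's first table.
df = pd.DataFrame({"Alt_Allele_Count": [51, 95], "Coverage_Depth": [58, 115]})

single = df["Alt_Allele_Count"]      # single brackets -> Series
double = df[["Alt_Allele_Count"]]    # double brackets -> one-column DataFrame

print(type(single).__name__, single.shape)   # Series (2,)
print(type(double).__name__, double.shape)   # DataFrame (2, 1)
```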

So in the line:

Ref_Allele_Count = (df1[['Ref_Allele_Count']] + df2[['Ref_Allele_Count']])

df1[['Ref_Allele_Count']] is a single-column DataFrame rather than a Series.

Ref_Allele_Count = (df1['Ref_Allele_Count'] + df2['Ref_Allele_Count'])

This should return the correct result. The same goes for the rest of the columns you're adding together; once they are Series, the final division works row by row.
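To see both behaviours side by side, the following sketch reproduces the all-NaN table and then the per-row division. It assumes small in-memory frames built from four rows of the question's tables (rows 0, 1, 2 and 5) in place of the CSV files, so the row index here runs 0 to 3.

```python
import pandas as pd

# A few rows copied from the question's tables (stand-ins for the CSV files).
df1 = pd.DataFrame({"Alt_Allele_Count": [51, 95, 33, 0],
                    "Coverage_Depth": [58, 115, 42, 0]})
df2 = pd.DataFrame({"Alt_Allele_Count": [64, 90, 86, 0],
                    "Coverage_Depth": [71, 100, 165, 0]})

# Double brackets: one-column DataFrames with different column labels.
alt_df = df1[["Alt_Allele_Count"]] + df2[["Alt_Allele_Count"]]
cov_df = df1[["Coverage_Depth"]] + df2[["Coverage_Depth"]]
bad = alt_df / cov_df      # no shared column label, so every cell is NaN
print(bad)

# Single brackets: Series, aligned on the row index only.
alt = df1["Alt_Allele_Count"] + df2["Alt_Allele_Count"]
cov = df1["Coverage_Depth"] + df2["Coverage_Depth"]
af = alt / cov             # element-wise allele frequency per row
print(af)
```

The last row has zero coverage in both frames, so its frequency is 0/0, which pandas reports as NaN. If a zero is preferred there, af.fillna(0) is one option, though whether that is appropriate depends on how the frequencies are used downstream.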

CodePudding user response:

This can be fixed by using only one set of brackets '[]' when referring to a column in a pandas df, rather than two.
