I'm working with the following table:
input_test | input_test2 | input_test3 | ip_test | ip_test2 | ip_test3 | |
---|---|---|---|---|---|---|
ENSG00000000003.15 | 1 | 1 | 1 | 3 | 3 | 3 |
ENSG00000000457.14 | 2 | 2 | 2 | 1 | 1 | 1 |
ENSG00000000460.17 | 2 | 2 | 2 | 3 | 3 | 3 |
ENSG00000001036.14 | 3 | 3 | 3 | 4 | 4 | 4 |
ENSG00000001167.14 | 3 | 3 | 3 | 5 | 5 | 5 |
My goal is to make a new column called translational efficiency that divides the averaged ip columns by the averaged input columns, which I'm thinking should look like this:
input_test | input_test2 | input_test3 | ip_test | ip_test2 | ip_test3 | translational_efficiency | |
---|---|---|---|---|---|---|---|
ENSG00000000003.15 | 1 | 1 | 1 | 3 | 3 | 3 | 3 |
ENSG00000000457.14 | 2 | 2 | 2 | 1 | 1 | 1 | 0.5 |
ENSG00000000460.17 | 2 | 2 | 2 | 3 | 3 | 3 | 1.5 |
ENSG00000001036.14 | 3 | 3 | 3 | 4 | 4 | 4 | 1.3 |
ENSG00000001167.14 | 3 | 3 | 3 | 5 | 5 | 5 | 1.6 |
Thus far, I've created a script with the following arguments:
python translational_efficiency.py --in_matrix all_reads_matrix.csv --ip_files ip_test ip_test2 ip_test3 --input_files input_test input_test2 input_test3 --save_path /Users/ks/Desktop/
Notice that the --ip_files
and --input_files
arguments take in multiple columns as a list that reflect the columns listed in the table.
I was hoping to do something like this:
import argparse
import pandas as pd
def trans_eff(in_matrix, ip_files, input_files, save_path):
# call in original table without translational efficiency column
df = pd.read_csv(in_matrix, index_col=False)
# divide ip files columns by input files column into new column called translational_efficiency
df['translational_efficiency'] = df[ip_files[0]] / df[input_files[0]]
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Make a new matrix that includes translational '
'efficiencies from merge_matrix.py')
parser.add_argument("--in_matrix",
help='matrix made by merge_matrix.py')
parser.add_argument("--ip_files",
help='all ip files in as a list',
action='append',
nargs='*')
parser.add_argument("--input_files",
help='all input files in as a list',
action='append',
nargs='*')
parser.add_argument("--save_path",
help='path to save')
# parse out arguments
args = parser.parse_args()
# create csv with translational efficiencies
trans_eff(args.in_matrix, args.ip_files, args.input_files, args.save_path)
However, this gives me the following error: ValueError: Wrong number of items passed 6, placement implies 1
CodePudding user response:
you can grab columns by prefix using startswith
string method
input_cols = [col for col in df.columns if col.startswith('input_test')]
ip_cols = [col for col in df.columns if col.startswith('ip_test')]
and calculate mean on axis=1
for those columns (for each row) and have a new column translational_efficiency
by dividing these mean.
df['translational_efficiency'] = df[ip_cols].mean(axis=1) / df[input_cols].mean(axis=1)
As you described in updated question and comment, i guess you can use df[ip_files] and df[input_files] directly as they have column names. You can ignore column name extraction part
df['translational_efficiency'] = df[ip_files].mean(axis=1) / df[input_files].mean(axis=1)