I'm new to the Python world. I have always relied on R's vectorized operations, so I have a basic question.
I have two arrays, one with int values and the other with string values. I would like to get a pandas Series with the concatenation of both. The inputs look like:
0 Enterobact
1 Pseudomo
2 Mycobact
3 Bac
4 Streptoc
5 Propionibact
6 Staphyloc
7 Morax
8 Synechoc
9 Gord
Name: fam, dtype: object
0 7275
1 3872
2 3869
3 1521
4 1408
5 1022
6 877
7 765
8 588
9 578
Name: frequency, dtype: int64
And I would like to get the following:
Enterobact - 7275
Pseudomo - 3872
Mycobact - 3869
# And so on...
What is the proper way to solve this problem in Python, rather than an approach adapted from R? Thank you very much in advance.
CodePudding user response:
I'm not sure what format you need the result in, so here are two methods. First of all, I assume that your data is stored in two variables:
print(fam_column)
print(freq_column)
Output of the two vars is exactly what you have:
0 Enterobact
1 Pseudomo
2 Mycobact
3 Bac
4 Streptoc
5 Propionibact
6 Staphyloc
7 Morax
8 Synechoc
9 Gord
Name: fam, dtype: object
0 7275
1 3872
2 3869
3 1521
4 1408
5 1022
6 877
7 765
8 588
9 578
Name: frequency, dtype: int64
The first method relies on the fact that these are DataFrame columns (pandas Series), so we can use pandas operations directly. The code concatenates the values row by row as strings, with ' - ' in the middle:
result = fam_column + ' - ' + freq_column.astype(str)
print(result)
Output:
0 Enterobact - 7275
1 Pseudomo - 3872
2 Mycobact - 3869
3 Bac - 1521
4 Streptoc - 1408
5 Propionibact - 1022
6 Staphyloc - 877
7 Morax - 765
8 Synechoc - 588
9 Gord - 578
dtype: object
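For anyone who wants to run the first method end to end, here is a minimal sketch. The two Series are rebuilt by hand from the first three rows of the question, so their construction is an assumption about how the data is stored. It also shows Series.str.cat as an alternative way to write the same concatenation.

```python
import pandas as pd

# Assumed reconstruction of the question's data (first three rows only).
fam_column = pd.Series(["Enterobact", "Pseudomo", "Mycobact"], name="fam")
freq_column = pd.Series([7275, 3872, 3869], name="frequency")

# Element-wise concatenation with +; the ints must be cast to str first,
# otherwise pandas raises a TypeError when adding str and int.
result_plus = fam_column + " - " + freq_column.astype(str)

# The same result with Series.str.cat, passing the separator explicitly.
result_cat = fam_column.str.cat(freq_column.astype(str), sep=" - ")

print(result_plus.tolist())
print(result_cat.tolist())
```

Both versions match values by index label, so they assume the two Series share the same index, as they do in the question.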
In your question, you mentioned that you want to combine two arrays (lists, in Python), so I added a second method. It is not preferred, because working with the existing pandas columns is much simpler. This method converts your columns into two lists and then combines them into the desired form with a list comprehension.
list_fam = list(fam_column)
list_frequency = list(freq_column)
result = [x + ' - ' + str(y) for x, y in zip(list_fam, list_frequency)]
print(result)
The output is the following:
['Enterobact - 7275', 'Pseudomo - 3872', 'Mycobact - 3869', 'Bac - 1521', 'Streptoc - 1408', 'Propionibact - 1022', 'Staphyloc - 877', 'Morax - 765', 'Synechoc - 588', 'Gord - 578']
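Since the question asks for a pandas Series rather than a list, the list-based result can be wrapped back into one. The sketch below assumes plain Python lists holding the first three rows of the question's data; the names fam_list, freq_list, combined and combined_series are made up for illustration. It uses an f-string, which formats the int without an explicit str() call.

```python
import pandas as pd

# Assumed plain-list inputs (first three rows of the question's data).
fam_list = ["Enterobact", "Pseudomo", "Mycobact"]
freq_list = [7275, 3872, 3869]

# Build each "name - count" string with an f-string inside a list comprehension.
combined = [f"{name} - {count}" for name, count in zip(fam_list, freq_list)]

# Wrap the list in a Series to get the type the question asks for.
combined_series = pd.Series(combined)
print(combined_series)
```

Note that zip stops at the shorter input, so lists of unequal length would be truncated silently.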