I'm trying to use two columns from an existing dataframe to generate a list of new strings with those values. I found a lot of examples doing something similar, but not the same thing, so I appreciate advice or links elsewhere if this is a repeat question. Thanks in advance!
If I start with a data frame like this:
import pandas as pd
df=pd.DataFrame(data=[["a",1],["b",2],["c",3]], columns=["id1","id2"])
  id1  id2
0   a    1
1   b    2
2   c    3
I want to make a list that looks like new_ids=['a_1','b_2','c_3'] where values are from combining values in row 0 for id1 with values for row 0 for id2 and so on.
I started by making lists from the columns, but can't figure out how to combine them into a new list. I also tried not using intermediate lists, but couldn't get that either. Error messages below are accurate to the mock data, but are different from the ones with real data.
#making separate lists version
#this function works
def get_ids(orig_df):
    id1_list=[]
    id2_list=[]
    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)
idlist1,idlist2=get_ids(df)
#this is the part that doesn't work
new_id=[]
for i,j in zip(idlist1,idlist2):
    row='_'.join(str(idlist1[i]),str(idlist2[j]))
    new_id.append(row)
#---------------------------------------------------------------------------
#TypeError                                 Traceback (most recent call last)
#<ipython-input-41-6b0c949a1ad5> in <module>
#      1 new_id=[]
#      2 for i,j in zip(idlist1,idlist2):
#----> 3     row='_'.join(str(idlist1[i]),str(idlist2[j]))
#      4     new_id.append(row)
#      5 #return ', '.join(new_id)
#TypeError: list indices must be integers or slices, not str
#skipping making lists (also doesn't work)
newid_list=[]
for i in range(len(df)):
    n1=df['id1'[i].values]
    n2=df['id2'[i].values]
    nid= str(n1) + "_" + str(n2)
    newid_list.append(nid)
#---------------------------------------------------------------------------
#AttributeError                            Traceback (most recent call last)
#<ipython-input-44-09983bd890a6> in <module>
#      1 newid_list=[]
#      2 for i in range(len(df)):
#----> 3     n1=df['id1'[i].values]
#      4     n2=df['id2'[i].values]
#      5     nid= str(n1) + "_" + str(n2)
#AttributeError: 'str' object has no attribute 'values'
CodePudding user response:
(df.id1 + "_" + df.id2.astype(str)).tolist()
output:
['a_1', 'b_2', 'c_3']
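To see why astype(str) is needed: id2 holds integers, and Python cannot add a string and an integer directly. Here is a minimal sketch on the same mock data (the names id2_as_text and new_ids are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(data=[["a", 1], ["b", 2], ["c", 3]], columns=["id1", "id2"])

# id2 holds integers; astype(str) turns each value into its string form
id2_as_text = df["id2"].astype(str)
print(id2_as_text.tolist())  # ['1', '2', '3']

# now both sides are strings, so + concatenates them row by row
new_ids = (df["id1"] + "_" + id2_as_text).tolist()
print(new_ids)  # ['a_1', 'b_2', 'c_3']
```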
Your approaches (corrected):
#first approach: separate lists
def get_ids(orig_df):
    id1_list=[]
    id2_list=[]
    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)
idlist1, idlist2=get_ids(df)
#corrected: i and j are the values themselves, not indices
new_id=[]
for i,j in zip(idlist1,idlist2):
    row='_'.join([str(i),str(j)])
    new_id.append(row)
#second approach: no intermediate lists
newid_list=[]
for i in range(len(df)):
    n1=df['id1'][i]
    n2=df['id2'][i]
    nid= str(n1) + "_" + str(n2)
    newid_list.append(nid)
Points:
- In the first approach, when you loop over the zipped lists, i and j are the data themselves, not indices, so use them directly and convert them to strings.
- join takes a single iterable (such as a list) as its argument, so build a list from the two values, [str(i),str(j)], and pass that to join.
- In the second approach, you can get each element of a column with df['id1'][i]. You don't need values, which returns all elements of the column as a NumPy array.
If you want to use values:
(df.id1.values + "_" + df.id2.values.astype(str)).tolist()
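Putting the points above together, the zip loop can also be written as a list comprehension directly over the two columns, without the intermediate lists. This is only a sketch on the mock data, and new_ids is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(data=[["a", 1], ["b", 2], ["c", 3]], columns=["id1", "id2"])

# zip yields the values themselves (i from id1, j from id2), not indices;
# str.join takes one iterable of strings, so the pair goes in a list
new_ids = ["_".join([str(i), str(j)]) for i, j in zip(df["id1"], df["id2"])]
print(new_ids)  # ['a_1', 'b_2', 'c_3']
```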
CodePudding user response:
Try this; it works for me:
import pandas as pd
df=pd.DataFrame(data=[["a",1],["b",2],["c",3]], columns=["id1","id2"])
index=0
newid_list=[]
while index < len(df):
    newid_list.append(str(df['id1'][index]) + '_' + str(df['id2'][index]))
    index += 1
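One caveat, as far as I can tell: df['id1'][index] looks values up by index label, so the loop relies on the default 0, 1, 2, ... index. If the dataframe has been filtered or re-indexed, a positional lookup with .iloc is probably safer. A rough sketch, where the index labels 10, 20, 30 are made up purely for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    data=[["a", 1], ["b", 2], ["c", 3]],
    columns=["id1", "id2"],
    index=[10, 20, 30],  # hypothetical non-default index, for illustration only
)

newid_list = []
for pos in range(len(df)):
    # .iloc selects by position, regardless of what the index labels are
    newid_list.append(str(df["id1"].iloc[pos]) + "_" + str(df["id2"].iloc[pos]))

print(newid_list)  # ['a_1', 'b_2', 'c_3']
```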