Appending new values to a DataFrame element which is of type list

Time:03-25

I am trying to match Sample IDs to a list of tissue names. One Sample ID can have more than one tissue, so I initialise the tissue_name column with a placeholder, replace it with an empty list on the first match, and want to append the matching tissue names to it, as shown below.

TCGA_luad['tissue_name'] = 'NA'
for index, row in TCGA_luad.iterrows():
    for item in TCGA_lung_tissue_names:
        if row['Sample ID'] in item:
            if row['tissue_name'] == 'NA':
                TCGA_luad.at[index, 'tissue_name'] = []
                TCGA_luad.at[index, 'tissue_name'].append(item)
            else:
                print('here')
                TCGA_luad.at[index, 'tissue_name'].append(item)

Although many Sample IDs have more than one tissue name, the else branch is never reached and 'here' never gets printed.

On top of that, the tissue names don't get appended and every item ends up as []. Do you know why the append doesn't work?

/tmp/ipykernel_2331339/2964965853.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  TCGA_luad['tissue_name'] = 'NA'

I end up with all of tissue_name column being []


len(TCGA_lung_tissue_names)
3206

TCGA_lung_tissue_names[:3]
['TCGA-05-4244-01A-01-BS1',
 'TCGA-05-4244-01A-01-TS1',
 'TCGA-05-4244-01Z-00-DX1']

CodePudding user response:

I think a simple apply statement will be easier to understand, shorter, and possibly more performant:

df['tissue_name'] = df['Sample ID'].apply(lambda sid: [item for item in TCGA_lung_tissue_names if sid in item] or 'NA')
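For anyone who wants to try this on something small first, here is a minimal sketch of the same apply idea. The two Sample IDs are made up for illustration (the second one is chosen so that it has no match), the three tissue names are the ones printed in the question, and matching_tissues is just a hypothetical helper name. It uses TCGA_luad where the one-liner above uses df.

```python
import pandas as pd

# Toy stand-ins for the real tables; the Sample IDs are hypothetical.
TCGA_luad = pd.DataFrame({"Sample ID": ["TCGA-05-4244-01", "TCGA-05-4249-01"]})
TCGA_lung_tissue_names = [
    "TCGA-05-4244-01A-01-BS1",
    "TCGA-05-4244-01A-01-TS1",
    "TCGA-05-4244-01Z-00-DX1",
]


def matching_tissues(sample_id):
    """Return every tissue name containing sample_id, or 'NA' if none match."""
    matches = [name for name in TCGA_lung_tissue_names if sample_id in name]
    return matches or "NA"


# Build the whole column in one pass instead of mutating cells inside a loop.
TCGA_luad["tissue_name"] = TCGA_luad["Sample ID"].apply(matching_tissues)
```

Since the column is built from scratch in a single assignment, nothing depends on reading back values that were written earlier in a loop.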

CodePudding user response:

Probably not the best solution, but the following worked:

TCGA_luad['tissue_name'] = 'NA'

duplicates = {}

for index, row in TCGA_luad.iterrows():
    duplicate_counter = 0
    duplicates[row['Patient ID']] = []
    for item in TCGA_lung_tissue_names:
        if row['Patient ID'] in item:
            if row['Sample ID'] in item:
                duplicate_counter += 1
                duplicates[row['Patient ID']].append(item)
                if row['tissue_name'] == 'NA':
                    TCGA_luad.at[index, 'tissue_name'] = []
    TCGA_luad.at[index, 'tissue_name'] = duplicates[row['Patient ID']]
                
    if duplicate_counter > 1:
        print(duplicate_counter)
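If you prefer to keep an explicit loop, one option is to collect the matches in a plain Python list and assign the column once at the end, so no cell is written during iteration. This is only a sketch under assumptions: the Sample IDs are made up, the Patient ID check from the code above is left out, and collected is a hypothetical name.

```python
import pandas as pd

# Toy stand-ins for the real tables; the Sample IDs are hypothetical.
TCGA_luad = pd.DataFrame({"Sample ID": ["TCGA-05-4244-01", "TCGA-05-4249-01"]})
TCGA_lung_tissue_names = [
    "TCGA-05-4244-01A-01-BS1",
    "TCGA-05-4244-01A-01-TS1",
    "TCGA-05-4244-01Z-00-DX1",
]

collected = []  # one entry per row, in row order
for sample_id in TCGA_luad["Sample ID"]:
    names = [item for item in TCGA_lung_tissue_names if sample_id in item]
    collected.append(names if names else "NA")

# Assign the column once; passing the index keeps the rows aligned.
TCGA_luad["tissue_name"] = pd.Series(collected, index=TCGA_luad.index, dtype=object)
```

Each row's list is finished before it goes into the DataFrame, so there is no need to check whether a cell still holds 'NA'.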