how to filter a particular column with python pandas?-CodePudding

I have an excel file where I have 2 columns: 'Name' and 'size'. The 'Name' column has multiple file types, namely ".apk, .dat, .vdex, .ttc" etc. But I only want to populate the files with the file extension ending with .apk. I do not want any other file type in the new excel file.

I have written the below code:

import pandas as pd
import json

def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
        df = pd.DataFrame(data)
        new_df = df[df.columns.difference(['SHA256'])]
        new_xl = new_df.to_excel('abc.xlsx')
        return new_xl

def filter_apk():  `MODIFIED CODE`
    old_xl = json_to_excel()
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')

Above program does following:

json_to_excel(), takes a Json file, converts it to a .xlsx format and save.
filter_apk() is suppose to create multiple excel file based on the file extension present in "Name" column.

1st function is doing what I intend to.
2nd function is not doing anything. Neither its throwing any error. I have followed this weblink

Below are the few samples of the "name" column

/system/product/<Path_to>/abc.apk
/system/fonts/wwwr.ttc
/system/framework/framework.jar
/system/<Path_to>/icu.dat
/system/<Path_to>/Normal.apk
/system/<Path_to>/Tv.apk

How to get that working? Or is there a better way to achieve the objective?

Please suggest.

ERROR

    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>

Note:

I have all the files at the same location.

CodePudding user response：

There is a difference between your use-case and use-case shown in the weblink. You want to apply a single filter (apk files), whereas the example you saw had multiple filters which were to be applied one after another (multiple species).

This will do the trick.

def filter_apk():
    old_xl = json_to_excel()
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel("<path_to_new_excel>\\new_excel_name.xlsx")

CodePudding user response：

Regarding your new updated question. I guess your first function is not working as you think it is working.

new_xl = new_df.to_excel('abc.xlsx')

This will write an excel file, as you are expecting it to do. Which works.

However, assigning it to new_xl, does not do anything since there is no return on pd.to_excel. So when you return new_xl as output of your json_to_excel function, you actually return None. Therefore in your second function, old_xl = json_to_excel() will make old_xl have the value None.

So, your functions should be something like this:

def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
        df = pd.DataFrame(data)
        new_df = df[df.columns.difference(['SHA256'])]
        new_df.to_excel('abc.xlsx')

def filter_apk(): 
    json_to_excel()
    old_xl = pd.read_excel('abc.xlsx')
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')