Home > OS >  Fetching the first few characters of a column from a csv file using pandas
Fetching the first few characters of a column from a csv file using pandas

Time:04-15

I have a csv file which contains some data, here I will put some data.

enter image description here

  • I need to fetch the first two characters from the 'ID' column as an output, where the Quantity = 10 and Max value is greater than 40(which we can fetch from the first two characters from 'Max value' column)

So, the output should be,

02 04

I have tried these solutions so far,

code: var1 = data.loc[{data["Quantity"] == 10) & (data["Max value"].str[:2] == 40)]

var2 = (var1["ID"].str[:2])

print(var2)

output: Empty DataFrame Columns: [ID, Quantity, Max value]

Index: []

  • I thought this happened because the column name contains space character so, Other method,

code: var1 = data.loc[(data.Quantity == 10) & (data.Max value.str[:2] > 40)].ID.str[:2]

var2 = (var1.ID.str[:2])

print(var2)

output: same output

  • Let's change the column name, method 3,

code: data.rename(columns = {'Max value':'MaxValue'}, inplace = True)

var1 = data.loc[(data["Quantity"] == 10) & (data["Max value"].str[:2] > 40)]

var2 = (var1["ID"].str[:2])

print(var2)

output:

Series([], Name: ID, dtype: object)

  • The data exists but nothing is showing up, by the way I have tried the same codes without ".loc".
  • Any thoughts?

CodePudding user response:

This does the job:

df = pd.read_csv(***csv file path***)

df["Max value num"] = [int(max_val[:2]) for max_val in df["Max value"]]
desired_data = df[(df["Quantity"] == 10) & (df["Max value num"] >= 40)]
desired_data = [id[:2] for id in desired_data["ID"]]

This stores the first 2 characters in a list.


If you want to print them out like 02 04, then use this,

df = pd.read_csv(***csv file path***)

df["Max value num"] = [int(max_val[:2]) for max_val in df["Max value"]]
desired_data = df[(df["Quantity"] == 10) & (df["Max value num"] >= 40)]

output = ""
for id in desired_data["ID"]:
  output  = f"{id[:2]} "

output.strip(" ")

For both the codes above, I have added a Max value num column that stores the numeric part of the values in Max value.

  • Related