Import data from a CSV file as a variable for code in Python

I am trying to import a set of data from a CSV file to shorten my Python code.

Before using the CSV file as a variable, I confirmed that the code below works:

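import pandas as pd

csvone = pd.read_csv("csvone.csv")

# Both dots are escaped so they match a literal "." rather than any character.
# .copy() avoids a SettingWithCopyWarning when the new column is added.
data_a = csvone[csvone['IP Address'].str.contains(r'10\.1\.')].copy()
data_a["PCLOCATION"] = 'LOCATION_A'

data_b = csvone[csvone['IP Address'].str.contains(r'10\.2\.')].copy()
data_b["PCLOCATION"] = 'LOCATION_B'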

data = [data_a, data_b]
result = pd.concat(data)

print(result)

print(result) returns all matching rows, with an additional column named PCLOCATION.

But when I try to take the IP prefixes and locations from a second CSV file instead, the code below returns an error:

csvone = pd.read_csv("csvone.csv")
csvdata = pd.read_csv("csvdata.csv")

x = csvdata['IP']
z = csvdata['LOCATION']

pat = "|".join("^" + s.replace("." , r"\.") for s in x)
data = csvone[csvone['IP Address'].str.contains(x)]
data["PCLOCATION"] = (z)

print(data)

Edit: I originally got the error TypeError: unhashable type: 'Series'.
Using the code provided by @Timus fixed that error.

The outcome is still not what I expected. It seems I am missing the part of the code that links each IP to the LOCATION in the same row of csvdata.csv.

The current outcome:
https://i.stack.imgur.com/kmHlU.png

The csv file I am using:
csvdata.csv: https://i.stack.imgur.com/cnfAu.png
csvone.csv: https://i.stack.imgur.com/1JA6S.png

Thanks everyone!

CodePudding user response：

The variable x is a pandas Series object (a column of csvdata). pandas.Series.str.contains() requires the pattern (its first argument) to be a string, not a Series.

Check out examples in https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html
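
To illustrate the point with the standard library alone: the unhashable-type error most likely comes from Python's re module, which pandas appears to hand the pattern to. re rejects a collection of patterns in much the same way, and accepts them once they are joined into a single string. The prefixes below are hypothetical stand-ins for the values in csvdata['IP'], and re.escape is used here only as a convenient way to escape the dots.

```python
import re

# Hypothetical stand-ins for the values in csvdata['IP'].
prefixes = ["10.1.", "10.2."]

# A collection is not a valid pattern: re raises a TypeError.
try:
    re.search(prefixes, "10.2.3.4")
    error = None
except TypeError as exc:
    error = exc

# Joining the escaped prefixes with "|" gives one pattern string.
pat = "|".join("^" + re.escape(p) for p in prefixes)

hit = re.search(pat, "10.2.3.4")    # starts with one of the prefixes
miss = re.search(pat, "110.2.3.4")  # contains "10.2." but does not start with it

print(type(error).__name__)
print(pat)
print(hit is not None, miss is not None)
```

The same joined string is what .str.contains() expects as its first argument.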

CodePudding user response：

As pointed out by @snag9677, if you look at .str.contains

Series.str.contains(pat, case=True, flags=0, na=None, regex=True)

...

Parameters

pat: str - Character sequence or regular expression.

you'll see that you can't give it a pd.Series, only a string.

So what you could try instead is something like

pat = "|".join("^" + s.replace("." , r"\.") for s in x)
data = csvone[csvone['IP Address'].str.contains(pat)]

s.replace("." , r"\.") is needed since a simple . is a regex wildcard for any character (except a newline). The ^ is there to make sure to match only at the beginning (I suppose that is what you want). You could do

pat = "|".join(s.replace("." , r"\.") for s in x)
data = csvone[csvone['IP Address'].str.match(pat)]

instead (match implicitly requires that the pattern match starts at the beginning).
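
On the remaining problem from the question, linking each IP to its LOCATION: assigning the Series z directly (data["PCLOCATION"] = z) aligns on the row index, so each row receives whatever LOCATION happens to share its row label, not the one belonging to the prefix that matched. One possible way around this, sketched below, is to capture the matching prefix with .str.extract() and look it up in a prefix-to-location dictionary. The two small DataFrames are hypothetical stand-ins for csvone.csv and csvdata.csv (the real files are only available as screenshots), and the sketch assumes the IP column holds prefixes such as "10.1." including the trailing dot.

```python
import re
import pandas as pd

# Hypothetical stand-ins for csvone.csv and csvdata.csv.
csvone = pd.DataFrame({
    "IP Address": ["10.1.0.5", "10.2.3.4", "110.1.0.9", "10.1.7.7"],
})
csvdata = pd.DataFrame({
    "IP": ["10.1.", "10.2."],
    "LOCATION": ["LOCATION_A", "LOCATION_B"],
})

# Anchored alternation with one capture group, e.g. ^(10\.1\.|10\.2\.)
pat = "^(" + "|".join(re.escape(p) for p in csvdata["IP"]) + ")"

# For each row, the prefix that matched (NaN where none did).
prefix = csvone["IP Address"].str.extract(pat, expand=False)

# Prefix -> location lookup built from the rows of csvdata.
lookup = dict(zip(csvdata["IP"], csvdata["LOCATION"]))

# Keep only the matching rows and attach the location of their prefix.
data = csvone[prefix.notna()].copy()
data["PCLOCATION"] = prefix.map(lookup)

print(data)
```

If one prefix is itself the start of another (for example "10.1" and "10.10" without trailing dots), the alternation picks whichever comes first in csvdata, so the prefixes would need to be ordered or terminated carefully in that case.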