I am trying to import a set of data from a CSV file to shorten my code in Python.
Before trying to use the CSV file as a variable, I confirmed that the code below works:
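import pandas as pd

csvone = pd.read_csv("csvone.csv")
# Escape every dot so it matches a literal "." rather than any character;
# .copy() avoids assigning a new column to a slice of csvone.
data_a = csvone[csvone['IP Address'].str.contains(r'10\.1\.')].copy()
data_a["PCLOCATION"] = 'LOCATION_A'
data_b = csvone[csvone['IP Address'].str.contains(r'10\.2\.')].copy()
data_b["PCLOCATION"] = 'LOCATION_B'
data = [data_a, data_b]
result = pd.concat(data)
print(result)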
print(result) returns all the matched rows, with an additional column named PCLOCATION.
But when I try to take the values from a CSV file instead, the code below raises an error:
csvone = pd.read_csv("csvone.csv")
csvdata = pd.read_csv("csvdata.csv")
x = csvdata['IP']
z = csvdata['LOCATION']
pat = "|".join("^" + s.replace(".", r"\.") for s in x)
data = csvone[csvone['IP Address'].str.contains(x)]
data["PCLOCATION"] = (z)
print(data)
I get the error TypeError: unhashable type: 'Series'.
Using the code provided by @Timus fixed the unhashable error.
However, the outcome is not as expected; it seems I am missing the part of the code that links the IP and LOCATION in the same row.
The current outcome:
https://i.stack.imgur.com/kmHlU.png
The csv file I am using:
csvdata.csv: https://i.stack.imgur.com/cnfAu.png
csvone.csv: https://i.stack.imgur.com/1JA6S.png
Thanks everyone!
CodePudding user response:
The variable x seems to be a pandas Series object. pandas.Series.str.contains()
requires the pattern (first argument) to be a string, not a Series.
See the examples at https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html
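To make the distinction concrete, here is a minimal sketch. The sample values are hypothetical stand-ins, since the real CSV files are only shown as images, and it assumes the IP column holds prefixes such as "10.1.". It shows that x is a Series while each of its elements is a plain string that str.contains accepts.

```python
import pandas as pd

# Hypothetical sample data standing in for csvone.csv and csvdata['IP'].
csvone = pd.DataFrame({"IP Address": ["10.1.0.5", "10.2.3.4", "192.168.0.1"]})
x = pd.Series(["10.1.", "10.2."], name="IP")

print(type(x))          # the whole column is a Series, not a valid pattern
print(type(x.iloc[0]))  # each element is a str, which is a valid pattern

# Pass one string at a time; regex=False makes the dots match literally.
matches = {}
for prefix in x:
    mask = csvone["IP Address"].str.contains(prefix, regex=False)
    matches[prefix] = csvone.loc[mask, "IP Address"].tolist()

print(matches)
```

Note that with regex=False the prefix is matched as a substring anywhere in the address, not only at the start.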
CodePudding user response:
As pointed out by @snag9677, if you look at the documentation of .str.contains:
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
...
Parameters
pat : str
    Character sequence or regular expression.
you'll see that you can't give it a pd.Series, only a string.
So what you could try instead is something like
pat = "|".join("^" + s.replace(".", r"\.") for s in x)
data = csvone[csvone['IP Address'].str.contains(pat)]
s.replace(".", r"\.") is needed since a plain . is a regex wildcard for any character (except a newline). The ^ is there to make sure the pattern matches only at the beginning (I suppose that is what you want). You could do
pat = "|".join(s.replace("." , r"\.") for s in x)
data = csvone[csvone['IP Address'].str.match(pat)]
instead (match implicitly requires that the pattern match starts at the beginning).
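The pattern above only filters rows; it does not say which LOCATION belongs to which matched row. One possible way to link them, as a rough sketch rather than a definitive solution, is to capture the prefix that matched and look it up in csvdata. The sample frames below are hypothetical (the real files are only available as images), the "PC Name" column is invented for illustration, and the code assumes the IP column holds prefixes with a trailing dot such as "10.1.".

```python
import re
import pandas as pd

# Hypothetical stand-ins for csvone.csv and csvdata.csv.
csvone = pd.DataFrame({
    "PC Name": ["pc1", "pc2", "pc3", "pc4"],
    "IP Address": ["10.1.0.5", "10.2.3.4", "10.1.7.7", "192.168.0.1"],
})
csvdata = pd.DataFrame({
    "IP": ["10.1.", "10.2."],
    "LOCATION": ["LOCATION_A", "LOCATION_B"],
})

# Alternation of escaped prefixes, anchored at the start, in one capture group.
pat = "^(" + "|".join(re.escape(s) for s in csvdata["IP"]) + ")"

# extract() returns the prefix that matched, or NaN where nothing matched.
prefix = csvone["IP Address"].str.extract(pat, expand=False)

# Map each matched prefix to its location via a prefix -> LOCATION lookup.
lookup = csvdata.set_index("IP")["LOCATION"]
result = csvone.assign(PCLOCATION=prefix.map(lookup)).dropna(subset=["PCLOCATION"])
print(result)
```

re.escape does the same job as s.replace(".", r"\.") but also covers any other regex metacharacters. Rows whose address matches no prefix are dropped here; remove the dropna call to keep them with an empty PCLOCATION.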