I have a CSV structured as follows:
Test CSV:
| Column A | Column B |
|---|---|
| abc-dfcv | rebtgsergbsedrfgesrg |
| | water rdfe egreg |
| | oluiuilegregreg |
| def fefd | rtjtyujdtgfhndgfhjfh |
| | water edgregerg |
| | rygebfvkjuer |
As can be seen, each cell of Column B contains multiple lines. I need to edit it so that only the lines which start with "water" are kept within the cell and the rest of the lines are omitted. This has to be done for all cells in Column B.
The regex statement I've made is re.findall("^water'.*").
I tried to directly apply regex, but it halts and errors at the end of a line within a cell.
I'm thinking of something along these lines, but I'm blanking on what the regex input should be:
df = pd.read_csv("MyFile.csv")
for p in range(len(df.index)):
    df._set_value(p, "SCHEDULES", str(re.findall("^water'.*", ??????????????? )))
df.to_csv("Nexpose_Schedules.csv", index=False)
CodePudding user response:
You can do it like this:
df = pd.read_csv('MyFile.csv')
df_new = df.loc[df['Column B'].str.contains(r'\bwater', case=False)]
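To make the behaviour of this filter concrete, here is a small sketch on made-up data (the sample frame is an assumption, not the asker's file). It shows that the mask keeps or drops whole rows and leaves the cell text unchanged. The `na=False` argument is an addition on my part: without it, an empty cell yields a missing value in the mask, and boolean indexing with such a mask raises an error.

```python
import pandas as pd

# Made-up sample: each "Column B" cell holds several newline-separated lines,
# and the last cell is empty.
df = pd.DataFrame({
    "Column A": ["abc-dfcv", "def fefd", "ghi"],
    "Column B": ["rebtg\nwater rdfe egreg\noluiu", "rtjty\nrygeb", None],
})

# na=False treats missing cells as "no match", so the mask stays purely boolean.
mask = df["Column B"].str.contains(r"\bwater", case=False, na=False)

# Row-level filter: rows whose cell mentions "water" survive, with all of
# their lines intact.
df_new = df.loc[mask]
```

Only the first row survives here, and its cell still contains all three lines.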
CodePudding user response:
You can use the 'startswith' string method instead of a regex, and the answer would look like this:
result = df[df["Column B"].str.startswith("water")]
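Note that `str.startswith` tests the beginning of the whole cell, so it selects rows rather than trimming lines inside a cell. If the goal is to keep only the lines that begin with "water" within each cell, one possible sketch is below. It assumes the lines in a cell are separated by `\n`; the helper name `keep_water_lines` is hypothetical. It also suggests why the original attempt stalled: without `re.MULTILINE`, `^` only matches at the very start of the string, and the pattern `"^water'.*"` contains a stray apostrophe.

```python
import re

# re.MULTILINE lets ^ match at the start of every line, not only at the start
# of the string; "." does not match a newline, so ".*" stops at the line end.
WATER_LINE = re.compile(r"^water.*", re.MULTILINE)

def keep_water_lines(cell):
    """Return only the lines of `cell` that start with 'water', joined by newlines."""
    if not isinstance(cell, str):
        # Leave empty cells (NaN/None) untouched.
        return cell
    return "\n".join(WATER_LINE.findall(cell))
```

With pandas this could be applied as `df["Column B"] = df["Column B"].apply(keep_water_lines)` before calling `df.to_csv(...)`. If the file uses `\r\n` line endings inside cells, the trailing `\r` would be kept on each match and may need stripping.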