I need to do some pretty specific things in pandas. I'm going to do my best to explain here but if I am being confusing just skip to below where I have the actual examples.
I have dataframe with first name and last name (A and B) and values C. I want first and last name to be combined into D. Second, If C matches some value (lets say X) then I want EITHER: All values of A at x's index to be removed, OR: all column C's with matching A's to be grouped into an array. See below for examples.
Hi All,
The easiest way to explain this is to show the code and explain what I need to do.
I have a pandas dataframe that looks like this:
A | B | C | |
---|---|---|---|
1 | 'A' | 'Apple' | x |
2 | 'A' | 'Apple' | y |
3 | 'B' | 'Banana' | x |
My problem is twofold. First, I need Columns A and B together as one column (D). Like this
D | C | |
---|---|---|
1 | 'Apple A' | x |
2 | 'Apple A' | 'Apple' |
3 | 'Banana B' | 'Banana' |
Second, I need either any row with x to be dropped (with all instances of matching A and B gone) or put together in an array. See below
EITHER THIS, where all occurences of A get removed because one occurance of A corresponds to an occurence of x (italics means deleted):
A | B | C | |
---|---|---|---|
1 | 'A' | 'Apple' | x |
2 | 'A' | 'Apple' | y |
3 | 'B' | 'Banana' | x |
OR THIS
A | B | C | |
---|---|---|---|
1 | 'A' | 'Apple' | [x, y] |
3 | 'B' | 'Banana' | x |
I'm pretty new to pandas and I don't really know where to start with any of this. Any help?
CodePudding user response:
here is one way to do it
# do a groupby and aggregate Column 'C' as list
df.groupby(['A','B'], as_index=False)['C'].agg(list)
A B C
0 'A' 'Apple' [x, y]
1 'B' 'Banana' [x]
CodePudding user response:
Assuming there is a typo in given dataset and row #3 above is df[C] = "z"
.
You can achieve this using nested filters:
df.drop(df[df["A"].isin(df[df["C"] == "x"]["A"])].index)
Full example:
data=[
('A','Apple',"x"),
('A','Apple',"y"),
('B','Banana',"z")
]
columns = ["A","B","C"]
df = pd.DataFrame(data=data, columns=columns)
df.drop(df[df["A"].isin(df[df["C"] == "x"]["A"])].index)
A B C
2 B Banana z