I have the following code:
Yb=pd.DataFrame(y, column='something')
df_merge = pd.merge(Yb, file, on='something', how='left')
I don't quite understand what this code does. What do column= and on= do here?
CodePudding user response:
From the pandas documentation for DataFrame:
columns : Index or array-like
Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.
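So in Yb, y is the data that goes into the DataFrame, and the columns argument supplies the column labels. Note that the keyword is columns (plural) and it expects a list-like of labels; column= as written in the question raises a TypeError. Here is a simple example.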
# Import the pandas library
import pandas as pd
# Initialize a list of lists (one inner list per row)
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame, labelling the two columns
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Print the DataFrame
print(df)
That will output something like this:
   Name  Age
0   tom   10
1  nick   15
2  juli   14
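To make the quoted description of columns more concrete, here is a minimal sketch. The names rows, records, labelled and selected are made up for illustration and do not come from the question. It shows the two behaviours the documentation describes, plus what happens with the misspelled keyword column=.

```python
import pandas as pd

# Data without labels: `columns` supplies the column names.
rows = [["tom", 10], ["nick", 15]]
labelled = pd.DataFrame(rows, columns=["Name", "Age"])
print(labelled)

# Data that already has labels (a dict): `columns` selects among them.
records = {"Name": ["tom", "nick"], "Age": [10, 15]}
selected = pd.DataFrame(records, columns=["Age"])
print(selected)

# The singular keyword `column` is not a DataFrame parameter,
# so the constructor rejects it with a TypeError.
try:
    pd.DataFrame(rows, column=["Name", "Age"])
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

So for the question's code, the first line would presumably need to be written with columns=['something'] to run at all.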
As for df_merge, pd.merge() combines two DataFrames. It requires two arguments, the left DataFrame and the right DataFrame, so Yb and file are the two DataFrames being merged. Here are the other arguments:
how: This defines what kind of merge to make. It defaults to 'inner', but other possible options include 'outer', 'left', and 'right'.
on: Use this to tell merge() which columns or indices (also called key columns or key indices) you want to join on. This is optional. If it isn’t specified, and left_index and right_index are False, then columns from the two DataFrames that share names will be used as join keys. If you use on, then the column or index you specify must be present in both objects.
In this case, how is set to 'left'. A left outer join keeps every row of the left DataFrame and discards rows of the right DataFrame that have no match in the key column of the left DataFrame. Left rows without a match get NaN in the columns that come from the right DataFrame.
And on is set to 'something', so rows are matched on the values of the something column, which has to exist in both DataFrames.
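The left merge from the question can be sketched as follows. The contents of y and file are not shown in the question, so the values below are invented stand-ins, and the extra column name value is hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `y` and `file`.
y = [1, 2, 3]
file = pd.DataFrame({"something": [1, 2, 4], "value": ["a", "b", "d"]})

# Note the plural keyword `columns` and the list of labels.
Yb = pd.DataFrame(y, columns=["something"])

# Left join on the shared `something` column:
# every row of Yb is kept, matching rows of `file` are attached.
df_merge = pd.merge(Yb, file, on="something", how="left")
print(df_merge)
```

With this made-up data, the row with something == 3 has no partner in file, so its value is NaN, and the row with something == 4 from file does not appear in the result.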
Hope this helped.
CodePudding user response:
To merge two data frames you need one or more columns that exist in both of them to join on, e.g.:
>>> x
   x  y
0  0  a
1  1  b
2  2  c
>>> y
   x  y_2
0  0    a
1  1    M
2  2    Q
We want to join x and y. Merging with on= needs at least one column that exists in both data frames; if there is none, you would have to fall back on left_on/right_on or the index instead. The common column here is 'x', so it is used as the key and appears only once in the resulting data frame.
>>> x.merge(y, on='x')
   x  y  y_2
0  0  a    a
1  1  b    M
2  2  c    Q
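For reference, here is a runnable sketch that rebuilds the two frames above and repeats the merge. The second half is an illustrative variant that is not part of the original answer: the key column of y is renamed to a hypothetical name, key, to show how left_on and right_on can be used when the two frames do not share a column name.

```python
import pandas as pd

# Rebuild the two frames shown above.
x = pd.DataFrame({"x": [0, 1, 2], "y": ["a", "b", "c"]})
y = pd.DataFrame({"x": [0, 1, 2], "y_2": ["a", "M", "Q"]})

# Join on the shared column 'x'; it appears once in the result.
merged = x.merge(y, on="x")
print(merged)

# Hypothetical variant: the key has a different name on the right side.
y_renamed = y.rename(columns={"x": "key"})
merged_renamed = x.merge(y_renamed, left_on="x", right_on="key")
print(merged_renamed)
```

In the second merge both key columns (x and key) are kept, because pandas has no shared name to collapse them into.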