I am a new Python user. My goal is to take each name row and carry that name down onto the rows below it, as a new column.
import pandas as pd
import numpy as np
df = pd.DataFrame({"1": ['Alfred', 'car', 'bike','Alex','car'],
"2": [np.nan, 'Ford', 'Giant',np.nan,'Toyota'],
"3": [pd.NaT, pd.Timestamp("2018-01-01"),
pd.Timestamp("2018-07-01"),np.nan,pd.Timestamp("2021-01-01")]})
1 2 3
0 Alfred NaN NaT
1 car Ford 2018-01-01
2 bike Giant 2018-07-01
3 Alex NaN NaT
4 car Toyota 2021-01-01
My goal is a result like this:
df = pd.DataFrame({"transportation": ['car', 'bike','car'],
"Mark": ['Ford', 'Giant','Toyota'],
"BuyDate":[pd.Timestamp("2018-01-01"),
pd.Timestamp("2018-07-01"),pd.Timestamp("2021-01-01")],
"Name":['Alfred','Alfred','Alex']
})
transportation Mark BuyDate Name
0 car Ford 2018-01-01 Alfred
1 bike Giant 2018-07-01 Alfred
2 car Toyota 2021-01-01 Alex
I've tried searching for a method but couldn't solve this. Thanks for reading my post and for any help.
Answer:
You can do this using a helper column and then a forward fill:
# rename columns
df.columns = ["transportation", "Mark", "BuyDate"]
# assumption: the rows where "Mark" is NaN define the name for the following rows
df["is_name"] = df["Mark"].isna()
# create a new column which is NaN everywhere except for the name rows
df["name"] = np.where(df.is_name, df["transportation"], np.nan)
# do a forward fill to extend the names to all rows
df["name"] = df["name"].ffill()
# filter by non-name rows and drop the temporary is_name column
df = df.loc[~df.is_name].drop("is_name", axis=1)
print(df)
Out:
transportation Mark BuyDate name
1 car Ford 2018-01-01 Alfred
2 bike Giant 2018-07-01 Alfred
4 car Toyota 2021-01-01 Alex
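To see what the forward fill step does on its own, here is a small sketch on a toy helper column shaped like the one built above (a name on each name row, NaN elsewhere). `Series.ffill()` is the current spelling of `fillna(method="ffill")`, whose `method` argument is deprecated in recent pandas:

```python
import numpy as np
import pandas as pd

# Toy helper column: the name sits on the name rows, NaN everywhere else
helper = pd.Series(["Alfred", np.nan, np.nan, "Alex", np.nan])

# ffill() copies the last non-missing value downward into the gaps
filled = helper.ffill()
print(filled.tolist())  # ['Alfred', 'Alfred', 'Alfred', 'Alex', 'Alex']
```

The name rows themselves keep their own name after the fill; they are removed afterwards by the `~is_name` filter, so only the car/bike rows remain with their owner attached.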
Answer:
The idea is to blank out the first column wherever `Mark` is present, forward fill what remains into a new `Name` column, and then filter the rows with the same mask:
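df.columns = ["Transportation", "Mark", "BuyDate"]
# True on the item rows, False on the name rows
m = df["Mark"].notna()
# hide the item values, then carry each name down onto the rows below it
df["Name"] = df["Transportation"].mask(m).ffill()
# keep only the item rows and renumber them from 0
df = df[m].reset_index(drop=True)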
print(df)
Transportation Mark BuyDate Name
0 car Ford 2018-01-01 Alfred
1 bike Giant 2018-07-01 Alfred
2 car Toyota 2021-01-01 Alex
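`Series.mask(cond)` replaces values with NaN wherever `cond` is True, so `first.mask(m)` builds the same kind of helper column as the `np.where(is_name, ..., np.nan)` line in the helper-column approach. A small sketch comparing the two on toy Series (the names `first` and `mark` are just for illustration):

```python
import numpy as np
import pandas as pd

first = pd.Series(["Alfred", "car", "bike", "Alex", "car"])
mark = pd.Series([np.nan, "Ford", "Giant", np.nan, "Toyota"])
m = mark.notna()

# mask(): NaN wherever Mark is present, the name kept elsewhere
via_mask = first.mask(m)
# np.where(): keep the value only where Mark is missing
via_where = pd.Series(np.where(mark.isna(), first, np.nan))

print(via_mask.tolist())  # ['Alfred', nan, nan, 'Alex', nan]
```

The pandas-native `mask` avoids the round trip through NumPy and keeps the original index, which is why it chains cleanly with `ffill()`.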
Answer:
You could use this pipeline:
m = df.iloc[:,1].notna()
(df.assign(Name=df.iloc[:,0].mask(m).ffill()) # add new column
 .loc[m]                                  # keep only the rows with info
# below: rework df to fit output
.rename(columns={'1': 'transportation', '2': 'Mark', '3': 'BuyDate'})
.reset_index(drop=True)
)
output:
transportation Mark BuyDate Name
0 car Ford 2018-01-01 Alfred
1 bike Giant 2018-07-01 Alfred
2 car Toyota 2021-01-01 Alex
Answer:
You can do this like so:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"1": ['Alfred', 'car', 'bike','Alex','car'],
... "2": [np.nan, 'Ford', 'Giant',np.nan,'Toyota'],
... "3": [pd.NaT, pd.Timestamp("2018-01-01"),
... pd.Timestamp("2018-07-01"),np.nan,pd.Timestamp("2021-01-01")]})
>>>
>>> df
1 2 3
0 Alfred NaN NaT
1 car Ford 2018-01-01
2 bike Giant 2018-07-01
3 Alex NaN NaT
4 car Toyota 2021-01-01
>>>
>>> new_df = pd.DataFrame(columns=['Transportation', 'Mark', 'BuyDate', 'Name'])
>>>
>>> running_name = None
>>> j = 0
>>> for i in range(df.shape[0]):
...     # a row with no Mark is a name row: remember the name and skip it
...     if pd.isna(df.iloc[i, 1]):
...         running_name = df.iloc[i, 0]
...         continue
...     new_df.loc[j] = list(df.iloc[i]) + [running_name]
...     j += 1
...
>>> new_df
Transportation Mark BuyDate Name
0 car Ford 2018-01-01 Alfred
1 bike Giant 2018-07-01 Alfred
2 car Toyota 2021-01-01 Alex
>>>
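Growing `new_df` one row at a time with `.loc` works but gets slow on larger frames. A sketch of the same loop logic that collects plain tuples first and builds the DataFrame once at the end, assuming (as in the question) that a row whose second column is missing is a name row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"1": ['Alfred', 'car', 'bike', 'Alex', 'car'],
                   "2": [np.nan, 'Ford', 'Giant', np.nan, 'Toyota'],
                   "3": [pd.NaT, pd.Timestamp("2018-01-01"),
                         pd.Timestamp("2018-07-01"), np.nan, pd.Timestamp("2021-01-01")]})

rows = []
running_name = None  # name taken from the most recent name row
for first, mark, date in df.itertuples(index=False, name=None):
    if pd.isna(mark):
        # A row with no Mark is a name row: remember the name, emit nothing
        running_name = first
        continue
    rows.append((first, mark, date, running_name))

# Build the result once instead of enlarging a DataFrame row by row
new_df = pd.DataFrame(rows, columns=["Transportation", "Mark", "BuyDate", "Name"])
print(new_df)
```

`itertuples(index=False, name=None)` yields plain tuples, which keeps the unpacking simple and is much faster than calling `df.loc[i]` inside the loop.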