I want to create a pandas dataframe where there are 5000 columns (n=5000) and one row (row G). For row G, 1 (in 10% of samples) or 0 (in 90% of samples).
import pandas as pd
df = pd.DataFrame({"G": np.random.choice([1,0], p=[0.1, 0.9], size=5000)}).T
I also want to add column names such that it is "Cell" followed by "1..5000":
Cell1 | Cell2 | Cell3 | Cell5000 | |
---|---|---|---|---|
G | 0 | 0 | 1 | 0 |
CodePudding user response:
The columns will default to a RangeIndex
from 0-4999. You can add 1 to the column values, and then use DataFrame.add_prefix
to add the string "Cell" before all of the column names.
df.columns = 1
df = df.add_prefix("Cell")
print(df)
Cell1 Cell2 Cell3 ... Cell5000
G 0 0 0 ... 0
For a single-liner, you can also add 1 and prefix with "Cell" by converting the column index dtype manually.
df.columns = "Cell" (df.columns 1).astype(str)
To make a single row DataFrame, I would construct my data with numpy
in the correct shape instead of transposing a DataFrame. You can also pass in the columns as you want them numbered and the index labelled.
import numpy as np
import pandas as pd
df = pd.DataFrame(
np.random.choice([1,0], p=[.1, .9], size=(1, size)),
columns=np.arange(1, size 1),
index=["G"]
).add_prefix("Cell")
print(df)
Cell1 Cell2 Cell3 ... Cell4999 Cell5000
G 0 0 0 ... 0 0
CodePudding user response:
Another Method could be:
size = 5000
pd.DataFrame.from_dict(
{"G": np.random.choice([1,0], p=[0.1, 0.9], size=size)},
columns=(f'Cell{x}' for x in range(1, size 1)),
orient='index'
)
Output:
Cell1 Cell2 Cell3 Cell4 Cell5 Cell6 Cell7 Cell8 Cell9 ... Cell4992 Cell4993 Cell4994 Cell4995 Cell4996 Cell4997 Cell4998 Cell4999 Cell5000
G 0 0 0 0 0 1 0 1 0 ... 0 0 0 0 0 0 0 0 0
[1 rows x 5000 columns]