Populate empty pandas dataframe with specific conditions

I want to create a pandas dataframe with 5000 columns (n=5000) and a single row (row G). Each value in row G should be 1 (in 10% of samples) or 0 (in 90% of samples).

import numpy as np
import pandas as pd
df = pd.DataFrame({"G": np.random.choice([1,0], p=[0.1, 0.9], size=5000)}).T

I also want to name the columns "Cell" followed by the numbers 1 to 5000:

	Cell1	Cell2	Cell3	Cell5000
G	0	0	1	0

CodePudding user response：

After the transpose, the columns default to a RangeIndex from 0 to 4999. You can add 1 to the column labels, and then use DataFrame.add_prefix to put the string "Cell" in front of every column name.

df.columns += 1
df = df.add_prefix("Cell")

print(df)
   Cell1  Cell2  Cell3 ...   Cell5000
G      0      0      0 ...          0

As a one-liner, you can also add 1 and prefix with "Cell" by converting the column index to strings yourself.

df.columns = "Cell" + (df.columns + 1).astype(str)
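To check that the two renaming approaches above agree, here is a minimal sketch on a deliberately small frame. The size `n = 5` and the names `base`, `a`, and `b` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

n = 5  # small, hypothetical size so the labels are easy to inspect
base = pd.DataFrame(np.zeros((1, n), dtype=int), index=["G"])

# Approach 1: shift the integer labels by one, then prefix them
a = base.copy()
a.columns = a.columns + 1
a = a.add_prefix("Cell")

# Approach 2: build the string labels in a single expression
b = base.copy()
b.columns = "Cell" + (b.columns + 1).astype(str)

print(list(a.columns))  # ['Cell1', 'Cell2', 'Cell3', 'Cell4', 'Cell5']
print(list(b.columns))  # ['Cell1', 'Cell2', 'Cell3', 'Cell4', 'Cell5']
```

Both versions end up with the same labels; the first is arguably easier to read, while the second avoids the intermediate integer relabeling.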

To make a single row DataFrame, I would construct my data with numpy in the correct shape instead of transposing a DataFrame. You can also pass in the columns as you want them numbered and the index labelled.

import numpy as np
import pandas as pd

size = 5000  # number of columns

df = pd.DataFrame(
    np.random.choice([1,0], p=[.1, .9], size=(1, size)),
    columns=np.arange(1, size + 1),
    index=["G"]
).add_prefix("Cell")

print(df)
   Cell1  Cell2  Cell3 ... Cell4999  Cell5000
G      0      0      0 ...        0         0
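If you want the draw to be reproducible, one option is a seeded NumPy Generator instead of the global np.random state. This is only a sketch: the seed value 0 and the name `frac_ones` are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

size = 5000
rng = np.random.default_rng(0)  # seed 0 is an arbitrary, illustrative choice

df = pd.DataFrame(
    rng.choice([1, 0], p=[0.1, 0.9], size=(1, size)),  # shape (1, size): one row
    columns=np.arange(1, size + 1),
    index=["G"],
).add_prefix("Cell")

# Each cell is drawn independently, so the share of ones is close to 0.1
# but not necessarily exactly 0.1.
frac_ones = df.loc["G"].mean()
print(df.shape)
print(round(frac_ones, 3))
```

Because every cell is an independent draw, the observed fraction of ones varies slightly from run to run unless the seed is fixed.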

CodePudding user response：

Another method could be:

import numpy as np
import pandas as pd

size = 5000

pd.DataFrame.from_dict(
     {"G": np.random.choice([1,0], p=[0.1, 0.9], size=size)},
     columns=(f'Cell{x}' for x in range(1, size + 1)),
     orient='index'
)

Output:

   Cell1  Cell2  Cell3  Cell4  Cell5  Cell6  Cell7  Cell8  Cell9  ...  Cell4992  Cell4993  Cell4994  Cell4995  Cell4996  Cell4997  Cell4998  Cell4999  Cell5000
G      0      0      0      0      0      1      0      1      0  ...         0         0         0         0         0         0         0         0         0

[1 rows x 5000 columns]
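Note that np.random.choice with p=[0.1, 0.9] gives each cell an independent 10% chance of being 1, so the total number of ones varies. If "in 10% of samples" is meant literally (exactly 500 ones out of 5000), a possible sketch is to fill a fixed number of ones and shuffle them. This reading of the question is an assumption, and the names `n_ones` and `values` are illustrative.

```python
import numpy as np
import pandas as pd

size = 5000
n_ones = size // 10  # exactly 10% of the cells

# Start with all zeros, set the first n_ones entries to 1, then shuffle
values = np.zeros(size, dtype=int)
values[:n_ones] = 1
rng = np.random.default_rng()
rng.shuffle(values)  # shuffles in place

df = pd.DataFrame(
    [values],  # a list holding one array gives a single row
    index=["G"],
    columns=[f"Cell{i}" for i in range(1, size + 1)],
)

print(int(df.loc["G"].sum()))  # 500
```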