How to build a federated learning model on an unbalanced and small dataset

I am working to build a federated learning model using TFF and I have some questions:

I am preparing the dataset. I have separate data files with the same features but different samples, and I would like to treat each file as a single client. How can I set this up in TFF?
The data is not balanced: the number of samples varies from file to file. Does this affect the modeling process?
The data is also fairly small: one file (client) has 300 records and another has 1500. Is that enough to build a federated learning model?

Thanks in advance

CodePudding user response:

You can create a ClientData object for your dataset; see the TFF tutorial "Working with tff's ClientData".
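As a rough sketch of the "one file per client" idea, the snippet below uses only the Python standard library to map each CSV file to a client ID. The file names, column names, and the helpers `load_clients` and `write_demo_files` are hypothetical and only there to make the example runnable; they are not part of TFF.

```python
import csv
import tempfile
from pathlib import Path


def load_clients(data_dir):
    """Map each CSV file in data_dir to one client: {client_id: list of row dicts}."""
    clients = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as f:
            # The file name (without extension) serves as the client ID.
            clients[path.stem] = list(csv.DictReader(f))
    return clients


def write_demo_files(data_dir, sizes):
    """Write made-up client files: same columns, different numbers of rows."""
    for client_id, n_rows in sizes.items():
        with (Path(data_dir) / f"{client_id}.csv").open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["feature_a", "feature_b", "label"])
            for i in range(n_rows):
                writer.writerow([i, i * 2, i % 2])


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical clients with the sizes mentioned in the question.
    write_demo_files(tmp, {"client_0": 300, "client_1": 1500})
    clients = load_clients(tmp)

client_sizes = {cid: len(rows) for cid, rows in clients.items()}
print(client_sizes)
```

A per-client mapping like this is the kind of structure you would then hand over to TFF's ClientData tooling, for example by turning each client's rows into a tf.data.Dataset. The exact constructor depends on your TFF version, so check the ClientData tutorial and API docs for the one that applies.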
The dataset doesn't have to be balanced to build a federated learning model. In the Federated Averaging algorithm (https://arxiv.org/abs/1602.05629), the server computes a weighted average of the clients' model updates, where each client's weight is the number of samples it has.
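To make the weighting concrete, here is a minimal pure-Python sketch of a sample-weighted average over client updates. It is not TFF's implementation; the function name `weighted_average` and the toy two-parameter updates are assumptions for illustration, and the client sizes are the ones from the question.

```python
def weighted_average(client_updates, client_sizes):
    """Average per-client update vectors, weighting each by its sample count."""
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [
        sum(update[j] * n for update, n in zip(client_updates, client_sizes)) / total
        for j in range(dim)
    ]


# Toy updates for a model with two parameters (made-up values).
updates = [[1.0, 0.0], [0.0, 1.0]]
sizes = [300, 1500]  # records held by each client

avg = weighted_average(updates, sizes)
print(avg)  # roughly [0.167, 0.833]
```

The client with 1500 records contributes five times the weight of the client with 300, so the imbalance is accounted for in the aggregate rather than needing to be removed beforehand.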
A few hundred records per client is no fewer than clients have in the federated EMNIST dataset, so that should be fine. As for the total number of clients: this tutorial shows FL with 10 clients, and you can run the Colab with a smaller NUM_CLIENTS to see how it works on the example dataset.