Here are common workarounds used by our power users:
Split & Rotate
You can split your dataset into smaller shards, with the number of shards depending on the size of the dataset (if possible, keep the feature and label statistics similar across shards). Then upload the shards as different versions of the same dataset. You can now train your model by rotating through the shards. If you are working with Jobs, you can script this process, as shown in the pseudo-code below:
for ep in epochs:
    for iter in shards:
        floyd run \
            --data shard_$iter:data_split \
            --data previous_job_output:last_model \
            'python train.py --data /data_split --resume /last_model/ckp --checkpoint_dir ckp'
If you are working in a Workspace, you will have to rotate the shards manually by attaching the next shard and unmounting the previous one at the end of each training iteration.
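The splitting step described above can be sketched in Python. This is a minimal illustration, not a FloydHub utility: the function name `split_into_shards` and its parameters are hypothetical, and it assumes your labels fit in memory as a list so that sample indices can be dealt to shards label by label, which keeps the label distribution of each shard close to that of the full dataset.

```python
import random
from collections import defaultdict


def split_into_shards(labels, num_shards, seed=0):
    """Return `num_shards` lists of sample indices with similar label statistics.

    Hypothetical helper: `labels` holds one label per sample, in dataset order.
    """
    rng = random.Random(seed)

    # Group sample indices by label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    # Deal the shuffled indices of each label to the shards in round-robin order.
    # The counter keeps running across labels so shard sizes stay balanced.
    shards = [[] for _ in range(num_shards)]
    next_shard = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for index in indices:
            shards[next_shard % num_shards].append(index)
            next_shard += 1
    return shards


if __name__ == "__main__":
    labels = ["cat"] * 6 + ["dog"] * 3
    for shard_id, shard in enumerate(split_into_shards(labels, num_shards=3)):
        print(shard_id, sorted(shard))
```

Each list of indices can then be used to copy the matching files into its own directory before uploading that directory as a separate dataset version.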
Split & Ensemble
After splitting the dataset as described in the section above, you can train a model for each of those shards, then create a final ensemble.
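The text above does not prescribe how to build the ensemble. One common choice, assumed here for illustration, is to average the class probabilities predicted by the per-shard models and pick the class with the highest average. In this sketch each model is a hypothetical callable that returns a list of class probabilities for a sample; in practice you would wrap your framework's own prediction call.

```python
def ensemble_predict(models, sample):
    """Average class probabilities over all models.

    Hypothetical helper: each entry of `models` is a callable that maps a
    sample to a list of class probabilities of the same length.
    Returns (index of the best class, list of averaged probabilities).
    """
    all_probs = [model(sample) for model in models]
    num_classes = len(all_probs[0])

    # Average the probability of each class across the models.
    averaged = [
        sum(probs[c] for probs in all_probs) / len(all_probs)
        for c in range(num_classes)
    ]

    # Pick the class with the highest averaged probability.
    best_class = max(range(num_classes), key=averaged.__getitem__)
    return best_class, averaged


if __name__ == "__main__":
    # Stand-ins for three models, each trained on a different shard.
    shard_models = [
        lambda sample: [0.6, 0.4],
        lambda sample: [0.2, 0.8],
        lambda sample: [0.1, 0.9],
    ]
    print(ensemble_predict(shard_models, sample=None))
```

Averaging treats every shard model equally; if the shards differ a lot in size or quality, a weighted average may be a better fit.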
- Each FloydHub dataset version is capped at 100GB.
- The machine's disk storage takes into account both the mounted datasets and the current working directory. This means that if you mount a 100GB dataset, you will have another 100GB of free disk space for your output, your code, or other datasets.
We are investigating different possibilities to improve this experience. If you have any feedback or need more help, please let us know at [email protected].