Nerdalize makes it easy to work with (large) input datasets. A dataset can be anything: for example, a map of Europe when you're forecasting the weather, or millions of data points you want to run an analysis over.
Running a container on your local machine with a dataset
As in the simple start example, we'll start by testing our image on our local machine.
$ docker run -v <input_data_path>:/input:ro <image> <arguments>
Let’s unpack this:
docker run tells Docker to run something.
-v <input_data_path>:/input:ro specifies which folder should be mounted into the container.
<input_data_path> is your source directory, e.g. /tmp/data.
/input is the path at which the data will be available inside your container.
:ro is the part of the -v flag that tells Docker to mount the input volume read-only.
<image> is your image. It has to be stored in a registry such as Docker Store or quay.io.
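To make the mount concrete, here is a minimal sketch of what a script inside the container might do: enumerate the files that appear under the mounted input folder. The script name and the INPUT_DIR override are hypothetical and only exist so the sketch can be tried outside a container; inside the container the data simply appears at /input.

```python
import os

# Hypothetical entrypoint sketch: the dataset mounted with
# -v <input_data_path>:/input:ro shows up as ordinary files under /input.
# INPUT_DIR is overridable here only so the sketch can be run locally.
INPUT_DIR = os.environ.get("INPUT_DIR", "/input")

def list_input_files(input_dir=INPUT_DIR):
    """Return the sorted paths of all regular files in the dataset folder."""
    return sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, name))
    )

if __name__ == "__main__":
    for path in list_input_files():
        print(path)
```

Because the volume is mounted read-only, a script like this can read the files freely but should not try to modify them in place.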
Run it on Nerdalize
When your container runs on the Nerdalize cluster, your dataset is automatically mounted onto the /input folder read-only, so it's important that your program or script takes that into account.
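One common way to take the read-only mount into account is to write all results to a separate, writable location. A minimal sketch, assuming a hypothetical /tmp/output directory for results (the actual output location depends on your own workload):

```python
import os

# /input is mounted read-only on the cluster, so writing into it fails.
# This sketch reads each input file and writes a line count next to,
# not inside, the read-only mount. Both directory defaults are
# assumptions for illustration, overridable via environment variables.
INPUT_DIR = os.environ.get("INPUT_DIR", "/input")
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "/tmp/output")

def process(input_dir=INPUT_DIR, output_dir=OUTPUT_DIR):
    """Write a <name>.count file per input file into the writable output dir."""
    os.makedirs(output_dir, exist_ok=True)
    for name in os.listdir(input_dir):
        src = os.path.join(input_dir, name)
        if not os.path.isfile(src):
            continue
        with open(src) as f:
            lines = sum(1 for _ in f)
        # Results go to the writable directory, never back into /input.
        with open(os.path.join(output_dir, name + ".count"), "w") as out:
            out.write(str(lines))

if __name__ == "__main__":
    process()
```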
Upload your dataset.
$ nerd dataset upload <path>
105.30 MiB / 194.31 MiB [=======----] 51% 4s
Uploaded dataset with ID 'd-8595d91c'
Your dataset may be up to 500GB in size.
Start your workload with the dataset.
$ nerd workload start quay.io/nerdalize/delft3d --input-dataset <dataset-id>
Workload created with ID: caa9ffb86d65b70f8903
Create your tasks, etc.
The rest of the process is exactly the same as in the simple start, without using an input dataset.
You can now start using datasets with your own workloads. If you’re not sure how to start a workload take a look at the simple start. Want to also use your own image? Continue to using private images.