Quickstart: Using datasets
Nerdalize makes it easy to work with (large) input datasets. A dataset can be anything: for example, a map of Europe when you're forecasting the weather, or millions of datapoints you want to analyze.
Running a container on your local machine with a dataset
Like in the simple start example, we'll start by testing our image on our local machine.
$ docker run -v <input_data_path>:/input:ro <image> <arguments>
Let’s unpack this:
docker run tells Docker to run something.
-v <input_data_path>:/input:ro specifies which folder should be mounted into the container.
<input_data_path> is your source directory, e.g. /tmp/data.
/input is the path at which it will be available inside your container.
The :ro part of the -v flag tells Docker to mount the input volume read-only.
<image> is your image. It will have to be stored in a registry such as Docker Store or quay.io.
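If you launch containers from a script rather than by hand, the flags above can be assembled programmatically. The sketch below is only illustrative (the image name and path are hypothetical placeholders, not values from this guide); it builds the same argument list the command above describes, ready to pass to something like subprocess.run.

```python
def docker_run_args(input_data_path, image, *arguments):
    """Assemble the `docker run` invocation described above:
    mount input_data_path at /input read-only, then run image
    with any extra arguments."""
    return [
        "docker", "run",
        "-v", f"{input_data_path}:/input:ro",  # read-only bind mount
        image,
        *arguments,
    ]
```

For example, `docker_run_args("/tmp/data", "myorg/myimage", "--verbose")` yields the list form of `docker run -v /tmp/data:/input:ro myorg/myimage --verbose` (image name hypothetical).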
Run it on Nerdalize
When your container runs on the Nerdalize cluster, your dataset will automatically be mounted onto the /input folder read-only. So it's important that your program or script takes that into account.
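In practice, "taking that into account" means your code only ever reads from /input and writes its results somewhere else. Here is a minimal sketch of that pattern; the line-counting task, the /output path, and the file name are assumptions for illustration, not part of the Nerdalize API. The directories are parameters so the same code can be tested outside a container.

```python
import os

def process(input_dir="/input", output_dir="/output"):
    """Read every file in the read-only input mount and write results
    to a separate, writable directory -- never back into input_dir."""
    os.makedirs(output_dir, exist_ok=True)
    results = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:  # reading from the ro mount is fine
            results.append((name, sum(1 for _ in f)))
    # all output goes to the writable directory (hypothetical layout)
    with open(os.path.join(output_dir, "line_counts.txt"), "w") as out:
        for name, count in results:
            out.write(f"{name}\t{count}\n")
    return results
```

Attempting to create or modify files under /input would instead fail with a read-only file system error once the container runs on the cluster.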
Upload your dataset
$ nerd dataset upload <path>
105.30 MiB / 194.31 MiB [=======----] 51% 4s
Uploaded dataset with ID 'd-8595d91c'
Note: Your dataset may be up to 500GB in size.
Start your workload
$ nerd workload start nerdalize/wgetworker --input-dataset d-8595d91c --instances=5
Workload created with ID: caa9ffb86d65b70f8903
From here the instructions are the same as in the simple start.