
Using datasets

Nerdalize makes it easy to work with (large) input datasets. A dataset can be anything: for example, a map of Europe when you’re forecasting the weather, or millions of data points you want to analyze.

Running a container on your local machine with a dataset

As in the simple start example, we’ll start by testing our image on our local machine.

$ docker run -v <input_data_path>:/input:ro <image> <arguments>

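Let’s unpack this: the -v <input_data_path>:/input:ro option mounts the directory <input_data_path> from your machine at /input inside the container, and the :ro suffix makes that mount read-only. <image> is the image you want to test, and everything after it (<arguments>) is passed to the container as its command and arguments.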
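When testing locally, it is safest to give Docker's -v option an absolute host path, because a bare name such as data is interpreted as a named volume rather than a directory. The following is a minimal sketch of a helper that resolves the path and prints the resulting command. The function name docker_test_cmd and the image name my-image are hypothetical and only serve as illustration.

```shell
# Hypothetical helper: print the docker command for a local test run.
# Usage: docker_test_cmd <input_data_path> <image> [arguments...]
docker_test_cmd() {
    data_dir="$1"
    image="$2"
    shift 2
    # Resolve the dataset directory to an absolute path; fail if it does not exist.
    abs_dir="$(cd "$data_dir" && pwd)" || return 1
    # Mount the dataset read-only at /input, as in the command above.
    echo "docker run -v $abs_dir:/input:ro $image $*"
}
```

Calling docker_test_cmd ./data my-image --verbose prints the command rather than running it, so you can inspect it first and then paste it into your terminal.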

Run it on Nerdalize

When your container runs on the Nerdalize cluster, your dataset is automatically mounted read-only at /input, so it’s important that your program or script takes that into account.
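In practice, a read-only /input means the program should read from the dataset and write any results somewhere else. Here is a minimal sketch of a container entrypoint that does so, assuming a simple job that counts the lines of each input file. The function name count_lines and the default output location /tmp/results are hypothetical choices for this example, not Nerdalize conventions.

```shell
#!/bin/sh
# Sketch of an entrypoint that treats the dataset as read-only.
# INPUT_DIR defaults to /input (the mount point described above).
# OUTPUT_DIR is a hypothetical writable location chosen for this example.
INPUT_DIR="${INPUT_DIR:-/input}"
OUTPUT_DIR="${OUTPUT_DIR:-/tmp/results}"

count_lines() {
    # $1: directory to read from, $2: directory to write results to
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for f in "$in_dir"/*; do
        # Skip subdirectories and the unexpanded pattern of an empty directory.
        [ -f "$f" ] || continue
        # Read from the dataset; write the result outside of it.
        wc -l < "$f" | tr -d ' ' > "$out_dir/$(basename "$f").count"
    done
}

# Only run when the input directory is actually present.
if [ -d "$INPUT_DIR" ]; then
    count_lines "$INPUT_DIR" "$OUTPUT_DIR"
fi
```

Because both directories can be overridden through environment variables, the same script can be tried on a local folder before it runs against the mounted dataset.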

  1. Upload your dataset.

    $ nerd dataset upload <path>
    105.30 MiB / 194.31 MiB [=======----] 51% 4s
    Uploaded dataset with ID 'd-8595d91c'

    Your dataset may be up to 500GB in size.

  2. Start your workload with the dataset.

    $ nerd workload start --input-dataset <dataset-id>
    Workload created with ID: caa9ffb86d65b70f8903

  3. Create your tasks, etc.

    The rest of the process is exactly the same as in the simple start, which does not use an input dataset.
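If you want to script steps 1 and 2, the dataset ID has to be taken from the upload output. The sketch below assumes the output contains a line in the same form as the sample shown in step 1 (Uploaded dataset with ID '...'); the real output may differ, for instance when the progress bar is piped. The function name extract_dataset_id is hypothetical.

```shell
# Hypothetical helper: read upload output on stdin and print the dataset ID.
# Assumes a line of the form: Uploaded dataset with ID 'd-8595d91c'
extract_dataset_id() {
    sed -n "s/.*Uploaded dataset with ID '\([^']*\)'.*/\1/p"
}

# Usage sketch (requires the nerd CLI, so it is left commented out):
#   dataset_id="$(nerd dataset upload <path> | extract_dataset_id)"
#   nerd workload start --input-dataset "$dataset_id"
```

Lines that do not match the expected form are ignored, so an empty result indicates that the upload output looked different from the sample.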

That’s it!

You can now start using datasets with your own workloads. If you’re not sure how to start a workload, take a look at the simple start. Want to use your own image as well? Continue to using private images.