
Quickstart: Using datasets

Nerdalize makes it easy to work with (large) input datasets. A dataset could be anything. For example the map of Europe when you’re forecasting the weather, or millions of datapoints you want to do an analysis over.

Running a container on your local machine with a dataset

As in the simple start example, we'll start by testing our image on our local machine.

$ docker run -v <input_data_path>:/input:ro <image> <arguments>

Let's unpack this:

  - -v <input_data_path>:/input:ro mounts your local dataset directory at /input inside the container. The :ro suffix makes the mount read-only, matching how the dataset will be mounted on the cluster.
  - <image> is the name of the Docker image you want to run.
  - <arguments> are any arguments your program or script accepts.

Run it on Nerdalize

When your container runs on the Nerdalize cluster, your dataset is automatically mounted read-only onto the /input folder, so it's important that your program or script takes this into account.
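Because the /input mount is read-only, any step that tries to modify files in place will fail. A common pattern is to copy the dataset into a writable scratch directory first. A minimal sketch (the stage_inputs helper and the /tmp/work path are illustrative, not part of the Nerdalize tooling):

```shell
# stage_inputs copies the (read-only) dataset into a writable
# scratch directory so the rest of the script can modify files freely.
stage_inputs() {
  src="$1"
  dst="$2"
  mkdir -p "$dst"
  cp -R "$src"/. "$dst"/
}

# On the cluster the dataset appears read-only at /input:
#   stage_inputs /input /tmp/work
```

Writing results somewhere outside /input (and staging inputs you need to mutate) keeps the same script working both locally and on the cluster.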

  1. Upload your dataset

    $ nerd dataset upload <path>
    105.30 MiB / 194.31 MiB [=======----] 51% 4s
    Uploaded dataset with ID 'd-8595d91c'
    

    Note: Your dataset may be up to 500GB in size.

  2. Start your workload

    $ nerd workload start nerdalize/wgetworker --input-dataset d-8595d91c --instances=5
    Workload created with ID: caa9ffb86d65b70f8903
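
The two steps above can also be scripted end to end by parsing the dataset ID out of the upload output. A minimal sketch, assuming the output format shown above (the extract_dataset_id helper is hypothetical, not part of the nerd CLI):

```shell
# extract_dataset_id pulls the ID out of the final line of
# `nerd dataset upload`, e.g.: Uploaded dataset with ID 'd-8595d91c'
extract_dataset_id() {
  sed -n "s/.*ID '\(d-[0-9a-f]*\)'.*/\1/p"
}

# Sketch of a scripted flow (assumes the nerd CLI is installed and logged in):
#   id=$(nerd dataset upload ./my-data | extract_dataset_id)
#   nerd workload start nerdalize/wgetworker --input-dataset "$id" --instances=5
```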
    

From here the instructions are the same as in the simple start.