Creating a Bucket

To create models, you first need to specify where your training data is stored. You can use any cloud storage service, such as S3, Drive, Box or Dropbox. When adding data, you also provide the credentials for that location so that the models can read the data during training.

While creating a new bucket, you must define the following fields:

  • Bucket name
    A name for your bucket. Bucket names live in a single global namespace shared by all users, so the name must be unique.
  • Source
    Choose ‘Url’ if your data is available as a downloadable archive in the cloud or on your own server. Choose ‘S3’ if your data is stored in Amazon S3.
  • Description
    A brief description of your data.
  • Access type
    With private access, only you can see and use the bucket. With public access, the bucket is available to all users, but they cannot access any of the credentials attached to it.
  • Data Format
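
As a rough pre-submission checklist, the fields above can be sketched as a small Python record. This is only an illustration: `BucketSpec`, `problems`, and the lowercase access type values are hypothetical names chosen here, not part of the platform, and global uniqueness of the bucket name cannot be checked locally.

```python
from dataclasses import dataclass

# Allowed values taken from the field descriptions above.
SOURCES = {"Url", "S3"}
ACCESS_TYPES = {"private", "public"}  # spelling is an assumption


@dataclass
class BucketSpec:
    """Hypothetical local record of the bucket fields (not a platform API)."""
    name: str
    source: str
    description: str
    access_type: str

    def problems(self):
        """Return a list of issues; an empty list means the fields look complete."""
        issues = []
        if not self.name.strip():
            issues.append("bucket name is empty")
        if self.source not in SOURCES:
            issues.append(f"source must be one of {sorted(SOURCES)}")
        if self.access_type not in ACCESS_TYPES:
            issues.append(f"access type must be one of {sorted(ACCESS_TYPES)}")
        return issues
```

Calling `BucketSpec("flowers-v1", "S3", "Flower photos", "private").problems()` on this sketch returns an empty list, while an unknown source or access type is reported as an issue.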

Data format

The training data should be provided as an archive. The supported archive formats are currently rar, zip and 7z. Inside the archive, the supported file formats are jpeg/jpg and png for images, and json for numerical, text and other complex data combinations.
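
One way to prepare such an archive is sketched below using only the Python standard library. The helper names `unsupported_files` and `pack_training_data` are illustrative assumptions, and the sketch produces a zip archive only; rar and 7z would need other tools.

```python
import shutil
from pathlib import Path

# File formats listed above: images and json.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".json"}


def unsupported_files(data_dir):
    """List files under data_dir whose extension is not a supported format."""
    root = Path(data_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in SUPPORTED_SUFFIXES
    )


def pack_training_data(data_dir, archive_base):
    """Zip data_dir into '<archive_base>.zip' after checking file formats."""
    bad = unsupported_files(data_dir)
    if bad:
        raise ValueError(f"unsupported files: {bad}")
    # make_archive appends the '.zip' suffix and returns the archive path.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(data_dir))
```

Keep `archive_base` outside `data_dir` so the archive does not end up containing itself.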

For Image classification:

Data can be provided as directories, where each directory contains the images of a single class and the directory name is the class name. Alternatively, all images can be placed in a single directory, accompanied by a separate json file that maps each image to its class.
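
To show how the two layouts relate, the sketch below walks a directory-per-class tree and derives the equivalent image-to-class mapping. The function name `build_class_mapping` and the shape of the mapping (relative image path to class name) are assumptions for illustration; the exact json layout the platform expects is not specified here.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # image formats listed above


def build_class_mapping(root):
    """Map each image path (relative to root) to its parent directory name."""
    root = Path(root)
    mapping = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
                mapping[image.relative_to(root).as_posix()] = class_dir.name
    return mapping
```

The result can be written out with `json.dump` if the single-directory layout is preferred.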

For Numerical, Text and other complex Data combinations:

Data can be provided as json files. The mapping from each key to its data format must be provided when adding the data source.

JSON

Using a JSON format

While defining the training data in JSON format, you must define the following fields:

  • Key
    Name of the key as per the JSON document.
  • Data type
    Data type of the value of ‘Key’. It can be one of:
    • String
    • String Array
    • Integer
    • Integer Array
    • Float
    • Float Array
    • Bucket URL
      If you are referencing another bucket in arya.ai, you can add the reference by giving its bucket name and key.
      • Bucket Name
        Name of the referenced bucket (applies only when the data type is Bucket URL).
      • Bucket Key
        Key from the referenced bucket (applies only when the data type is Bucket URL).
      • Has Labels
        Check this if the values are class labels, and enter the number of labels in the field that follows.
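
Before adding the data source, it can help to check a sample record against the key-to-data-type mapping you plan to declare. The sketch below does that locally. The record, its keys, and the helper `mismatched_keys` are made up for illustration; treating whole numbers as valid Float values is an assumption, and Bucket URL is left out because its value format is not described here.

```python
import json


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_float(value):
    # Assumption: whole numbers are acceptable where a Float is declared.
    return isinstance(value, float) or _is_int(value)


def _array_of(check):
    return lambda value: isinstance(value, list) and all(check(v) for v in value)


CHECKS = {
    "String": lambda value: isinstance(value, str),
    "String Array": _array_of(lambda v: isinstance(v, str)),
    "Integer": _is_int,
    "Integer Array": _array_of(_is_int),
    "Float": _is_float,
    "Float Array": _array_of(_is_float),
}


def mismatched_keys(record, key_types):
    """Return keys that are missing or whose value does not match its type."""
    return [
        key for key, data_type in key_types.items()
        if key not in record or not CHECKS[data_type](record[key])
    ]


# Hypothetical record and mapping, for illustration only.
record = json.loads(
    '{"review": "great phone", "tokens": ["great", "phone"],'
    ' "rating": 4, "scores": [0.1, 0.9]}'
)
key_types = {
    "review": "String",
    "tokens": "String Array",
    "rating": "Integer",
    "scores": "Float Array",
}
print(mismatched_keys(record, key_types))  # prints []
```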

CSV

Using a CSV format

While defining the training data in CSV format, you must define the following fields:

  • Header
    Name for a single column or a group of columns. The same name is available later when creating a model or simulation.
  • Data type
    Data type of a single column. Available data types:
    • String
    • Integer
    • Float
    • Bucket URL
      Use this to reference another bucket.
      • Bucket Name
        Name of the referenced bucket (applies only when the data type is Bucket URL).
      • Bucket Key
        Key from the referenced bucket (applies only when the data type is Bucket URL).
      • Column range
        Range of column numbers for which ‘Header’ is used as the key.
      • Has Labels
        Check this if the column contains class labels, and enter the number of labels in the field that follows.
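
The relationship between ‘Header’, data type and column range can be sketched as follows. This is a local illustration, not the platform's parser: `parse_rows` is a hypothetical helper, the sample data is invented, the file is assumed to have no header row, and column ranges are assumed to be 1-based and inclusive.

```python
import csv
import io

# Casts for the simple data types listed above (Bucket URL is not covered).
CASTS = {"String": str, "Integer": int, "Float": float}


def parse_rows(csv_text, columns):
    """Group CSV cells under headers.

    columns: list of (header, data_type, (first_col, last_col)),
    with column numbers assumed to be 1-based and inclusive.
    """
    rows = []
    for raw in csv.reader(io.StringIO(csv_text)):
        row = {}
        for header, data_type, (first, last) in columns:
            values = [CASTS[data_type](cell) for cell in raw[first - 1:last]]
            # A single-column range yields a scalar; a wider range yields a list.
            row[header] = values[0] if first == last else values
        rows.append(row)
    return rows


sample = "5.1,3.5,1.4,0\n6.2,2.9,4.3,1\n"
columns = [("features", "Float", (1, 3)), ("label", "Integer", (4, 4))]
print(parse_rows(sample, columns))
```

Here the first three columns are grouped under the key `features`, and the fourth column, which would be marked as having labels, becomes `label`.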

Image

Using an Image format

When defining the training data in Image format, you must specify whether the data is segregated class-wise into directories and whether the directory names are the class labels. You must also specify the number of labels.
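
If the data is organized class-wise, the number of labels can be read off the directory tree before filling in the form. The sketch below is one way to do that locally; `label_summary` is a hypothetical helper, and it assumes each immediate sub-directory is one class.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def label_summary(root):
    """Return {class_label: image_count} for each sub-directory of root."""
    summary = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        summary[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return summary
```

The number of labels is then `len(label_summary(root))`, and a class with a count of zero is a hint that a directory is empty or holds unsupported files.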

Note: We do not store the data in any form on our servers. Whenever a user accesses a public dataset, it is always fetched from the source.