Getting Started:
Dataset Types: Support for Multiple Data Formats
The RS Universal Robotic Motion Dataset is designed for convenient data exchange and integration. It therefore supports several data formats that are widely used in industry, including HDF5 (Hierarchical Data Format version 5) and the LeRobot Dataset Format.
- HDF5 is a mature scientific data storage format, known for handling large-scale, complex, heterogeneous data and widely used in research and industry.
  - It can efficiently store and manage many kinds of data, including multidimensional arrays, tables, images, and text.
  - Its support for metadata and custom data structures makes it well suited to robotic datasets that combine multimodal information such as robot states, action commands, sensor readings (e.g., camera images, depth information), and task descriptions.
  - Its performance features, such as optimized data layout, compression, and parallel I/O, reduce storage space and speed up reads and writes, which is crucial for robotic data at the gigabyte or even terabyte scale.
- The LeRobot Dataset Format, introduced by Hugging Face, is a standardized layout designed specifically for robotic learning data. It aims to simplify integration with PyTorch and the tools of the Hugging Face ecosystem.
  - Its core idea is to store a dataset as a combination of Parquet files (for trajectory information such as robot states and actions) and MP4 video files (for camera observations), supplemented by metadata files (for example, JSON) that describe the dataset's structure, content, statistics, and task definitions.
  - Parquet files use columnar storage and efficient compression, which makes them well suited to structured time-series data such as the robot's joint angles, end-effector poses, and executed actions.
  - The MP4 video format compresses image data effectively, saving storage space while maintaining good image quality.
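As a rough illustration of the HDF5 points above (multimodal arrays, metadata, compression), the following sketch writes one episode to an HDF5 file and reads a single frame back using the `h5py` library. The group and dataset names (`episode_0`, `state`, `action`, `images/front`), the array shapes, and the helper functions are assumptions made for this example, not the dataset's actual schema.

```python
import os
import tempfile

import h5py
import numpy as np


def write_episode(path, states, actions, images, task):
    """Write one episode to an HDF5 file (hypothetical layout)."""
    with h5py.File(path, "w") as f:
        ep = f.create_group("episode_0")
        # Metadata is stored as attributes next to the data it describes.
        ep.attrs["task"] = task
        ep.attrs["num_frames"] = len(states)
        ep.create_dataset("state", data=states)
        ep.create_dataset("action", data=actions)
        # One chunk per frame plus gzip compression: a single image can be
        # read without decompressing the whole episode.
        ep.create_dataset(
            "images/front",
            data=images,
            chunks=(1,) + images.shape[1:],
            compression="gzip",
        )


def read_frame(path, index):
    """Read the state, action, and image of a single frame."""
    with h5py.File(path, "r") as f:
        ep = f["episode_0"]
        return {
            "task": ep.attrs["task"],
            "state": ep["state"][index],
            "action": ep["action"][index],
            "image": ep["images/front"][index],
        }


# Synthetic stand-in data: 5 frames, 7-dimensional state and action vectors,
# and small RGB images.
num_frames = 5
rng = np.random.default_rng(0)
states = rng.normal(size=(num_frames, 7)).astype(np.float32)
actions = rng.normal(size=(num_frames, 7)).astype(np.float32)
images = rng.integers(0, 256, size=(num_frames, 48, 64, 3), dtype=np.uint8)

path = os.path.join(tempfile.mkdtemp(), "episode.h5")
write_episode(path, states, actions, images, task="pick up the cube")
frame = read_frame(path, 2)
print(frame["task"], frame["state"].shape, frame["image"].shape)
```

Chunking the image dataset per frame is one possible choice here; larger chunks may compress better when episodes are usually read sequentially.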
Beyond HDF5 and the LeRobot Dataset Format, the RS Universal Robotic Motion Dataset can potentially be converted to other formats to meet the diverse needs of practical applications.
Data Conversion and Integration
Data conversion and integration mechanisms are crucial for ensuring that the ORion Universal Robotic Motion Dataset can be widely used and smoothly integrated with robotic learning frameworks, data analysis tools, and simulation platforms.
After initial collection and processing, the core data of the dataset is stored in one or more standard formats. This core data includes the robot's state (such as joint angles, end-effector poses, and sensor readings), executed actions, task descriptions, and multimodal information (such as images and depth maps). The data can then be converted between formats as needed. This approach makes it easier for users to migrate and integrate data across tools and frameworks, thereby accelerating research and applications in robotic learning and related fields.
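One way to picture such a conversion is to flatten an array-oriented HDF5 episode into one record per frame, which is the row shape that a tabular format such as Parquet expects. The sketch below is illustrative only: the HDF5 layout, the output column names, the `fps` parameter, and the function `hdf5_episode_to_rows` are hypothetical and do not describe the dataset's actual conversion tooling.

```python
import os
import tempfile

import h5py
import numpy as np


def hdf5_episode_to_rows(path, fps):
    """Flatten one HDF5 episode into a list of per-frame records.

    Assumes a hypothetical layout with top-level `state` and `action`
    datasets and a `task` attribute on the file root.
    """
    with h5py.File(path, "r") as f:
        states = f["state"][:]
        actions = f["action"][:]
        task = f.attrs["task"]

    rows = []
    for i, (state, action) in enumerate(zip(states, actions)):
        rows.append(
            {
                "frame_index": i,
                "timestamp": i / fps,  # seconds since the episode start
                "state": state.tolist(),
                "action": action.tolist(),
                "task": task,
            }
        )
    return rows


# Build a tiny synthetic episode to convert.
path = os.path.join(tempfile.mkdtemp(), "episode.h5")
with h5py.File(path, "w") as f:
    f.attrs["task"] = "open the drawer"
    f.create_dataset("state", data=np.zeros((4, 6), dtype=np.float32))
    f.create_dataset("action", data=np.ones((4, 6), dtype=np.float32))

rows = hdf5_episode_to_rows(path, fps=10)
print(len(rows), rows[2]["frame_index"], rows[2]["timestamp"])
```

With pandas and a Parquet engine installed, records like these could be written out with `pandas.DataFrame(rows).to_parquet(...)`. Camera streams would need a separate video-encoding step, which this sketch does not cover.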