Getting Started
Dataset Definition
To build a high-quality universal robotic dataset, we perform data collection using a purpose-built acquisition robot combined with a dedicated software platform, followed by rigorous data validation.
Dataset Robot
The RS acquisition robot is capable of executing a wide variety of general-purpose tasks spanning daily life (e.g., object pick-and-place, cleaning and tidying), simplified medical procedures (e.g., simulated instrument hand-off, environmental disinfection assistance), and basic industrial operations (e.g., part handling, simple assembly). While performing these tasks, the robot captures rich environmental information and comprehensive proprioceptive data via its multimodal sensor suite. In addition to conventional RGB imagery, the dataset includes fine-grained robot-centric measurements (joint angles, velocities, end-effector poses, force feedback, and more) essential for training reinforcement learning (RL) algorithms.
The dataset has the following characteristics:
- High-fidelity data acquisition: Ensuring that the collected sensor data has high precision and low latency, truly reflecting the interaction process between the robot and the environment. For example, high-frame-rate cameras and precise force sensors can capture subtle operational details and changes in contact forces.
- Synchronization and timestamp alignment: All visual sensor data streams are precisely hardware-synchronized and timestamped to ensure consistency between different modalities of data, which is crucial for subsequent data fusion and model training (a minimal alignment sketch follows this list).
- Task and scenario diversity: The robot is designed to be easily configurable and deployable in various simulated or real universal task scenarios, thus enabling the collection of diverse data covering a wide range of skills and environments.
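For example, the hardware-synchronized timestamps above make cross-modal alignment straightforward. Below is a minimal sketch, assuming per-stream timestamps have already been extracted into sorted NumPy arrays of nanoseconds; the function name and the 5 ms tolerance are illustrative assumptions, not part of the dataset tooling:

```python
import numpy as np

def align_nearest(ref_ns: np.ndarray, other_ns: np.ndarray,
                  tol_ns: int = 5_000_000) -> np.ndarray:
    """For each reference timestamp, return the index of the nearest
    sample in the other (sorted) stream, or -1 if none lies within tol_ns."""
    idx = np.searchsorted(other_ns, ref_ns)
    idx = np.clip(idx, 1, len(other_ns) - 1)
    left, right = other_ns[idx - 1], other_ns[idx]
    nearest = np.where(ref_ns - left <= right - ref_ns, idx - 1, idx)
    gap = np.abs(other_ns[nearest] - ref_ns)
    return np.where(gap <= tol_ns, nearest, -1)
```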
Categories and Scope
The RS Universal Robotic Motion Dataset aims to capture the actions performed by robots while executing a variety of universal tasks. These action categories are extensive, aiming to cover multiple key areas where robotic technology may be applied in the real world. The actions in the dataset not only include simple limb movements but also cover complex interaction sequences with the environment, objects, and potential human partners. The action categories and scope covered by data acquisition include, but are not limited to, the following:
- Activities of Daily Living (ADLs): For example, object manipulation (picking up and placing objects of different shapes and weights from various locations), cleaning and organizing (wiping surfaces, tidying up clutter), opening and closing doors and windows, operating household appliances (such as microwaves, washing machines), and preparing simple food (such as pouring water, stirring). These actions are the foundation for service robots to perform tasks in domestic or office environments.
- Basic Industrial Operations: For example, part handling and feeding (grabbing and moving parts on a simulated assembly line), simple assembly and disassembly (such as screwing, inserting components, snap-fit connections), assisting in quality inspection (scanning products with handheld sensors), and packaging and palletizing (placing products into packaging boxes or stacking on pallets). These actions represent the basic applications of robots in manufacturing, logistics, and warehousing industries.
- Public Services: Such as IT equipment maintenance (data center inspection, network cable connection, and maintenance actions such as powering servers on and off), patrol and task execution in public places (such as security checks at airports and high-speed railway stations, object handling), and health and epidemic prevention (quarantine sample placement and handling, environmental disinfection). These actions are the foundation for robots to perform tasks in public and hazardous spaces.
The dataset strives to include a diverse range of specific action instances in each category and records the execution of these actions under different environments, objects, and task objectives. In this way, the dataset can support the training of highly adaptable and generalizable robotic policies.
We also offer highly flexible data customization services, providing customized data acquisition solutions and execution schedules based on global users' requirements for data acquisition scenarios, specific tasks, data structure, and definitions.
Data Dimensions and Annotation
The RS Universal Robotic Motion Dataset pursues comprehensiveness and granularity in data dimensions and annotations to support complex robotic learning and behavioral analysis tasks. The main data dimensions include:
- Robot Proprioceptive State Data: One of the core data types, including the robot's joint angles, joint velocities, end-effector poses (position and orientation), and force/torque feedback from the end-effectors. These data are sampled at high frequencies to accurately record the robot's kinematic and dynamic states.
- Visual Data: Typically includes high-resolution color image streams from one or more RGB cameras, and may also include depth images (if the robot is equipped with depth cameras). These image data provide rich environmental information and visual features of the robot's operating objects.
- Task and Action Commands: Records the high-level task commands (e.g., "pick up the cup on the table") or low-level action commands (e.g., specific joint targets or end-effector trajectories) issued to the robot during data acquisition, or predefined task content and execution sequence.
- Timestamps: All data streams are equipped with precisely synchronized timestamps to ensure temporal consistency between different modalities of data, which is crucial for analyzing dynamic interaction processes.
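To make these dimensions concrete, a single synchronized timestep could be represented roughly as follows. The field names and shapes here are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class TimeStep:
    stamp_ns: int                      # synchronized timestamp (ns)
    joint_pos: np.ndarray              # joint angles (rad)
    joint_vel: np.ndarray              # joint velocities (rad/s)
    ee_pose: np.ndarray                # [x, y, z, qx, qy, qz, qw]
    ee_wrench: np.ndarray              # [fx, fy, fz, mx, my, mz]
    rgb: dict[str, bytes] = field(default_factory=dict)  # camera name -> JPEG bytes
    command: str = ""                  # high- or low-level command, if any
```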
Data annotation is a key component in enhancing the value of the dataset, mainly including the following aspects:
- Episode Segmentation: Dividing the continuous data stream into independent action segments (episodes), each corresponding to a complete task execution attempt or a meaningful behavioral unit.
- Action Labels: Applying semantic action labels to each action segment or keyframes within the segments. These labels can be hierarchical, for example, high-level task labels (such as "making tea"), mid-level action labels (such as "pick up the teacup," "pour hot water"), and low-level action primitive labels (such as "grasp," "move").
- Object Labels and States: Identifying and annotating key objects interacting with the robot (e.g., "cup," "doorknob") and possibly recording the initial and final states of the objects (e.g., object position, orientation, whether successfully operated).
- Task Success/Failure Annotation: Evaluating and annotating the execution results of each action segment, indicating whether the task was successfully completed and the reasons for failure (if applicable).
- Scene Description and Metadata: Providing textual descriptions of the task execution scene (e.g., "operating on the kitchen table"), as well as metadata information about robot configuration, sensor parameters, data acquisition date, etc.
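To make the annotation layers concrete, a hypothetical episode-level annotation record might look like the following; all field names and values are illustrative assumptions, not the dataset's actual schema:

```python
episode_annotation = {
    "episode_id": "ep_000123",                  # illustrative ID
    "task_label": "making tea",                 # high-level task label
    "segments": [
        {
            "t_start_ns": 0,
            "t_end_ns": 2_400_000_000,
            "action": "pick up the teacup",     # mid-level action label
            "primitives": ["move", "grasp"],    # low-level action primitives
        },
    ],
    "objects": [
        {"name": "cup", "initial_pose": [0.42, -0.10, 0.76], "operated": True},
    ],
    "success": True,
    "failure_reason": None,                     # populated only on failure
    "scene": "operating on the kitchen table",
}
```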
MCAP Metadata Definitions for the Dataset
The RS Universal Robotic Motion Dataset adopts the MCAP format, a modular container format for heterogeneous, timestamped robotics data, to support efficient multimodal data storage and streaming.
::: note
Based on experience from practical data acquisition projects, users (robot manufacturers, AI research institutions, data annotation companies, etc.) are most concerned with two properties of the acquisition hardware platform (the robot): its biomimicry and subject consistency (i.e., the height and field of view of the visual sensors approximate human anatomical and biological characteristics), and its line-of-sight integrity (i.e., spatial-semantic completeness, covering operational details such as the observable motion of both arms and the end-effector status, as well as the spatial relationship with surrounding objects). The platform has therefore been optimized accordingly: a wider FOV (field of view), higher data accuracy, and hardware synchronization across multiple cameras.
Data acquisition items may vary depending on customer-specific requirements and model configurations, and different data formats impose distinct field definitions. The information above is provided for reference only; for further customization, please contact the official team or your sales representative.
The following figure shows the robot's head-camera view during actual data acquisition; it clearly and completely presents the entire operational scene.
:::
These MCAP metadata message types provide the dataset with rich details and structured information, enabling the data to be efficiently stored, transmitted, and processed. Through these messages, researchers can precisely understand the robot's perception, state, and behavior during task execution, thus providing a solid data foundation for various robotic learning and analysis tasks.
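As a minimal reading sketch, one episode file can be iterated with the open-source `mcap` Python package; the file name and topic below are hypothetical, since actual topic names depend on the delivered robot configuration:

```python
from mcap.reader import make_reader

# Hypothetical file name and topic; actual topic names depend on the
# robot configuration delivered with the dataset.
with open("episode_000123.mcap", "rb") as f:
    reader = make_reader(f)
    for schema, channel, message in reader.iter_messages(topics=["/joint_states"]):
        # message.log_time is in nanoseconds; message.data holds the
        # serialized payload whose type is given by schema.name.
        print(channel.topic, schema.name, message.log_time)
```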
The following are the definitions of key MCAP metadata message types in the dataset:
Compressed Image Data
Suitable for bandwidth-limited scenarios.
Message Names:
sensor_msgs/msg/CompressedImage
Field Descriptions:
- header: Timestamp and coordinate system information.
  - stamp.sec: The number of whole seconds since 1970-01-01 UTC.
  - stamp.nanosec: Nanosecond offset (0 to 10⁹−1).
  - frame_id: Camera coordinate system (e.g., "head_camera_optical").
- format: Compression format ("jpeg", "png", "tiff").
- data: Compressed image binary data.
- Application: Image transmission for head/left/right arm cameras.
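A minimal decoding sketch, assuming OpenCV is available (the helper name is illustrative):

```python
import cv2
import numpy as np

def decode_compressed(msg_format: str, msg_data: bytes) -> np.ndarray:
    """Decode a sensor_msgs/msg/CompressedImage payload ("jpeg"/"png")
    into a BGR image array."""
    buf = np.frombuffer(msg_data, dtype=np.uint8)
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError(f"could not decode {msg_format} image")
    return img
```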
Raw Uncompressed Image Data
Provides complete pixel information.
Message Names:
sensor_msgs/msg/Image
Field Descriptions:
- header: Timestamp and coordinate system information (same structure as in Compressed Image Data).
- height: Image height in pixels.
- width: Image width in pixels.
- encoding: Pixel encoding format.
  - rgb8: 24-bit RGB.
  - bgr8: 24-bit BGR.
  - mono8: 8-bit grayscale.
  - bayer_rggb8: Bayer raw data.
- is_bigendian: Byte order (0=little endian).
- step: Bytes per row (width × bytes per pixel).
- data: Raw pixel data.
- Application: Raw image processing/computer vision.
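A minimal sketch of reshaping the raw payload into an array, assuming NumPy and the encodings listed above; row padding is honored via the step field:

```python
import numpy as np

def image_to_array(height: int, width: int, step: int,
                   encoding: str, data: bytes) -> np.ndarray:
    """Reshape a sensor_msgs/msg/Image payload into (H, W, C)."""
    channels = {"rgb8": 3, "bgr8": 3, "mono8": 1, "bayer_rggb8": 1}[encoding]
    rows = np.frombuffer(data, dtype=np.uint8).reshape(height, step)
    return rows[:, : width * channels].reshape(height, width, channels)
```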
Camera Calibration Parameters
For image correction and 3D projection.
Message Names:
sensor_msgs/msg/CameraInfo
Field Descriptions:
- header: Timestamp and coordinate system information.
- height/width: Calibration resolution.
- distortion_model: Distortion model ("plumb_bob" = radial + tangential distortion).
- d: Distortion parameters [k1, k2, p1, p2, k3].
- k: 3×3 intrinsic matrix, stored row-major.
  - Format: [fx, 0, cx, 0, fy, cy, 0, 0, 1]
  - fx/fy: Focal lengths (in pixels).
  - cx/cy: Principal point coordinates.
- r: 3×3 rotation matrix (usually the identity matrix).
- p: 3×4 projection matrix, stored row-major.
  - Format: [fx', 0, cx', Tx, 0, fy', cy', Ty, 0, 0, 1, 0]
- binning_x/y: Pixel binning factor (1 = no binning).
- roi: Region of interest.
  - x_offset/y_offset: Coordinates of the top-left corner of the region.
  - width/height: Size of the region.
  - do_rectify: Whether rectification is needed.
- Application: Image undistortion and point cloud generation.
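A minimal undistortion sketch using the k and d fields with the plumb_bob model described above, assuming OpenCV:

```python
import cv2
import numpy as np

def undistort(img: np.ndarray, camera_info) -> np.ndarray:
    """Undistort an image using a decoded sensor_msgs/msg/CameraInfo."""
    K = np.array(camera_info.k, dtype=np.float64).reshape(3, 3)
    d = np.array(camera_info.d, dtype=np.float64)  # [k1, k2, p1, p2, k3]
    return cv2.undistort(img, K, d)
```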
Real-time Joint State
For motion control and status monitoring.
Message Names:
sensor_msgs/msg/JointState
Field Descriptions:
- header: Timestamp and base coordinate system (e.g., "base_link").
- name[]: Array of joint names.
  - Example: ["shoulder_pan", "shoulder_lift", "elbow", "wrist1", "wrist2", "wrist3"]
- position[]: Joint angle (rad) or position (m).
- velocity[]: Joint velocity (rad/s or m/s).
- effort[]: Joint torque (Nm) or force (N).
- Application: Robotic-arm kinematic modeling / control feedback.
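A small illustrative helper, assuming a decoded JointState message object; the parallel arrays share one index:

```python
def joint_dict(msg) -> dict[str, float]:
    """Map joint names to angles/positions for one
    sensor_msgs/msg/JointState sample."""
    return dict(zip(msg.name, msg.position))
```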
Joint-Speed Control Command
For real-time motion control.
Message Names:
rm_ros_interfaces/msg/Jointspeed
Field Descriptions:
- joint_speed[]: Current velocity for each joint (RPM).
  - Order corresponds to JointState.name[].
- Application: Robotic-arm motion control.
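Since JointState reports rad/s while Jointspeed uses RPM, a unit conversion is often needed when comparing the two; a one-line sketch (1 RPM = 2π/60 rad/s):

```python
import math

def rpm_to_rad_s(joint_speed_rpm: list[float]) -> list[float]:
    """Convert Jointspeed values (RPM) into rad/s for comparison
    with JointState.velocity[]."""
    return [v * 2.0 * math.pi / 60.0 for v in joint_speed_rpm]
```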
Position and Orientation in Space
Object position and orientation in space, used for localization and navigation.
Message Names:
geometry_msgs/msg/Pose
Field Descriptions:
- position: 3-D coordinates (m).
  - x: Forward/backward axis (positive forward).
  - y: Left/right axis (positive left).
  - z: Vertical axis (positive up).
- orientation: Quaternion rotation.
  - x, y, z, w: Quaternion elements.
  - Identity quaternion: [0, 0, 0, 1] indicates no rotation.
- Application: End-effector positioning / object manipulation.
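A minimal sketch converting a decoded Pose into a 4×4 homogeneous transform, assuming SciPy (whose from_quat expects the same [x, y, z, w] order as the message):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(pose) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a geometry_msgs/msg/Pose."""
    q = pose.orientation
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat([q.x, q.y, q.z, q.w]).as_matrix()
    T[:3, 3] = [pose.position.x, pose.position.y, pose.position.z]
    return T
```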
Dexterous-Hand Status
Dexterous-hand status feedback, including finger position, force control, and error information.
Message Names:
- rm_ros_interfaces/msg/Rmplusbase
- rm_ros_interfaces/msg/Rmplusstate
Field Descriptions:
Rmplusbase_msg:
- manu: Manufacturer identifier.
- type: Device type (1 = two-finger gripper, 2 = five-finger dexterous hand, 3 = three-finger gripper).
- hv: Hardware version.
- sv: Software version.
- bv: Boot-loader version.
- id: Unique device ID.
- dof: Number of degrees of freedom.
- check: Self-check switch.
- bee: Buzzer on/off.
- force: Force-control support flag.
- touch: Tactile sensing support flag.
- touch_num: Number of tactile sensors.
- touch_sw: Tactile sensing on/off.
- hand: Hand orientation (1 = left, 2 = right).
- pos_up: Position upper limit, dimensionless.
- pos_low: Position lower limit, dimensionless.
- angle_up: Angle upper limit, in 0.01° units.
- angle_low: Angle lower limit, in 0.01° units.
- speed_up: Speed upper limit, dimensionless.
- speed_low: Speed lower limit, dimensionless.
- force_up: Force upper limit, in 0.001 N units.
- force_low: Force lower limit, in 0.001 N units.
Rmplusstate_msg:
- sys_state: Overall system status.
- dof_state: Current state of each DoF.
- dof_err: Error flags for each DoF.
- pos: Current position of each DoF.
- speed: Current speed of each DoF.
- angle: Current angle of each DoF.
- current: Motor current of each DoF.
- normal_force: Normal component of the 3-D tactile force on each DoF.
- tangential_force: Tangential component of the 3-D tactile force on each DoF.
- tangential_force_dir: Direction of the tangential tactile force on each DoF.
- tsa: Tactile self-approach value.
- tma: Tactile mutual-approach value.
- touch_data: Raw tactile sensor data.
- force: Force/torque value for each DoF.
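Since several Rmplus fields are scaled integers, a small illustrative conversion based on the 0.01° and 0.001 N units stated above:

```python
def to_degrees(angle_raw: int) -> float:
    """Rmplus angle fields use 0.01 degree units."""
    return angle_raw * 0.01

def to_newtons(force_raw: int) -> float:
    """Rmplus force fields use 0.001 N units."""
    return force_raw * 0.001
```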
Joint Electrical Information
Message Names:
- rm_ros_interfaces/msg/Jointcurrent
- rm_ros_interfaces/msg/Jointtemperature
- rm_ros_interfaces/msg/Jointvoltage
Field Descriptions:
- Real-time joint current (Jointcurrent.msg):
  - joint_current: float32 – real-time joint current, precision 0.1 mA.
- Real-time joint temperature (Jointtemperature.msg):
  - joint_temperature: float32 – real-time joint temperature, precision 0.1 °C.
- Real-time joint voltage (Jointvoltage.msg):
  - joint_voltage: float32 – real-time joint voltage, precision 0.1 V.
Six-Axis Force Sensor Information
Message Names:
rm_ros_interfaces/msg/Sixforce
Field Descriptions:
- External force/torque data (Sixforce.msg):
  - force_fx: float32, force along the sensor’s X-axis (N).
  - force_fy: float32, force along the sensor’s Y-axis (N).
  - force_fz: float32, force along the sensor’s Z-axis (N).
  - force_mx: float32, torque about the sensor’s X-axis (N·m).
  - force_my: float32, torque about the sensor’s Y-axis (N·m).
  - force_mz: float32, torque about the sensor’s Z-axis (N·m).
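A small illustrative use of these fields, computing the resultant contact force magnitude from a decoded Sixforce message:

```python
import math

def contact_force_magnitude(msg) -> float:
    """Resultant force (N) from one Sixforce sample; a simple basis
    for contact-detection thresholds."""
    return math.sqrt(msg.force_fx**2 + msg.force_fy**2 + msg.force_fz**2)
```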