Hardware validation showing 20 handle grasping trials (5 types × 4 poses) and object manipulation tasks. Videos are not sped up.
Dexterous grasping is fundamental to robotics, yet data-driven grasp prediction relies on large, diverse datasets that are costly to generate and typically limited to a narrow set of gripper morphologies. Analytical grasp synthesis can be used to scale data collection, but necessary simplifying assumptions often yield physically infeasible grasps that need to be filtered in high-fidelity simulators, significantly reducing the number of grasps and their diversity.
We propose a scalable generate-and-refine pipeline for synthesizing large-scale, diverse, and physically feasible grasps. Instead of using high-fidelity simulators solely for verification and filtering, we leverage them as an optimization stage that continuously improves grasp quality without discarding precomputed candidates. More specifically, we initialize an evolutionary search with a seed set of analytically generated, potentially suboptimal grasps. We then refine these proposals directly in a high-fidelity simulator (Isaac Sim) using an asynchronous, gradient-free evolutionary algorithm, improving stability while maintaining diversity.
We further distill the refined grasp distribution into a diffusion model for robust real-world deployment, and highlight the role of diversity for both effective training and reliability during deployment. Experiments on a newly introduced handles dataset and DexGraspNet demonstrate improved grasp stability and diversity relative to current baselines.
Our approach combines analytical initialization for diversity, evolutionary refinement in high-fidelity simulation for physical feasibility, and diffusion-based distillation for real-world deployment from partial observations.
Three-stage pipeline: analytical initialization → evolutionary refinement → diffusion distillation
Analytical Initialization. We begin with a diverse set of grasp candidates generated using GraspQP, an analytical optimizer that samples contact points on object surfaces and optimizes for force closure. While many of these initial grasps may be dynamically infeasible due to simplified contact models, they provide good coverage of different grasp modes and taxonomies.
Evolutionary Refinement. Rather than discarding infeasible grasps, we refine them directly in Isaac Sim using an asynchronous genetic algorithm. Each grasp is evaluated by applying forces along six canonical directions and measuring stability. Key innovations include: (1) density-aware selection that penalizes overrepresented grasp clusters, (2) archive-based insertion where new candidates are only added if sufficiently novel or better than their nearest neighbor, (3) structured crossover that swaps entire pose or joint configurations, and (4) contact resampling using farthest point sampling. The evolutionary process runs for 10,000 steps with populations of 32-128 individuals, leveraging massive parallelization in Isaac Sim.
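The four mechanisms listed above can be illustrated with a minimal, simulator-free sketch. Everything here is an assumption made for illustration: the dictionary layout of a grasp, the Euclidean descriptor distance, the choice to divide fitness by the local neighbor count, the rule that a better candidate replaces its nearest neighbor, and all function names. Fitness is assumed to be non-negative and would come from the simulator's stability test in the actual pipeline.

```python
import math
import random


def descriptor(grasp):
    """Flat vector used to compare grasps: wrist pose followed by joint angles."""
    return list(grasp["pose"]) + list(grasp["joints"])


def try_insert(archive, candidate, novelty_radius):
    """Archive-based insertion: add if novel, else replace the nearest neighbor if better."""
    if not archive:
        archive.append(candidate)
        return "added"
    dists = [math.dist(descriptor(g), descriptor(candidate)) for g in archive]
    idx = min(range(len(archive)), key=dists.__getitem__)
    if dists[idx] > novelty_radius:
        archive.append(candidate)
        return "added"
    if candidate["fitness"] > archive[idx]["fitness"]:
        archive[idx] = candidate  # assumption: the better grasp replaces its neighbor
        return "replaced"
    return "rejected"


def selection_weights(archive, radius):
    """Density-aware selection: fitness divided by the number of grasps within `radius`."""
    weights = []
    for g in archive:
        # The count is at least 1 because every grasp is its own neighbor.
        count = sum(
            math.dist(descriptor(g), descriptor(h)) <= radius for h in archive
        )
        weights.append(g["fitness"] / count)
    return weights


def structured_crossover(a, b, rng):
    """Swap whole blocks: the child takes one parent's pose and the other's joints."""
    if rng.random() < 0.5:
        a, b = b, a
    return {"pose": list(a["pose"]), "joints": list(b["joints"]), "fitness": None}


def farthest_point_sampling(points, k, start=0):
    """Greedily pick up to k indices, each one farthest from those already picked."""
    chosen = [start]
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0.0:
            break  # only duplicates of already chosen points remain
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return chosen


# Usage: one generation step over a toy archive.
rng = random.Random(0)
archive = [
    {"pose": [0.0, 0.0, 0.0], "joints": [0.1, 0.2], "fitness": 1.0},
    {"pose": [0.1, 0.0, 0.0], "joints": [0.1, 0.2], "fitness": 1.0},
    {"pose": [5.0, 0.0, 0.0], "joints": [0.9, 0.8], "fitness": 1.0},
]
weights = selection_weights(archive, radius=0.5)  # the two crowded grasps are down-weighted
parent_a, parent_b = rng.choices(archive, weights=weights, k=2)
child = structured_crossover(parent_a, parent_b, rng)
child["fitness"] = 0.0  # placeholder; in the pipeline this would come from the simulator
print(try_insert(archive, child, novelty_radius=0.5))
```

In this toy archive the two nearly identical grasps each receive half the selection weight of the isolated one, which is the intended effect of penalizing overrepresented clusters. Swapping whole pose or joint blocks, rather than mixing individual coordinates, keeps each block internally consistent in the child.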
Diffusion Distillation. To enable real-world deployment from partial point clouds, we train a conditional diffusion model on the refined grasp dataset, building upon the DexGraspAnything architecture. The model uses PointTransformerV3 for encoding point clouds and a Diffusion Transformer for grasp generation (9.7M parameters total). We diffuse a 21-dimensional grasp vector (position, orientation, joint angles) conditioned on observed point clouds and hand keypoints. The training objective combines denoising, penetration penalties, and keypoint consistency losses to ensure physically feasible predictions.
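To make the training objective more concrete, the sketch below shows the standard DDPM-style forward noising of a 21-dimensional grasp vector and a weighted sum of the three loss terms named above. It is a minimal illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the noise-prediction form of the denoising loss, the loss weights `w_pen` and `w_kp`, and all function names are assumed, and the penetration and keypoint terms are passed in as precomputed numbers.

```python
import math
import random

GRASP_DIM = 21  # from the text: position, orientation, and joint angles


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    xt = [signal * x + noise * e for x, e in zip(x0, eps)]
    return xt, eps


def denoising_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def total_loss(l_denoise, l_penetration, l_keypoint, w_pen=1.0, w_kp=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return l_denoise + w_pen * l_penetration + w_kp * l_keypoint


# Usage: noise one grasp and score a stand-in prediction.
rng = random.Random(0)
alpha_bar = linear_alpha_bar(num_steps=100)
x0 = [0.0] * GRASP_DIM              # placeholder grasp vector
xt, eps = add_noise(x0, t=50, alpha_bar=alpha_bar, rng=rng)
eps_pred = [0.0] * GRASP_DIM        # stand-in for the network output
loss = total_loss(denoising_loss(eps_pred, eps), l_penetration=0.0, l_keypoint=0.0)
```

In a real training loop the prediction would come from the conditional network, and the penetration and keypoint terms would be computed from the predicted grasp and the observed point cloud.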
Preference Alignment. A key advantage of our evolutionary approach is the ability to steer grasp synthesis toward specific objectives without requiring differentiable metrics. For example, one can add energy terms based on force-closure conditions, bias the fitness with reward models learned from human preference selections, or both.
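A minimal sketch of such fitness shaping is shown below. The additive form, the weights, the field names, and the toy reward model are all assumptions made for illustration. The point it illustrates is that the evolutionary search only needs a scalar score to rank candidates, so the reward model can be an arbitrary, non-differentiable function.

```python
def shaped_fitness(stability, energy=0.0, preference=0.0, w_energy=0.1, w_pref=0.5):
    """Simulator stability, minus an energy penalty, plus a preference bonus."""
    return stability - w_energy * energy + w_pref * preference


def rank(candidates, reward_model=None):
    """Sort candidates by shaped fitness, best first; no gradients are needed."""
    def score(c):
        pref = reward_model(c) if reward_model is not None else 0.0
        return shaped_fitness(c["stability"], c.get("energy", 0.0), pref)
    return sorted(candidates, key=score, reverse=True)


# Usage: a hypothetical reward model that prefers grasps involving the thumb.
candidates = [
    {"name": "A", "stability": 0.8, "uses_thumb": False},
    {"name": "B", "stability": 0.6, "uses_thumb": True},
]
prefers_thumb = lambda c: 1.0 if c["uses_thumb"] else 0.0
print([c["name"] for c in rank(candidates)])                 # stability only
print([c["name"] for c in rank(candidates, prefers_thumb)])  # preference-biased
```

Without the reward model, candidate A ranks first on stability alone. With the bias, candidate B scores 0.6 + 0.5 = 1.1 and overtakes A at 0.8.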
We evaluate grasp synthesis methods across multiple dimensions to capture both quality and diversity:
Together, these metrics capture the trade-off between grasp quality (success rate) and grasp diversity (UGR, entropy).
We evaluate our approach on both everyday objects and handle manipulation tasks. Our experiments demonstrate that evolutionary refinement improves grasp quality while maintaining diversity across different gripper poses and object interactions.
Evolutionary refinement improves both stability and diversity across objects and handles
The radar plots show that evolutionary refinement (with 32 or 128 GraspQP seeds) consistently achieves higher success rates and unique grasp counts than the analytical baselines. Notably, refinement from random initialization reaches a high success rate but lower diversity, highlighting the importance of analytical seeding for maintaining multimodal grasp distributions.
Panel titles: Evolution outperforms diffusion training (left); Fast convergence in ~10k steps (right)
Left: While training a diffusion model on raw GraspQP samples improves grasp coverage, direct simulator-in-the-loop optimization via evolution produces significantly more feasible and diverse grasps. Right: The evolutionary process converges rapidly within 10,000 steps, with most quality improvements occurring in the first few thousand iterations.
We validate our approach on a robotic system with the XHand dexterous gripper. Our diffusion model generates grasps directly from partial point cloud observations, which are then executed using cuRobo motion planning with collision-aware trajectories.
Example rollout sequence showing the robot approaching and grasping cabinet handles with collision-aware motion planning
The system achieves reliable grasping across diverse handle geometries and everyday objects, demonstrating effective sim-to-real transfer.
We introduce a new dataset with handle assets from IKEA, designed for dexterous manipulation research. Each asset includes high-resolution collision meshes for SDF-based simulation, articulated USD files with compliant joints, and realistic material properties.
The dataset covers diverse handle geometries including U-shaped, T-shaped, knobs, and bar handles. All assets are ready for Isaac Sim and can be used for grasp synthesis, reinforcement learning, and sim-to-real transfer.