
[WIP] Representing DTensor in thunder traces #1907


Draft · wants to merge 22 commits into main
Conversation

@kshitij12345 (Collaborator) commented Mar 26, 2025

Fixes #1898

TODO

  1. For backward, call the DTensor equivalent of .contiguous to convert grad_output to the standard placement that is baked into the backward trace - see AOTDispatch: allow subclasses to correct when we guess metadata of tangents incorrectly pytorch/pytorch#118670.
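A minimal sketch of what that coercion could look like. `coerce_tangent` is a hypothetical helper name; the real `DTensor.redistribute` API changes a DTensor's placements by inserting the necessary collectives:

```python
def coerce_tangent(grad_output, expected_placements):
    """Sketch (hypothetical helper): make the runtime cotangent match the
    placements that were baked into the backward trace at trace time.

    If the placements already match, the tensor is returned unchanged;
    otherwise it is redistributed to the expected placements.
    """
    if tuple(grad_output.placements) == tuple(expected_placements):
        return grad_output
    return grad_output.redistribute(placements=list(expected_placements))
```

This mirrors the AOTDispatch approach linked above: guess the tangent's metadata at trace time, then correct at runtime if the guess was wrong.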

Design Doc - https://docs.google.com/document/d/1Gqb_jXrL-sSqs-D8KrZdcQinxuUSlccZBnnvbYJfYl0/edit?usp=sharing

Changes -
This PR adds support for DTensor inputs to the jitted function. Most of the additions required to support DTensor live in thunder/torch/experimental, such as the DTensorProxy, related prims, and tracing utilities for the ATen decompositions.

NOTE: This PR only adds the basic infrastructure needed to run a simple DTensor program (with torch.mul or torch.add). Operator coverage will be extended in subsequent PRs.

Following are the main updates:

  1. Prologue: Adds a new primitive check_dtensor_spec_repr which matches the repr of the DTensorSpec of the DTensor in question (see the example below). The PR also makes sure that besides the DTensorSpec check there is a tensor-metadata check for the DTensor object as well as for the local tensor it points to. NOTE - Another option for checking the DTensorSpec would be to keep the input's DTensorSpec in the TracingContext and have the prologue verify equality against it.

  2. DTensorProxy: Adds a new proxy object to represent a DTensor. This class inherits from TensorProxy, since DTensor is a tensor subclass and implements the same methods that a tensor implements.

  3. Prims and Operations: For the computation trace, adds two prims, get_dtensor_inner_tensor and construct_dtensor, to extract the local tensor from a DTensor and to construct a DTensor, respectively. For now, it only adds symbols for two ATen operations.

  4. Representation in trace -

Example Program

def fn(x, w):
    return x * w

thunder.jit(fn)(x_dtensor, w_dtensor)

Prologue Trace (relevant snippet)

@torch.no_grad()
@no_autocast
def prologue(*args, **kwargs):
  # args: "Any"
  prims.check_len(args, 2)
  # kwargs: "Any"
  prims.check_len(kwargs, 0)
  l_x_: "DTensor cuda:0 f32[16, 16]" = args[0]
  l_w_: "DTensor cuda:0 f32[16, 16]" = args[1]
  dtensor_spec0: "<class 'NoneType'>" = l_x_._spec
  thunder.torch.experimental.dtensor_prims_and_impl.check_dtensor_spec_repr(dtensor_spec0, "DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=torch.Size([16, 16]), stride=(16, 1), dtype=torch.float32))")
  t1: "cuda:0 f32[8, 16]" = l_x_._local_tensor
  prims.check_tensor_shape_and_metadata(t1, (8, 16), 'cuda:0', torch.float32, True)
  prims.check_tensor_shape_and_metadata(l_x_, (16, 16), 'cuda:0', torch.float32, True)
  dtensor_spec2: "<class 'NoneType'>" = l_w_._spec
  thunder.torch.experimental.dtensor_prims_and_impl.check_dtensor_spec_repr(dtensor_spec2, "DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=torch.Size([16, 16]), stride=(16, 1), dtype=torch.float32))")
  t3: "cuda:0 f32[8, 16]" = l_w_._local_tensor
  prims.check_tensor_shape_and_metadata(t3, (8, 16), 'cuda:0', torch.float32, False)
  prims.check_tensor_shape_and_metadata(l_w_, (16, 16), 'cuda:0', torch.float32, False)
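Conceptually, the check_dtensor_spec_repr call above only needs to compare the repr string recorded at trace time against the runtime spec. A minimal sketch (not the actual implementation; the real prologue failure path triggers recompilation rather than a plain RuntimeError):

```python
def check_dtensor_spec_repr(spec, expected_repr: str) -> None:
    # Sketch: the prologue bakes the trace-time repr of the DTensorSpec
    # into the trace as a string and compares it at runtime. A mismatch
    # (different mesh, placements, or tensor metadata) means the trace
    # is not valid for this input.
    actual = repr(spec)
    if actual != expected_repr:
        raise RuntimeError(
            f"Expected DTensorSpec {expected_repr} but found {actual}"
        )
```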

Computation Trace: There is a torch-level symbol dtensor_mul which is decomposed into its ATen decomposition. This allows an executor to claim dtensor_mul as a whole, or a fusion executor to fuse the decomposition if it can.

@torch.no_grad()
@no_autocast
def computation(l_x_, l_w_):
  # l_x_: "DTensor cuda:0 f32[16, 16]"
  # l_w_: "DTensor cuda:0 f32[16, 16]"

  # <eval_with_key>.10:5: 	    mul = torch.mul(l_x_, l_w_);  l_x_ = l_w_ = None
  mul = thunder.torch.experimental.dtensor_torch_and_aten_ops.dtensor_mul(l_x_, l_w_)  # mul: "DTensor cuda:0 f32[16, 16]"
    # t4 = thunder.torch.experimental.dtensor_prims_and_impl.get_dtensor_inner_tensor(l_x_)  # t4: "cuda:0 f32[8, 16]"
    # t5 = thunder.torch.experimental.dtensor_prims_and_impl.get_dtensor_inner_tensor(l_w_)  # t5: "cuda:0 f32[8, 16]"
    # t0 = thunder.torch.experimental.dtensor_torch_and_aten_ops.aten_mul(t4, t5)  # t0: "cuda:0 f32[8, 16]"
      # t0 = prims.mul(t4, t5)  # t0: "cuda:0 f32[8, 16]"
    # mul = thunder.torch.experimental.dtensor_prims_and_impl.construct_dtensor(t0, DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=(16, 16), stride=(16, 1), dtype=torch.float32)))  # mul: "DTensor cuda:0 f32[16, 16]"
  return (mul,)
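The two prims in the nested decomposition have simple semantics: unwrap the shard this rank holds, compute on it locally, and re-wrap the result with the spec recorded at trace time. The sketch below uses FakeDTensor, a hypothetical stand-in for torch.distributed.tensor.DTensor, so the example runs without a process group:

```python
from dataclasses import dataclass


@dataclass
class FakeDTensor:
    # Hypothetical stand-in for DTensor: a local shard plus a spec
    # describing the mesh, placements, and global tensor metadata.
    _local_tensor: list
    _spec: str


def get_dtensor_inner_tensor(dt):
    # Unwrap: return the shard this rank holds.
    return dt._local_tensor


def construct_dtensor(local_tensor, spec):
    # Re-wrap: attach the spec recorded at trace time to the
    # locally computed result.
    return FakeDTensor(local_tensor, spec)


def dtensor_mul(a, b):
    # Mirrors the decomposition in the trace above: unwrap both
    # operands, multiply the local shards, re-wrap with the output spec.
    t4 = get_dtensor_inner_tensor(a)
    t5 = get_dtensor_inner_tensor(b)
    t0 = [x * y for x, y in zip(t4, t5)]
    return construct_dtensor(t0, a._spec)
```

Because Shard(dim=0) * Shard(dim=0) requires no communication, the local multiply is the whole computation here; operations whose output placement differs from the inputs would need a redistribute step as well.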

Backward Trace

@torch.no_grad()
@no_autocast
def backward_fn(saved_for_backward, cotangents):
  # saved_for_backward: "Collection"
  # cotangents: "Collection"
  C0, C1, = saved_for_backward
  # C0: "Collection"
  # C1: "Collection"
  t2, = cotangents
  # t2: "DTensor cuda:0 f32[16, 16]"
  t20, t21, = C0
  # t20: "cuda:0 f32[8, 16]"
  # t21: "cuda:0 f32[8, 16]"
  # C1 (empty sequence)
  t11 = thunder.torch.experimental.dtensor_prims_and_impl.get_dtensor_inner_tensor(t2)  # t11: "cuda:0 f32[8, 16]"
  t13 = ltorch.mul(t21, t11)  # t13: "cuda:0 f32[8, 16]"
    # t13 = prims.mul(t21, t11)  # t13: "cuda:0 f32[8, 16]"
  t14 = ltorch.mul(t20, t11)  # t14: "cuda:0 f32[8, 16]"
    # t14 = prims.mul(t20, t11)  # t14: "cuda:0 f32[8, 16]"
  t16 = thunder.torch.experimental.dtensor_prims_and_impl.construct_dtensor(t14, DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=(16, 16), stride=(16, 1), dtype=torch.float32)))  # t16: "DTensor cuda:0 f32[16, 16]"
  t18 = thunder.torch.experimental.dtensor_prims_and_impl.construct_dtensor(t13, DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=(16, 16), stride=(16, 1), dtype=torch.float32)))  # t18: "DTensor cuda:0 f32[16, 16]"
  return (t18, None)

Thank you Masaki, Ivan and Mike for the helpful discussions and guidance!

Comment on lines +10 to +11
# Inherit from TensorProxy as DTensor also supports
# Tensor methods like __add__, __div__, sin, etc.

As I don't remember the behavior, would DTensorProxy.__add__ and others return an instance of DTensorProxy or TensorProxy?

kshitij12345 (Collaborator, Author) replied:

DTensorProxy.__add__ will return an instance of DTensorProxy, since after method resolution it finally dispatches to the DTensor symbol.

For a method that hasn't been implemented for DTensor yet, it errors out with:

Expected all inputs to be TensorProxy but found {list(map(lambda t: type(t), filter_tensor_proxies))}
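The dispatch behavior described above can be illustrated with a minimal sketch (all names hypothetical; thunder's actual symbol machinery is more involved):

```python
class TensorProxy:
    def __add__(self, other):
        # Method resolution routes through a symbol; the symbol's meta
        # function decides the output proxy type.
        return add_symbol(self, other)


class DTensorProxy(TensorProxy):
    # Inherits tensor methods from TensorProxy; the symbol sees the
    # DTensor-ness of the inputs and produces a DTensorProxy output.
    pass


def add_symbol(a, b):
    # Sketch: if any input is a DTensorProxy, the DTensor symbol is
    # selected and its output proxy is a DTensorProxy.
    if isinstance(a, DTensorProxy) or isinstance(b, DTensorProxy):
        return DTensorProxy()
    return TensorProxy()
```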

@IvanYashchuk IvanYashchuk added the DTensor Issues about DTensor support in Thunder label Apr 2, 2025
Successfully merging this pull request may close these issues.

Accept DTensor input without errors
3 participants