Draft box: a place for pieces longer than a Weibo or Moments post, but shorter than a blog post or paper.
https://rongyi.io @LER0ever

Ziroom's cleaning service: an experience that's hard to put into words

The housekeeping industry is largely propped up by middle-aged women who can't find work or have lost their jobs. As a Ziroom tenant of nearly a year, I'd rate Ziroom's cleaning service as mediocre at best.

Let's start with the pros:

  • Because there is a user-rating system (and, I'd guess, bottom-ranking elimination), the cleaners' attitudes are uniformly excellent.
  • During the pandemic period, Ziroom's epidemic-prevention measures for cleaning staff were very thorough, and transparent to users.
  • I don't much care whether other spots get cleaned well, but at the very least the floors are mopped spotless every time.
  • They proactively ask whether you want your dishes washed, your washing machine disinfected, or your windows wiped; I decline every time anyway.

The downsides are more numerous; let me go through them one by one.

1. The cleaning staff are basically all women in their fifties or sixties

~~Am I, Rong Zhongyong, unworthy of a pretty young lady cleaning for me? (x~~

It's not really age or gender discrimination; it's just that people of that age bracket and gender are simply too different from me.

This genuinely hurts the user experience. Picture the scene:

A sunny Sunday morning: breakfast is done, and you're settling in for two hours of focused work. Then the phone rings, caller location Handan, Hebei. You pick up, and a middle-aged woman with an unplaceable accent goes: "Hello, Zhi-ru cleaning~" (that's "Ziroom", mispronounced).

Doesn't that just throw off your whole morning?

That said, I understand that most people aren't willing to do this kind of hard, dirty work, so expecting twenty-something young women is a non-starter. Only people beaten down by life to the point of desperation would set aside their dignity and take this kind of job to feed themselves and their families.

2. Near-zero awareness of tech products

At the last cleaning, the lady held up my TF -> SD card adapter: "Do you still need this shell? I'll toss it in the trash for you"… ???

I even suspect that's how the USB flash drive I lost last year vanished.

Then there are the repeated attempts to wipe things that must not be wiped with a wet rag, like my Bluetooth speaker. Maybe it looks enough like a piece of art that she felt it deserved extra care?

So every time there's a cleaning I make sure I'm home, watching the whole process so she doesn't break or throw away something genuinely important.

3. Different horizons

Lab @ rongyi.io Welcome Message

About

This GitLab instance is my personal laboratory and life planner. It's mainly used for long-term planning, task management, code hosting, and code review. Unless you've been invited to this instance, I recommend checking out my public projects on GitHub instead.

Links

My Website: https://rongyi.io
My Blog: https://rongyi.blog
My Academic Portfolio: https://rongyi.ai
Email: i+lab@rongyi.io

Registration is restricted to @rongyi.io


Registration restrictions

Registration is currently open only to individuals I have authorized and to organizations I manage. The specific email address whitelist:

  • Email addresses on rongyi.io's own domains
    • @rongyi .io .dev .blog .ml .page .ai .pro
    • @ry.sb
    • @*.rongyi.io
  • The Lumos language project group @lumos.dev
  • 虚幻软件有限公司 ᵠ @void.company
  • 笔墨投资有限公司 @bemore.company
  • 星尘映画工作室 ᵠ @

ᵠ: encouraged to keep using their own existing internal Gitea/Gerrit instances

How Huawei MindSpore AutoParallel Implements Automatic Distributed Parallelism

This is a pre-release draft for the L.E.R Space blog. Suggestions for revision are welcome; reproduction is prohibited (see the About page of this draft book).
Once published, the final version on my blog / Zhihu column / WeChat official account is authoritative.

A few days ago, Huawei released MindSpore, an "all-scenario AI framework",

The Dynamic Programming Algorithm

// There are 3 meta phases of the Dynamic Programming (DP) algorithm. The input is a CostGraph, and the goal
// is to compute the strategy for each operator in the CostGraph.
//
// Phase 1: Shrink the CostGraph using 6 operations, and record them in order.
//       Using four operations: Operator Elimination, Edge Elimination, Merge Elimination, and Contract Elimination,
//       each connected component in the CostGraph can be shrunk into the final graph: u --> v. See the
//       interpretation of the 6 operations in costmodel.h.
// Phase 2: Search the cost_list in the final graph, and determine the optimal one.
//       Create the cost_list for the final graph, and choose the optimal one: the one minimizing the quantity
//       COST_MODEL_ALPHA * memory_cost + COST_MODEL_BETA * communication_cost
// Phase 3: Recover the original CostGraph, then determine the strategy for each operator.
//       After determining the optimal cost for the final graph, the algorithm recovers the original graph by applying
//       the operations of Phase 1 in reverse order. Because each operation decision contains the strategy,
//       all of the operators' strategies can be determined.
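
To make Phase 2 concrete, below is a minimal Python sketch of just the selection rule. It is not MindSpore's actual code; the strategy names, costs, and weight values are invented for illustration.

# Minimal sketch of Phase 2's selection rule (not MindSpore's actual code).
# Each candidate strategy for the final edge u --> v carries a memory cost
# and a communication cost; the optimal one minimizes the weighted sum.
COST_MODEL_ALPHA = 1.0  # assumed weight for memory_cost
COST_MODEL_BETA = 1.0   # assumed weight for communication_cost

def pick_optimal(cost_list):
    # cost_list: iterable of (strategy, memory_cost, communication_cost)
    return min(cost_list,
               key=lambda c: COST_MODEL_ALPHA * c[1] + COST_MODEL_BETA * c[2])

# Hypothetical candidate strategies for the final graph u --> v:
candidates = [("split-batch", 4.0, 1.0),
              ("split-model", 2.0, 3.5),
              ("replicate", 6.0, 0.0)]
print(pick_optimal(candidates))  # -> ('split-batch', 4.0, 1.0)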

HLO Tensor Tiling, Pruning

Example

HLO Function

%fused_computation.3461.clone (param_0.15226: f32[3,35,1024], param_1.21991: f32[3,35,1024], param_2.18880: s32[], param_3.11384: s32[], param_4.4941: f32[3,35], param_5.2397: f32[3,35], param_6.2244: s32[], param_7.2283: f32[3,35,1024], param_8.2008: f32[3,35,1024], param_9.1351: f32[3,35,1024], param_10.643: s32[], param_11.421: s32[], param_12.202: s32[], param_13.379: f32[3,35], param_14.480: f32[3,35]) -> f32[3,35,1024] {
  %param_9.1351 = f32[3,35,1024]{2,1,0} parameter(9)
  %param_14.480 = f32[3,35]{1,0} parameter(14)
  %negate.1964 = f32[3,35]{1,0} negate(f32[3,35]{1,0} %param_14.480), metadata={op_type="Neg" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/SquaredDifference_grad/Neg"}
  %param_13.379 = f32[3,35]{1,0} parameter(13)
  %negate.1965 = f32[3,35]{1,0} negate(f32[3,35]{1,0} %param_13.379), metadata={op_type="Neg" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/sub_grad/Neg"}
  %add.9068 = f32[3,35]{1,0} add(f32[3,35]{1,0} %negate.1964, f32[3,35]{1,0} %negate.1965), metadata={op_type="AddN" op_name="training/gradients/AddN_4"}
  %constant.8355 = pred[] constant(false), metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_prepostprocess/layer_norm/Mean_1_grad/floordiv_1"}
  %constant.8354 = s32[] constant(840), metadata={op_type="Size" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_prepostprocess/layer_norm/Mean_1_grad/Prod_1"}
  %param_12.202 = s32[] parameter(12)
  %maximum.1386 = s32[] maximum(s32[] %constant.8354, s32[] %param_12.202), metadata={op_type="Maximum" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/Mean_grad/Maximum_1"}
  %constant.8353 = s32[] constant(0), metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/encoder/layer_0/self_attention/layer_prepostprocess/layer_norm/Mean_grad/floordiv_1"}
  %compare.2227 = pred[] compare(s32[] %maximum.1386, s32[] %constant.8353), direction=LT, metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/Mean_grad/floordiv_1"}
  %compare.2228 = pred[] compare(pred[] %constant.8355, pred[] %compare.2227), direction=NE, metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/Mean_grad/floordiv_1"}
  %param_10.643 = s32[] parameter(10)
  %param_11.421 = s32[] parameter(11)
  %select.606 = s32[] select(pred[] %compare.2228, s32[] %param_10.643, s32[] %param_11.421), metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/Mean_grad/floordiv_1"}
  %convert.2718 = f32[] convert(s32[] %select.606), metadata={op_type="Cast" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/Mean_grad/Cast"}
  %broadcast.18365 = f32[3,35]{1,0} broadcast(f32[] %convert.2718), dimensions={}
  %divide.3442 = f32[3,35]{1,0} divide(f32[3,35]{1,0} %add.9068, f32[3,35]{1,0} %broadcast.18365), metadata={op_type="RealDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/encdec_attention/layer_prepostprocess/layer_norm/Mean_grad/truediv"}
  %broadcast.18367 = f32[3,35,1024]{2,1,0} broadcast(f32[3,35]{1,0} %divide.3442), dimensions={0,1}
  %add.9069 = f32[3,35,1024]{2,1,0} add(f32[3,35,1024]{2,1,0} %param_9.1351, f32[3,35,1024]{2,1,0} %broadcast.18367), metadata={op_type="AddN" op_name="training/gradients/AddN_5"}
  %param_8.2008 = f32[3,35,1024]{2,1,0} parameter(8)
  %add.9070 = f32[3,35,1024]{2,1,0} add(f32[3,35,1024]{2,1,0} %add.9069, f32[3,35,1024]{2,1,0} %param_8.2008), metadata={op_type="AddN" op_name="training/gradients/AddN_5"}
  %param_7.2283 = f32[3,35,1024]{2,1,0} parameter(7)
  %add.9071 = f32[3,35,1024]{2,1,0} add(f32[3,35,1024]{2,1,0} %add.9070, f32[3,35,1024]{2,1,0} %param_7.2283), metadata={op_type="AddN" op_name="training/gradients/AddN_5"}
  %param_5.2397 = f32[3,35]{1,0} parameter(5)
  %negate.1966 = f32[3,35]{1,0} negate(f32[3,35]{1,0} %param_5.2397), metadata={op_type="Neg" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/SquaredDifference_grad/Neg"}
  %param_4.4941 = f32[3,35]{1,0} parameter(4)
  %negate.1967 = f32[3,35]{1,0} negate(f32[3,35]{1,0} %param_4.4941), metadata={op_type="Neg" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/sub_grad/Neg"}
  %add.9072 = f32[3,35]{1,0} add(f32[3,35]{1,0} %negate.1966, f32[3,35]{1,0} %negate.1967), metadata={op_type="AddN" op_name="training/gradients/AddN_7"}
  %param_6.2244 = s32[] parameter(6)
  %maximum.1385 = s32[] maximum(s32[] %constant.8354, s32[] %param_6.2244), metadata={op_type="Maximum" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/Mean_grad/Maximum_1"}
  %compare.2225 = pred[] compare(s32[] %maximum.1385, s32[] %constant.8353), direction=LT, metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/Mean_grad/floordiv_1"}
  %compare.2226 = pred[] compare(pred[] %constant.8355, pred[] %compare.2225), direction=NE, metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/Mean_grad/floordiv_1"}
  %param_2.18880 = s32[] parameter(2)
  %param_3.11384 = s32[] parameter(3)
  %select.605 = s32[] select(pred[] %compare.2226, s32[] %param_2.18880, s32[] %param_3.11384), metadata={op_type="FloorDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/Mean_grad/floordiv_1"}
  %convert.2717 = f32[] convert(s32[] %select.605), metadata={op_type="Cast" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/Mean_grad/Cast"}
  %broadcast.18368 = f32[3,35]{1,0} broadcast(f32[] %convert.2717), dimensions={}
  %divide.3443 = f32[3,35]{1,0} divide(f32[3,35]{1,0} %add.9072, f32[3,35]{1,0} %broadcast.18368), metadata={op_type="RealDiv" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23/self_attention/layer_prepostprocess/layer_norm/Mean_grad/truediv"}
  %broadcast.18369 = f32[3,35,1024]{2,1,0} broadcast(f32[3,35]{1,0} %divide.3443), dimensions={0,1}
  %add.9073 = f32[3,35,1024]{2,1,0} add(f32[3,35,1024]{2,1,0} %add.9071, f32[3,35,1024]{2,1,0} %broadcast.18369), metadata={op_type="AddN" op_name="training/gradients/AddN_8"}
  %param_1.21991 = f32[3,35,1024]{2,1,0} parameter(1)
  %add.9074 = f32[3,35,1024]{2,1,0} add(f32[3,35,1024]{2,1,0} %add.9073, f32[3,35,1024]{2,1,0} %param_1.21991), metadata={op_type="AddN" op_name="training/gradients/AddN_8"}
  %param_0.15226 = f32[3,35,1024]{2,1,0} parameter(0)
  ROOT %add.9075 = f32[3,35,1024]{2,1,0} add(f32[3,35,1024]{2,1,0} %add.9074, f32[3,35,1024]{2,1,0} %param_0.15226), metadata={op_type="AddN" op_name="training/gradients/AddN_8"}
}

Generate Graph

(figure: the dataflow graph generated from the HLO function above; source image HLO/Untitled.png)

Propagation

Propagate::DFS

High Level Description

The propagation pass is a modified depth-first search (DFS); a runnable sketch follows the step-by-step details below.

BOTE (Back-of-the-Envelope) Analysis

Low Level Details

  • Parameters:
    • n: a (param_name, dimension) pair, representing the current node
    • m: HashMap<param_name, HashSet<dimension>>, the mapping accumulated so far
    • m_constraints: HashMap<param_name, HashSet<dimension>>, the allowed dimensions per parameter
    • v_node: HashSet<node>, the nodes visited so far
    • v_inst: HashMap<instruction, choice>, the instruction decisions made so far
  • fn dfs(n, m, m_constraints, v_node, v_inst)
    • if already_visited(n): return with error
    • v_node.insert(n)
    • m[n.0].insert(n.1)
    • if m has already been visited, return None; otherwise mark m as visited
    • let suc_once = -1
    • for e in edges starting from n
      • make sure there is no invalid edge
        • valid = true
        • for (i, c) in e
          • if i in v_inst and v_inst[i] ≠ c, then valid = false
        • if not valid, continue
      • let nw = the other endpoint of e
      • if nw in v_node, continue
      • if nw.0 in m_constraints and nw.1 not in m_constraints[nw.0], continue
      • let v_inst_clone = a copy of v_inst
      • for (i, c) in e
        • v_inst_clone.insert(i, c) if i not in v_inst
      • let new_map = dfs(nw, m.clone(), m_constraints, v_node.clone(), v_inst_clone)
      • if new_map is None
        • if suc_once == -1, suc_once = 0
        • continue
      • suc_once = 1
      • m = merge m with new_map
    • return None if suc_once == 0, else m
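
For intuition, here is a minimal runnable Python sketch of the search above. Everything concrete in it is invented for illustration: the toy EDGES graph, the encoding of a node as a (param_name, dimension) pair, and the {instruction: choice} constraints on each edge; the already_visited error path and the visited-map memoization are omitted for brevity.

from copy import deepcopy

# hypothetical toy graph: each edge is (endpoint_a, endpoint_b, {instruction: choice})
EDGES = [
    (("%param_0", 0), ("%multiply", 0), {"inst.1": "split"}),
    (("%multiply", 0), ("%reduce", 0), {"inst.2": "split"}),
]

def edges_from(n):
    # yield (other_endpoint, constraints) for every edge touching n
    for a, b, cons in EDGES:
        if a == n:
            yield b, cons
        elif b == n:
            yield a, cons

def dfs(n, m, m_constraints, v_node, v_inst):
    v_node.add(n)
    m.setdefault(n[0], set()).add(n[1])
    suc_once = -1  # -1: no edge explored, 0: all edges failed, 1: at least one succeeded
    for nw, cons in edges_from(n):
        # skip edges whose instruction choices conflict with decisions already made
        if any(i in v_inst and v_inst[i] != c for i, c in cons.items()):
            continue
        if nw in v_node:
            continue
        if nw[0] in m_constraints and nw[1] not in m_constraints[nw[0]]:
            continue
        v_inst_clone = dict(v_inst)
        v_inst_clone.update({i: c for i, c in cons.items() if i not in v_inst})
        new_map = dfs(nw, deepcopy(m), m_constraints, set(v_node), v_inst_clone)
        if new_map is None:
            if suc_once == -1:
                suc_once = 0
            continue
        suc_once = 1
        for k, v in new_map.items():  # merge m with the successful sub-result
            m.setdefault(k, set()).update(v)
    return None if suc_once == 0 else m

print(dfs(("%param_0", 0), {}, {}, set(), {}))
# -> {'%param_0': {0}, '%multiply': {0}, '%reduce': {0}}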

Structuring TensorFlow HLO Text

published @ rongyi.blog: https://rongyi.blog/2020-02-17-hlo-parsing

The file structure of HLO text

Source: the text file produced by serializing HLO's C++ data structures.

The file consists entirely of functions, one after another, with no other structure.
Each function consists of a name, an input parameter list with types, an output type, and a body.
The function body is essentially in SSA (Static Single Assignment) form: every variable is assigned exactly once and has a unique name, so when building the dataflow graph (DFG) you can simply search for a variable name to find where it is defined and where it is used.

Each SSA instruction looks roughly like this:

%fusion.8228 = f32[4,32,48,32]{3,2,1,0} fusion(f32[192,1024]{1,0} %dot.2067, f32[] %arg217.0), kind=kLoop, calls=%fused_computation.4684.clone, metadata={op_type="Mul" op_name="transformer/parallel_0_5/transformer/transformer/body/encoder/layer_11/self_attention/multihead_attention/mul"}

In general: %var = $type $fn($params), {$metadata…}
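
As a quick illustration (and a preview of why plain string processing turns out to be fragile), a throwaway Python snippet can pull those pieces out of the %fusion.8228 instruction above. This is only a sketch, not the parser built below:

import re

line = ('%fusion.8228 = f32[4,32,48,32]{3,2,1,0} fusion(f32[192,1024]{1,0} '
        '%dot.2067, f32[] %arg217.0), kind=kLoop')
lhs, rhs = line.split(" = ", 1)          # the single SSA destination vs. everything else
m = re.match(r"(\S+) (\w+)\((.*?)\)", rhs)
print(lhs)         # %fusion.8228
print(m.group(1))  # f32[4,32,48,32]{3,2,1,0}, the result type
print(m.group(2))  # fusion, the function being called
print(m.group(3))  # the raw parameter list, still unstructured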

Structured parsing, attempt 1 (2020/02/11)

Observation

Observe that within each instruction, data flows from right to left across the equals sign, so a first attempt is plain string processing of the text in Python. The rough idea:

  1. Split each instruction at the equals sign
  2. Parse %var1 on the left-hand side as the left operand
  3. Parse %var2, %var3 on the right-hand side as the right operands
  4. The dataflow relation is that the left operand depends on the right operands

Implementation

The implementation is as follows:

import json

class block:
    def __init__(self):
        self.name = ""
        self.firstline = ""
        self.params = []
        self.body = []
        self.calls = []

class node:
    def __init__(self):
        self.id = ""
        self.label = ""
class edge:
    def __init__(self):
        self.source = ""
        self.target = ""
class graph:
    def __init__(self):
        self.nodes = []
        self.edges = []

result = []

def process_body_line(s):
    # into calls
    ret = []
    call_fn = ["calls=", "to_apply="]
    for w in call_fn:
        while w in s:
            start = s.find(w) + len(w)
            end = s.find(",", start)
            ret.append(s[start:end])
            s = s.replace(w, ''.join(reversed(w)), 1)  # mangle just this occurrence so the loop can find the next one
    return ret

def process_first_line_into_args(s):
    # into params
    ret = []
    param_end = s.find("->")
    if param_end != -1:
        s = s[0 : param_end - 2]
    param_start = s.find("(") + 1
    s = s[param_start : param_end + 1]
    params = s.split(", ")
    for x in params:
        ret.append(x[0:x.find(":")])
    # param_end = s.find(")", param_end) - 1
    return ret

def process_first_line_into_name(s):
    name_end = s.find(" (")
    name = s[0:name_end]
    name = name.replace("ENTRY ", "")
    return name

# `lines` is assumed to hold the HLO text read beforehand,
# e.g. lines = open("hlo.txt").readlines()
l = 0
while l < len(lines):
    # print(l)
    line = lines[l]
    if len(line) < 2:
        l = l + 1
        continue
    if line[:2] == "  ":
        print("Unhandled situation, printing surrounding lines…")
        print(lines[l - 1], lines[l + 1])
        exit()
    if line[0] != " ":
        f = block()
        f.firstline = line.replace("\n", "")
        f.params = process_first_line_into_args(f.firstline)
        f.name = process_first_line_into_name(f.firstline)
        # print(process_first_line_into_name(f.firstline))
        l = l + 1
        line = lines[l]
        while line[0] != "}":
            f.calls = f.calls + process_body_line(line)
            f.body.append(line.replace("\n", ""))
            l = l + 1
            line = lines[l]
        result.append(f)
    l = l + 1
    # if l % 10 == 0:
    #     print("Currently l = ", l)

def parse_fn_line(s):
    i = 0
    ret = []
    while i < len(s):
        if s[i] == '%':
            new_var = "%"
            i = i + 1
            while s[i] != ' ' and s[i] != ')' and s[i] != ',':
                new_var += s[i]
                i = i + 1
            ret.append(new_var)
        i = i + 1
    return ret

def parse_fn_dfg(blk):
    # print("parsing fn", blk.name)
    variables = []
    for x in blk.body:
        variables.append(parse_fn_line(x))
    # print(variables)
    g = graph()
    created = set()
    for l in variables:
        for x in l:
            if x in created:
                continue
            n = node()
            n.id = x
            n.label = x  # shorten_name(x)
            g.nodes.append(n)
            created.add(x)
    for l in variables:
        if len(l) <= 1:
            continue
        for x in l[1:]:
            e = edge()
            e.source = x
            e.target = l[0]
            g.edges.append(e)
    def dumper(obj):
        try:
            return obj.toJSON()
        except:
            return obj.__dict__
    return json.dumps(g, default=dumper, indent=2)

Limitation

Problems surfaced immediately: HLO instructions turn out to be fairly flexible.
Just generating the DFG already hit bad cases such as:

  • more than one left operand
  • possibly no right operand at all

Today (134:23), while attempting Variable Propagation, I ran into even more problems, such as:

  • the variable's type must be read; information like f32[4,32,48,32]{3,2,1,0} is necessary when trying to partition a matrix
  • the name of the function on the right-hand side must be recognized, along with which right operands are passed into it, since matrix partitioning is handled differently for different functions
  • some ops put important information in the metadata… e.g. Slice puts the sliced dimensions at the end

Structured parsing, attempt 2 (2020/02/12)

Observation

I spent the whole morning trying to parse the HLO text with Python string matching, and corner cases kept triggering, for example:

  • a right operand can be a bare number
  • metadata can be a dict
  • metadata can be an array of [a:b] slices
  • metadata can be a string
  • a function's return type can be a tuple: functions that return multiple variables

Implementation

In the afternoon I switched to a different approach: treat this SSA form as an LL(k) grammar and use lexical and syntactic analysis to build an abstract syntax tree.
The lexer is straightforward, and most of the grammar fits LL(1) structures.

Consider the AST structure below, with the EBNF lexer and grammar definitions embedded:

var HLOLexer = lexer.Must(ebnf.New(`
Comment = ("#" | "//") { "\u0000"…"\uffff"-"\n" } .
Ident = (alpha | "_") { "." | "_" | "-" | alpha | digit } .
String = "\"" {Ident | "/"} "\"" .
VarName = "%" Ident .
Number = { "-" } ("." | digit | "inf") {"." | digit} .
Whitespace = " " | "\t" | "\n" | "\r" .
Rightarrow = "->" .
Assign = "=" .
Punct = "!"…"/" | ":"…"@" | "["…"_" | "{"…"~" .
alpha = "a"…"z" | "A"…"Z" .
digit = "0"…"9" .
`))

type HLO struct {
    Functions []*HLOFunction `@@*`
}

type HLOFunction struct {
    Name        string         `("ENTRY")? @VarName`
    Params      []*Param       `"(" [ @@ { "," @@ } ] ")"`
    ReturnTypes []*Type        `"->" ( "(" [ @@ { "," @@ } ] ")" | @@)`
    Body        []*Instruction `"{" @@ {@@} "}"`
}

type Instruction struct {
    VarName string        `("ROOT")? @VarName "="`
    Fn      *FunctionCall `@@`
    Meta    []*Meta       `{ "," @@ }`
}

type FunctionCall struct {
    ReturnTypes []*RichType  `(@@ | "(" @@ { "," @@ } ")" )`
    Name        string       `@Ident`
    Params      []*RichParam `"(" [ @@ { "," @@ } ] ")"`
}

type Meta struct {
    Key        string  `@Ident "="`
    Value      string  `(@Ident|@VarName|@Number)?`
    DictValue  []*Dict `("{" { @@ } "}")?`
    ListNums   []int   `("{" @Number {"," @Number } "}")?`
    ListSlices []Slice `("{" @@ {"," @@ } "}")?`
}

type Dict struct {
    Key   string `@Ident "="`
    Value string `@String | @Ident`
}

type Slice struct {
    Start int `"[" @Number ":"`
    End   int `@Number "]"`
}

type Param struct {
    Name string `@Ident ":"`
    Type *Type  `@@`
}

type Type struct {
    DataType   string `@Ident`
    Dimensions []int  `"[" [ @Number { "," @Number } ] "]"`
}

type RichParam struct {
    Type *RichType `(@@)?`
    Name string    `@VarName | @Number | @Ident`
}

type RichType struct {
    VarType string `@Ident`
    VarDim  []int  `"[" [ @Number { "," @Number } ] "]" ("{" [ @Number { "," @Number } ] "}")?`
}

Result

Parsing the example function below:

%fused_computation.19.clone (param_0.16672: f32[4,49,1024], param_1.23221: f32[196,1024]) -> f32[1024] {
  %param_1.23221 = f32[196,1024]{1,0} parameter(1)
  %reshape.13330 = f32[4,49,1024]{2,1,0} reshape(f32[196,1024]{1,0} %param_1.23221), metadata={op_type="Reshape" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/conv1/Tensordot/Reshape_grad/Reshape"}
  %param_0.16672 = f32[4,49,1024]{2,1,0} parameter(0)
  %multiply.14985 = f32[4,49,1024]{2,1,0} multiply(f32[4,49,1024]{2,1,0} %reshape.13330, f32[4,49,1024]{2,1,0} %param_0.16672), metadata={op_type="Mul" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/layer_prepostprocess/layer_norm/mul_1_grad/Mul_1"}
  %constant.11228 = f32[] constant(0), metadata={op_type="RandomUniform" op_name="transformer/parallel_0_5/transformer/transformer/body/dropout/random_uniform/RandomUniform"}
  ROOT %reduce.1954 = f32[1024]{0} reduce(f32[4,49,1024]{2,1,0} %multiply.14985, f32[] %constant.11228), dimensions={0,1}, to_apply=%training_gradients_transformer_parallel_0_5_transformer_transformer_body_decoder_layer_23_1_ffn_layer_prepostprocess_layer_norm_mul_1_grad_Sum_1-reduction.48850, metadata={op_type="Sum" op_name="training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/layer_prepostprocess/layer_norm/mul_1_grad/Sum_1"}
}

The returned tokens and AST:

[%fused_computation.19.clone   ( param_0.16672 :   f32 [ 4 , 49 , 1024 ] ,   param_1.23221 :   f32 [ 196 , 1024 ] )   ->   f32 [ 1024 ]   {
     %param_1.23221   =   f32 [ 196 , 1024 ] { 1 , 0 }   parameter ( 1 )
     %reshape.13330   =   f32 [ 4 , 49 , 1024 ] { 2 , 1 , 0 }   reshape ( f32 [ 196 , 1024 ] { 1 , 0 }   %param_1.23221 ) ,   metadata = { op_type = "Reshape"   op_name = "training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/conv1/Tensordot/Reshape_grad/Reshape" }
     %param_0.16672   =   f32 [ 4 , 49 , 1024 ] { 2 , 1 , 0 }   parameter ( 0 )
     %multiply.14985   =   f32 [ 4 , 49 , 1024 ] { 2 , 1 , 0 }   multiply ( f32 [ 4 , 49 , 1024 ] { 2 , 1 , 0 }   %reshape.13330 ,   f32 [ 4 , 49 , 1024 ] { 2 , 1 , 0 }   %param_0.16672 ) ,   metadata = { op_type = "Mul"   op_name = "training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/layer_prepostprocess/layer_norm/mul_1_grad/Mul_1" }
     %constant.11228   =   f32 [ ]   constant ( 0 ) ,   metadata = { op_type = "RandomUniform"   op_name = "transformer/parallel_0_5/transformer/transformer/body/dropout/random_uniform/RandomUniform" }
     ROOT   %reduce.1954   =   f32 [ 1024 ] { 0 }   reduce ( f32 [ 4 , 49 , 1024 ] { 2 , 1 , 0 }   %multiply.14985 ,   f32 [ ]   %constant.11228 ) ,   dimensions = { 0 , 1 } ,   to_apply = %training_gradients_transformer_parallel_0_5_transformer_transformer_body_decoder_layer_23_1_ffn_layer_prepostprocess_layer_norm_mul_1_grad_Sum_1-reduction.48850 ,   metadata = { op_type = "Sum"   op_name = "training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/layer_prepostprocess/layer_norm/mul_1_grad/Sum_1" }
 } <EOF>]
&main.HLO{
  Functions: []*main.HLOFunction{
    &main.HLOFunction{
      Name: "%fused_computation.19.clone",
      Params: []*main.Param{
        &main.Param{
          Name: "param_0.16672",
          Type: &main.Type{
            DataType: "f32",
            Dimensions: []int{
              4,
              49,
              1024,
            },
          },
        },
        &main.Param{
          Name: "param_1.23221",
          Type: &main.Type{
            DataType: "f32",
            Dimensions: []int{
              196,
              1024,
            },
          },
        },
      },
      ReturnTypes: []*main.Type{
        &main.Type{
          DataType: "f32",
          Dimensions: []int{
            1024,
          },
        },
      },
      Body: []*main.Instruction{
        &main.Instruction{
          VarName: "%param_1.23221",
          Fn: &main.FunctionCall{
            ReturnTypes: []*main.RichType{
              &main.RichType{
                VarType: "f32",
                VarDim: []int{
                  196,
                  1024,
                  1,
                  0,
                },
              },
            },
            Name: "parameter",
            Params: []*main.RichParam{
              &main.RichParam{
                Name: "1",
              },
            },
          },
        },
        &main.Instruction{
          VarName: "%reshape.13330",
          Fn: &main.FunctionCall{
            ReturnTypes: []*main.RichType{
              &main.RichType{
                VarType: "f32",
                VarDim: []int{
                  4,
                  49,
                  1024,
                  2,
                  1,
                  0,
                },
              },
            },
            Name: "reshape",
            Params: []*main.RichParam{
              &main.RichParam{
                Type: &main.RichType{
                  VarType: "f32",
                  VarDim: []int{
                    196,
                    1024,
                    1,
                    0,
                  },
                },
                Name: "%param_1.23221",
              },
            },
          },
          Meta: []*main.Meta{
            &main.Meta{
              Key: "metadata",
              DictValue: []*main.Dict{
                &main.Dict{
                  Key: "op_type",
                  Value: "\"Reshape\"",
                },
                &main.Dict{
                  Key: "op_name",
                  Value: "\"training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/conv1/Tensordot/Reshape_grad/Reshape\"",
                },
              },
            },
          },
        },
        &main.Instruction{
          VarName: "%param_0.16672",
          Fn: &main.FunctionCall{
            ReturnTypes: []*main.RichType{
              &main.RichType{
                VarType: "f32",
                VarDim: []int{
                  4,
                  49,
                  1024,
                  2,
                  1,
                  0,
                },
              },
            },
            Name: "parameter",
            Params: []*main.RichParam{
              &main.RichParam{
                Name: "0",
              },
            },
          },
        },
        &main.Instruction{
          VarName: "%multiply.14985",
          Fn: &main.FunctionCall{
            ReturnTypes: []*main.RichType{
              &main.RichType{
                VarType: "f32",
                VarDim: []int{
                  4,
                  49,
                  1024,
                  2,
                  1,
                  0,
                },
              },
            },
            Name: "multiply",
            Params: []*main.RichParam{
              &main.RichParam{
                Type: &main.RichType{
                  VarType: "f32",
                  VarDim: []int{
                    4,
                    49,
                    1024,
                    2,
                    1,
                    0,
                  },
                },
                Name: "%reshape.13330",
              },
              &main.RichParam{
                Type: &main.RichType{
                  VarType: "f32",
                  VarDim: []int{
                    4,
                    49,
                    1024,
                    2,
                    1,
                    0,
                  },
                },
                Name: "%param_0.16672",
              },
            },
          },
          Meta: []*main.Meta{
            &main.Meta{
              Key: "metadata",
              DictValue: []*main.Dict{
                &main.Dict{
                  Key: "op_type",
                  Value: "\"Mul\"",
                },
                &main.Dict{
                  Key: "op_name",
                  Value: "\"training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/layer_prepostprocess/layer_norm/mul_1_grad/Mul_1\"",
                },
              },
            },
          },
        },
        &main.Instruction{
          VarName: "%constant.11228",
          Fn: &main.FunctionCall{
            ReturnTypes: []*main.RichType{
              &main.RichType{
                VarType: "f32",
              },
            },
            Name: "constant",
            Params: []*main.RichParam{
              &main.RichParam{
                Name: "0",
              },
            },
          },
          Meta: []*main.Meta{
            &main.Meta{
              Key: "metadata",
              DictValue: []*main.Dict{
                &main.Dict{
                  Key: "op_type",
                  Value: "\"RandomUniform\"",
                },
                &main.Dict{
                  Key: "op_name",
                  Value: "\"transformer/parallel_0_5/transformer/transformer/body/dropout/random_uniform/RandomUniform\"",
                },
              },
            },
          },
        },
        &main.Instruction{
          VarName: "%reduce.1954",
          Fn: &main.FunctionCall{
            ReturnTypes: []*main.RichType{
              &main.RichType{
                VarType: "f32",
                VarDim: []int{
                  1024,
                  0,
                },
              },
            },
            Name: "reduce",
            Params: []*main.RichParam{
              &main.RichParam{
                Type: &main.RichType{
                  VarType: "f32",
                  VarDim: []int{
                    4,
                    49,
                    1024,
                    2,
                    1,
                    0,
                  },
                },
                Name: "%multiply.14985",
              },
              &main.RichParam{
                Type: &main.RichType{
                  VarType: "f32",
                },
                Name: "%constant.11228",
              },
            },
          },
          Meta: []*main.Meta{
            &main.Meta{
              Key: "dimensions",
              ListNums: []int{
                0,
                1,
              },
            },
            &main.Meta{
              Key: "to_apply",
              Value: "%training_gradients_transformer_parallel_0_5_transformer_transformer_body_decoder_layer_23_1_ffn_layer_prepostprocess_layer_norm_mul_1_grad_Sum_1-reduction.48850",
            },
            &main.Meta{
              Key: "metadata",
              DictValue: []*main.Dict{
                &main.Dict{
                  Key: "op_type",
                  Value: "\"Sum\"",
                },
                &main.Dict{
                  Key: "op_name",
                  Value: "\"training/gradients/transformer/parallel_0_5/transformer/transformer/body/decoder/layer_23_1/ffn/layer_prepostprocess/layer_norm/mul_1_grad/Sum_1\"",
                },
              },
            },
          },
        },
      },
    },
  },
}
{Functions:[0xc0000f03c0]}