【他山之石】优化 JavaScript 的乐趣与价值(上)

前言
这是前几天偶然看到的一篇硬核推文。作者一口气分了 12 个主题探讨了 JavaScript 在优化时应该注意的要点,读后深受启发。由于篇幅较长,分两篇发表。本篇为上篇。

文章目录

  • Optimizing Javascript for fun and for profit
    • 0. Avoid work
    • 1. Avoid string comparisons
            • About benchmarks
    • 2. Avoid different shapes
            • A note on terminology
        • What the eff should I do about this?
            • Number representation
    • 3. Avoid array/object methods
    • 4. Avoid indirection
    • 5. Avoid cache misses
      • 5.1 Prefetching
        • What the eff should I do about this?
      • 5.2 Caching in L1/2/3
        • What the eff should I do about this?
            • About immutable data structures
            • About the { ...spread } operator

Optimizing Javascript for fun and for profit


by romgrk

I often feel like javascript code in general runs much slower than it could, simply because it’s not optimized properly. Here is a summary of common optimization techniques I’ve found useful. Note that the tradeoff for performance is often readability, so the question of when to go for performance versus readability is a question left to the reader. I’ll also note that talking about optimization necessarily requires talking about benchmarking. Micro-optimizing a function for hours to have it run 100x faster is meaningless if the function only represented a fraction of the actual overall runtime to start with. If one is optimizing, the first and most important step is benchmarking. I’ll cover the topic in the later points. Be also aware that micro-benchmarks are often flawed, and that may include those presented here. I’ve done my best to avoid those traps, but don’t blindly apply any of the points presented here without benchmarking.

I have included runnable examples for all cases where it’s possible. They show by default the results I got on my machine (brave 122 on archlinux) but you can run them yourself. As much as I hate to say it, Firefox has fallen a bit behind in the optimization game, and represents a very small fraction of the traffic for now, so I don’t recommend using the results you’d get on Firefox as useful indicators.

0. Avoid work

This might sound evident, but it needs to be here because there can’t be another first step to optimization: if you’re trying to optimize, you should first look into avoiding work. This includes concepts like memoization, laziness and incremental computation. This would be applied differently depending on the context. In React, for example, that would mean applying memo(), useMemo() and other applicable primitives.

1. Avoid string comparisons

Javascript makes it easy to hide the real cost of string comparisons. If you need to compare strings in C, you’d use the strcmp(a, b) function. Javascript uses === instead, so you don’t see the strcmp. But it’s there, and a string comparison will usually (but not always) require comparing each of the characters in the string with the ones in the other string; string comparison is O(n). One common JavaScript pattern to avoid is strings-as-enums. But with the advent of TypeScript this should be easily avoidable, as enums are integers by default.

// No
enum Position {TOP    = 'TOP',BOTTOM = 'BOTTOM',
}
// Yeppers
enum Position {TOP,    // = 0BOTTOM, // = 1
}

Here is a comparison of the costs:

// 1. string compare
const Position = {TOP: 'TOP',BOTTOM: 'BOTTOM',
}let _ = 0
for (let i = 0; i < 1000000; i++) {let current = i % 2 === 0 ?Position.TOP : Position.BOTTOMif (current === Position.TOP)_ += 1
}
// 2. int compare
const Position = {TOP: 0,BOTTOM: 1,
}let _ = 0
for (let i = 0; i < 1000000; i++) {let current = i % 2 === 0 ?Position.TOP : Position.BOTTOMif (current === Position.TOP)_ += 1
}

benchmark1

About benchmarks

Percentage results represent the number of operations completed within 1s, divided by the number of operations of the highest scoring case. Higher is better.

As you can see, the difference can be significant. The difference isn’t necessarily due to the strcmp cost as engines can sometimes use a string pool and compare by reference, but it’s also due to the fact that integers are usually passed by value in JS engines, whereas strings are always passed as pointers, and memory accesses are expensive (see section 5). In string-heavy code, this can have a huge impact.

For a real-world example, I was able to make this JSON5 javascript parser run 2x faster just by replacing string constants with numbers. Unfortunately it wasn’t merged, but that’s how open-source is.

2. Avoid different shapes

Javascript engines try to optimize code by assuming that objects have a specific shape, and that functions will receive objects of the same shape. This allows them to store the keys of the shape once for all objects of that shape, and the values in a separate flat array. To represent it in javascript:

const objects = [{name: 'Anthony',age: 36,},{name: 'Eckhart',age: 42},
]
const shape = [{ name: 'name', type: 'string' },{ name: 'age',  type: 'integer' },
]const objects = [['Anthony', 36],['Eckhart', 42],
]
A note on terminology

I have used the word “shape” for this concept, but be aware that you may also find “hidden class” or “map” used to describe it, depending on the engine.

For example, at runtime if the following function receives two objects with the shape { x: number, y: number }, the engine is going to speculate that future objects will have the same shape, and generate machine code optimized for that shape.

function add(a, b) {return {x: a.x + b.x,y: a.y + b.y,}
}

If one would instead pass an object not with the shape { x, y } but with the shape { y, x }, the engine would need to undo its speculation and the function would suddenly become considerably slower. I’m going to limit my explanation here because you should read the excellent post from mraleph if you want more details, but I’m going to highlight that V8 in particular has 3 modes, for accesses that are: monomorphic (1 shape), polymorphic (2-4 shapes), and megamorphic (5+ shapes). Let’s say you really want to stay monomorphic, because the slowdown is drastic:

// setup
let _ = 0
// 1. monomorphic
const o1 = { a: 1, b: _, c: _, d: _, e: _ }
const o2 = { a: 1, b: _, c: _, d: _, e: _ }
const o3 = { a: 1, b: _, c: _, d: _, e: _ }
const o4 = { a: 1, b: _, c: _, d: _, e: _ }
const o5 = { a: 1, b: _, c: _, d: _, e: _ } // all shapes are equal
// 2. polymorphic
const o1 = { a: 1, b: _, c: _, d: _, e: _ }
const o2 = { a: 1, b: _, c: _, d: _, e: _ }
const o3 = { a: 1, b: _, c: _, d: _, e: _ }
const o4 = { a: 1, b: _, c: _, d: _, e: _ }
const o5 = { b: _, a: 1, c: _, d: _, e: _ } // this shape is different
// 3. megamorphic
const o1 = { a: 1, b: _, c: _, d: _, e: _ }
const o2 = { b: _, a: 1, c: _, d: _, e: _ }
const o3 = { b: _, c: _, a: 1, d: _, e: _ }
const o4 = { b: _, c: _, d: _, a: 1, e: _ }
const o5 = { b: _, c: _, d: _, e: _, a: 1 } // all shapes are different
// test case
function add(a1, b1) {return a1.a + a1.b + a1.c + a1.d + a1.e +b1.a + b1.b + b1.c + b1.d + b1.e }let result = 0
for (let i = 0; i < 1000000; i++) {result += add(o1, o2)result += add(o3, o4)result += add(o4, o5)
}

benchmark2

What the eff should I do about this?

Easier said than done but: create all your objects with the exact same shape. Even something as trivial as writing your React component props in a different order can trigger this.

For example, here are simple cases I found in React’s codebase, but they already had a much higher impact case of the same problem a few years ago because they were initializing an object with an integer, then later storing a float. Yes, changing the type also changes the shape. Yes, there are integer and float types hidden behind number. Deal with it.

Number representation

Engines can usually encode integers as values. For example, V8 represents values in 32 bits, with integers as compact Smi (SMall Integer) values, but floats and large integers are passed as pointers just like strings and objects. JSC uses a 64 bit encoding, double tagging, to pass all numbers by value, as SpiderMonkey does, and the rest is passed as pointers.

3. Avoid array/object methods

I love functional programming as much as anyone else, but unless you’re working in Haskell/OCaml/Rust where functional code gets compiled to efficient machine code, functional will always be slower than imperative.

const result =[1.5, 3.5, 5.0].map(n => Math.round(n)).filter(n => n % 2 === 0).reduce((a, n) => a + n, 0)

The problem with those methods is that:

  1. They need to make a full copy of the array, and those copies later need to be freed by the garbage collector. We will explore more in details the issues of memory I/O in section 5.
  2. They loop N times for N operations, whereas a for loop would allow looping once.
// setup:
const numbers = Array.from({ length: 10_000 }).map(() => Math.random())
// 1. functional
const result =numbers.map(n => Math.round(n * 10)).filter(n => n % 2 === 0).reduce((a, n) => a + n, 0)
// 2. imperative
let result = 0
for (let i = 0; i < numbers.length; i++) {let n = Math.round(numbers[i] * 10)if (n % 2 !== 0) continueresult = result + n
}

benchmark3

Object methods such as Object.values(), Object.keys() and Object.entries() suffer from similar problems, as they also allocate more data, and memory accesses are the root of all performance issues. No really I swear, I’ll show you in section 5.

4. Avoid indirection

Another place to look for optimization gains is any source of indirection, of which I can see 3 main sources:

const point = { x: 10, y: 20 }// 1.
// Proxy objects are harder to optimize because their get/set function might
// be running custom logic, so engines can't make their usual assumptions.
const proxy = new Proxy(point, { get: (t, k) => { return t[k] } })
// Some engines can make proxy costs disappear, but those optimizations are
// expensive to make and can break easily.
const x = proxy.x// 2.
// Usually ignored, but accessing an object via `.` or `[]` is also an
// indirection. In easy cases, the engine may very well be able to optimize the
// cost away:
const x = point.x
// But each additional access multiplies the cost, and makes it harder for the
// engine to make assumptions about the state of `point`:
const x = this.state.circle.center.point.x// 3.
// And finally, function calls can also have a cost. Engine are generally good
// at inlining these:
function getX(p) { return p.x }
const x = getX(p)
// But it's not guaranteed that they can. In particular if the function call
// isn't from a static function but comes from e.g. an argument:
function Component({ point, getX }) {return getX(point)
}

The proxy benchmark is particularly brutal on V8 at the moment. Last time I checked, proxy objects were always falling back from the JIT to the interpreter, seeing from those results it might still be the case.

// 1. proxy access
const point = new Proxy({ x: 10, y: 20 }, { get: (t, k) => t[k] })for (let _ = 0, i = 0; i < 100_000; i++) { _ += point.x }
// 2. direct access
const point = { x: 10, y: 20 }
const x = point.xfor (let _ = 0, i = 0; i < 100_000; i++) { _ += x }

benchmark4

I also wanted to showcase accessing a deeply nested object vs direct access, but engines are very good at optimizing away object accesses via escape analysis when there is a hot loop and a constant object. I inserted a bit of indirection to prevent it.

// 1. nested access
const a = { state: { center: { point: { x: 10, y: 20 } } } }
const b = { state: { center: { point: { x: 10, y: 20 } } } }
const get = (i) => i % 2 ? a : blet result = 0
for (let i = 0; i < 100_000; i++) {result = result + get(i).state.center.point.x }
// 2. direct access
const a = { x: 10, y: 20 }.x
const b = { x: 10, y: 20 }.x
const get = (i) => i % 2 ? a : blet result = 0
for (let i = 0; i < 100_000; i++) {result = result + get(i) }

benchmark5

5. Avoid cache misses

This point requires a bit of low-level knowledge, but has implications even in javascript, so I’ll explain. From the CPU point of view, retrieving memory from RAM is slow. To speed things up, it uses mainly two optimizations.

5.1 Prefetching

The first one is prefetching: it fetches more memory ahead of time, in the hope that it’s the memory you’ll be interested in. It always guesses that if you request one memory address, you’ll be interested in the memory region that comes right after that. So accessing data sequentially is the key. In the following example, we can observe the impact of accessing memory in random order.

// setup:
const K = 1024
const length = 1 * K * K// Theses points are created one after the other, so they are allocated
// sequentially in memory.
const points = new Array(length)
for (let i = 0; i < points.length; i++) {points[i] = { x: 42, y: 0 }
}// This array contains the *same data* as above, but shuffled randomly.
const shuffledPoints = shuffle(points.slice())
// 1. sequential
let _ = 0
for (let i = 0; i < points.length; i++) { _ += points[i].x }
// 2. random
let _ = 0
for (let i = 0; i < shuffledPoints.length; i++) { _ += shuffledPoints[i].x }

benchmark6

What the eff should I do about this?

This aspect is probably the hardest to put in practice, because javascript doesn’t have a way of placing objects in memory, but you can use that knowledge to your advantage as in the example above, for example to operate on data before re-ordering or sorting it. You cannot assume that objects created sequentially will stay at the same location after some time because the garbage collector might move them around. There is one exception to that, and it’s arrays of numbers, preferably TypedArray instances:

// from this
const points = [{ x: 0, y: 5 }, { x: 0, y: 10 }]// to this
const points = new Int64Array([0, 5, 0, 10])

For a more detailed example, see this link* .
*Note that it contains some optimizations that are now outdated, but it’s still accurate overall.

5.2 Caching in L1/2/3

The second optimization CPUs use is the L1/L2/L3 caches: those are like faster RAMs, but they are also more expensive, so they are much smaller. They contain RAM data, but act as an LRU cache. Data comes in while it’s “hot” (being worked on), and is written back to the main RAM when new working data needs the space. So the key here is use as little data as possible to keep your working dataset in the fast caches. In the following example, we can observe what are the effects of busting each of the successive caches.

// setup:
const KB = 1024
const MB = 1024 * KB// These are approximate sizes to fit in those caches. If you don't get the
// same results on your machine, it might be because your sizes differ.
const L1  = 256 * KB
const L2  =   5 * MB
const L3  =  18 * MB
const RAM =  32 * MB// We'll be accessing the same buffer for all test cases, but we'll
// only be accessing the first 0 to `L1` entries in the first case,
// 0 to `L2` in the second, etc.
const buffer = new Int8Array(RAM)
buffer.fill(42)const random = (max) => Math.floor(Math.random() * max)
// 1. L1
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(L1)] }
// 2. L2
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(L2)] }
// 3. L3
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(L3)] }
// 4. RAM
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(RAM)] }
What the eff should I do about this?

Ruthlessly eliminate every single data or memory allocations that can be eliminated. The smaller your dataset is, the faster your program will run. Memory I/O is the bottleneck for 95% of programs. Another good strategy can be to split your work into chunks, and ensure you work on a small dataset at a time.

For more details on CPU and memory, see this link.

About immutable data structures

Immutability is great for clarity and correctness, but in the context of performance, updating an immutable data structure means making a copy of the container, and that’s more memory I/O that will flush your caches. You should avoid immutable data structures when possible.

About the { …spread } operator

It’s very convenient, but every time you use it you create a new object in memory. More memory I/O, slower caches!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.xdnf.cn/news/143104.html

如若内容造成侵权/违法违规/事实不符,请联系一条长河网进行投诉反馈,一经查实,立即删除!

相关文章

苍穹外卖Day01-2

导入接口文档 yApi接口管理平台http://api.doc.jiyou-tech.com/ 创建项目 导入接口文件 导入结果界面 Swagger 介绍 使用Swagger你只需要按照它的规范去定义接口及接口相关的信息&#xff0c;就可以做到生成接口文档&#xff0c;以及在线接口调试页面。 官网&#xff1a;ht…

redis群集三种模式:主从复制、哨兵、集群

redis群集有三种模式 redis群集有三种模式&#xff0c;分别是主从同步/复制、哨兵模式、Cluster&#xff0c;下面会讲解一下三种模式的工作方式&#xff0c;以及如何搭建cluster群集 ●主从复制&#xff1a;主从复制是高可用Redis的基础&#xff0c;哨兵和集群都是在主从复制…

【运行】错误分析error: stray ‘\302’ 和 error: stray ‘\240’

您遇到的编译错误主要是由于代码中出现了不受支持的字符&#xff0c;例如 UTF-8 编码下的不可见字符。这通常是由于在代码中使用了特定的空格或格式化字符。 错误分析 error: stray ‘\302’ 和 error: stray ‘\240’ 提示您在文件中发现了无法识别的字符&#xff0c;通常是因…

常耀斌:AI赋能企业数字化转型(清华社发行)

新书地址&#xff1a; 清华出版社&#xff1a;清华大学出版社-图书详情-《AI赋能企业数字化转型》 京东&#xff1a;《AI赋能企业数字化转型 常耀斌 清华大学出版社 9787302669081》【摘要 书评 试读】- 京东图书 内容简介&#xff1a; 在数字经济时代&#xff0c;企业发…

VMware Tools系列三:图解安装完成VMware Tools测试

一、 VMware Tools测试安装是否成功 1. 菜单项检查 VMware Tools 是否已安装&#xff1a; 在 VMware Workstation 中&#xff0c;查看虚拟机的菜单栏&#xff0c;通常在“虚拟机”菜单下会有“重新安装 VMware Tools”的选项&#xff0c;说明 VMware Tools 已经安装。如图所示…

网络设备登录——《路由与交换技术》实验报告

目录 一、实验目的 二、实验设备和环境 三、实验记录 1.通过 Console 登录 步骤1:连接配置电缆。 步骤2:启动PC,运行超级终端。 步骤3:进入Console 配置界面 2.通过 Telnet 登录 步骤1:通过 Console 接口配置 Telnet 用户。 步骤2:配置 super 口令 步骤3:配置登录欢迎…

动手学深度学习(pytorch)学习记录31-批量规范化(batch normalization)[学习记录]

目录 批量规范化&#xff08;batch normalization&#xff09;从头开始实现一个具有张量的批量规范化层简明实现 批量规范化&#xff08;batch normalization&#xff09; 可持续加速深层网络的的收敛速度。再结合残差块&#xff0c;批量规范化使得研究人员可以训练100层以上的…

Java项目实战II基于Java+Spring Boot+MySQL的校园社团信息管理系统(源码+数据库+文档)

目录 一、前言 二、技术介绍 三、系统实现 四、论文参考 五、核心代码 六、源码获取 全栈码农以及毕业设计实战开发&#xff0c;CSDN平台Java领域新星创作者&#xff0c;专注于大学生项目实战开发、讲解和毕业答疑辅导。获取源码联系方式请查看文末 一、前言 在当今高校…

回归预测|2024年2月最新优化算法角蜥优化HLOA|基于角蜥优化BP神经网络数据回归Matlab程序HLOA-BP【优化效果好】

回归预测|2024年2月最新优化算法角蜥优化HLOA|基于角蜥优化BP神经网络数据回归Matlab程序HLOA-BP【优化效果好】 文章目录 一、基本原理1. 角蜥优化算法&#xff08;HLOA&#xff09;简介2. BP 神经网络&#xff08;BP Neural Network&#xff09;简介3. HLOA-BP 回归预测流程总…

Qt开发技巧(四)“tr“使用,时间类使用,Qt容器取值,类对象的删除,QPainter画家类,QString的转换,用好 QVariant类型

继续讲一些Qt技巧操作 1.非必要不用"tr" 如果程序运行场景确定是某一固定语言&#xff0c;就不需要用tr,"tr"之主要针对多语种翻译的&#xff0c;因为tr的本意是包含英文&#xff0c;然后翻译到其他语言比如中文&#xff0c;不要滥用tr&#xff0c;如果没有…

[Python数据可视化]Plotly Express: 地图数据可视化的魅力

在数据分析和可视化的世界中&#xff0c;地图数据可视化是一个强大而直观的工具&#xff0c;它可以帮助我们更好地理解和解释地理数据。Python 的 Plotly Express 库提供了一个简单而强大的方式来创建各种地图。本文将通过一个简单的示例&#xff0c;展示如何使用 Plotly Expre…

FastAdmin CMS 操作手册

FastAdmin CMS 操作手册 概述&#xff1a; 安装&#xff1a; 配置&#xff1a; 模板&#xff1a; 模板目录&#xff1a; 标签&#xff1a; 全局&#xff1a; 文章&#xff1a; 专题&#xff1a; 栏目&#xff1a; 公共参数&#xff1a; 单页&#xff1a; 特殊标签&#xff1a;…

macOS平台TextRank环境配置

1.安装python 2.安装numpy: pip3 install numpy 3.安装networkx: pip3 install networkx 4.安装math2: pip3 install math2 math2安装成功

npm安装时候报错certificate has expired

打开了一个很久没用的电脑&#xff0c;npm和node都装好了&#xff0c;安装包的时候一直报错 request to https://registry.npm.taobao.org/create-react-app failed, reason: certificate has expired而且先报错rollbackFailedOptional 然而npm没什么问题&#xff0c;是ssl过…

TSRPC+Cocos

TSRPC文档: https://tsrpc.cn/docs/get-started/api.html 创建 先创建一个默认的会话项目&#xff0c;找一个文件夹在控制台运行以下代码&#xff1a; npx create-tsrpc-applatest first-api --presets browser # 或者 yarn create tsrpc-app first-api --presets browser运…

【C++语言】C/C++内存管理

一、C/C内存分布 我们先来看一看C/C中有哪些区域&#xff0c;为什么C/C中区分这些区域呢&#xff1f;&#xff1f;不同的数据有不同的存储需求&#xff0c;各个区域满足不同的需求。我们有临时用的数据&#xff0c;该数据是存储在栈帧区域的&#xff1b;在一些数据结构中&#…

『功能项目』回调函数处理死亡【54】

我们打开上一篇53伤害数字UI显示的项目&#xff0c; 本章要做的事情是使用回调函数处理怪物Boss01死亡后增加主角经验值的功能&#xff0c;以及生成一个七秒的升级特效 首先增加一个技能特效重命名为PlayerUpGradeEffect 修改脚本&#xff1a;BossCtrl.cs 修改脚本&#xff1a…

【Linux系统编程】信号的保存与处理

目录 一&#xff0c;信号的保存 1-1&#xff0c;core与Term终止信号 1-2&#xff0c;进程退出与信号的关系 1-3&#xff0c;信号在内核中的表示 1-4&#xff0c;信号操作函数 二&#xff0c;信号的处理 2-1&#xff0c;信号被处理的时期 2-2&#xff0c;内核实现信号的…

zip-password-finder

1.zip-password-finder 对于传统ZIP文件密码的破解&#xff0c;采用密码匹配的方式进行实现&#xff0c;该github库的地址是&#xff1a; GitHub - agourlay/zip-password-finder: Find the password of protected ZIP files.Find the password of protected ZIP files. Cont…

整数二分算法和浮点数二分算法

整数二分算法和浮点数二分算法 二分 现实中运用到二分的就是猜数字的游戏 假如有A同学说B同学所说数的大小&#xff0c;B同学要在1~100中间猜中数字65&#xff0c;当B同学每次说的数都是范围的一半时这就算是一个二分查找的过程 二分查找的前提是这个数字序列要有单调性 基…