Unified World Model Code Learning
Torch
https://www.codegenes.net/blog/netmodule-pytorch/
https://www.machinelearningexpedition.com/how-to-train-multilayer-perceptron-in-pytorch/
https://timm.fast.ai/schedulers
https://www.geeksforgeeks.org/deep-learning/adam-optimizer/
Basic Usage of Torch: nn.Module, nn.Sequential, etc.
- Initialize, load, and save a checkpoint:

```python
# initialize
torch.manual_seed(0)
mlp = SimpleModule(...)

# load
state = torch.load(ckpt_path, map_location="cpu")
mlp.load_state_dict(state)

# save
torch.save(mlp.state_dict(), ckpt_path)
```

- Write the model architecture so the weights & biases can be saved and reused:
```python
class SimpleModule(nn.Module):
    """Simple MLP with persistent layers."""

    def __init__(
        self,
        layer_number: int = 1,
        input_dim: int = 10,
        output_dim: int = 1,
    ):
        super().__init__()
        layers = []
        for i in range(layer_number):
            in_dim = input_dim
            out_dim = output_dim if i == layer_number - 1 else input_dim
            layers.append(nn.Linear(in_dim, out_dim))
            if i < layer_number - 1:
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

The variant below cannot reuse saved weights: a checkpoint saved from it uses different layer names (`linear1`, `linear2`) than the `net.*` keys above, and each time the `mlp` object is created its layers are re-initialized in the `__init__(...)` method. Calling `torch.manual_seed(0)` in `main()` only makes every run of the script print the same output (same seed); it does not restore trained weights.

```python
class SimpleModule(nn.Module):
    def __init__(self, input_dim=4, hidden_dim=4, output_dim=2):
        super().__init__()
        self.linear1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        return x
```
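As a sanity check on the save/load pattern above, here is a minimal self-contained sketch (the temp-file path and the 2-layer hyperparameters are mine, chosen just for the demo) showing that a checkpoint round-trips the weights into a freshly initialized module:

```python
import os
import tempfile

import torch
import torch.nn as nn


class SimpleModule(nn.Module):
    """Same persistent-layer MLP pattern as above."""

    def __init__(self, layer_number=2, input_dim=4, output_dim=2):
        super().__init__()
        layers = []
        for i in range(layer_number):
            out_dim = output_dim if i == layer_number - 1 else input_dim
            layers.append(nn.Linear(input_dim, out_dim))
            if i < layer_number - 1:
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


ckpt_path = os.path.join(tempfile.mkdtemp(), "mlp.pth")

a = SimpleModule()
torch.save(a.state_dict(), ckpt_path)

b = SimpleModule()  # freshly (randomly) initialized, different weights
b.load_state_dict(torch.load(ckpt_path, map_location="cpu"))

x = torch.randn(1, 4)
print(torch.allclose(a(x), b(x)))  # identical outputs after loading
```

Because the layer names (`net.0`, `net.2`, …) match between `a` and `b`, `load_state_dict` succeeds; renaming the attributes would reproduce the "Missing key(s)" error shown in the outputs below.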
- Code explained:

```python
# these two are equivalent
output = mlp(x)
output = mlp.forward(x)
```

A `try...except` pattern covers the different situations:

```python
import os

if os.path.exists(ckpt_path):
    try:
        state = torch.load(ckpt_path, map_location="cpu")
        mlp.load_state_dict(state)
        print(f"loaded checkpoint from {ckpt_path}")
    except RuntimeError as exc:
        # Likely from an old checkpoint with different layer names; start fresh.
        print(f"failed to load old checkpoint ({exc}); reinitializing weights")
else:
    print("no checkpoint found, using fresh weights")
```

1st & 2nd outputs:
```
(deepcode) huangyangzhou@huangyangzhou-MRGFG-XX:~/github/unified-world-model/self-scripts$ python self-MLP.py
failed to load old checkpoint (Error(s) in loading state_dict for SimpleModule:
	Missing key(s) in state_dict: "net.0.weight", "net.0.bias", "net.2.weight", "net.2.bias". ); reinitializing weights
model input: tensor([[1., 2., 3., 4.]])
model output: tensor([[ 0.4872, -0.0244]], grad_fn=<AddmmBackward0>)
saved checkpoint to MLP.pth

(deepcode) huangyangzhou@huangyangzhou-MRGFG-XX:~/github/unified-world-model/self-scripts$ python self-MLP.py
loaded checkpoint from MLP.pth
model input: tensor([[1., 2., 3., 4.]])
model output: tensor([[ 0.4872, -0.0244]], grad_fn=<AddmmBackward0>)
saved checkpoint to MLP.pth
```
Train and evaluate model
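A minimal sketch of a train/eval loop with Adam (the optimizer from the link above); the toy regression data and the hyperparameters are made up for illustration, not taken from any repo:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy regression data: target is the sum of the inputs
x = torch.randn(256, 4)
y = x.sum(dim=1, keepdim=True)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# train
model.train()
first_loss = None
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if first_loss is None:
        first_loss = loss.item()

# evaluate (no gradients, eval mode)
model.eval()
with torch.no_grad():
    final_loss = loss_fn(model(x), y).item()

print(f"loss: {first_loss:.4f} -> {final_loss:.4f}")  # loss should decrease
```

A learning-rate scheduler (e.g. from timm, linked above) would be stepped once per epoch alongside `optimizer.step()`.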
Diffusion
CFG (classifier-free guidance)
From `models.py` in the DiT GitHub repo:
```python
def forward_with_cfg(self, x, t, y, cfg_scale):
    """
    Forward pass of DiT, but also batches the unconditional forward pass for classifier-free guidance.
    """
    # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
    half = x[: len(x) // 2]
    combined = torch.cat([half, half], dim=0)
    model_out = self.forward(combined, t, y)
    # For exact reproducibility reasons, we apply classifier-free guidance on only
    # three channels by default. The standard approach to cfg applies it to all channels.
    # This can be done by uncommenting the following line and commenting-out the line following that.
    # eps, rest = model_out[:, :self.in_channels], model_out[:, self.in_channels:]
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)
    half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)
    eps = torch.cat([half_eps, half_eps], dim=0)
    return torch.cat([eps, rest], dim=1)
```
Explanation:
If I give a 3 × 256 × 256 image, what would the latent shape be? And how would the latent be computed with CFG? (Just an example for my understanding, no need to actually run it.)
GitHub Copilot
A 3×256×256 RGB image would be encoded by the SD-style VAE to a latent of shape 4×32×32 (downsample by 8× in H and W, with 4 latent channels).
With `learn_sigma=True`, the DiT output per step is 8×32×32 (4 noise + 4 variance); if `learn_sigma=False`, it's 4×32×32 (noise only).

CFG example (batch size 2 for simplicity):

- Stack cond and uncond latents: `combined = [x0, x0]` → shape 2×4×32×32.
- Run the model → `model_out`.
- Split: `eps = model_out[:, :3]` (RGB-ish latent channels), `rest = model_out[:, 3:]` (4th latent channel + any sigma channels).
- Split `eps` into cond/uncond halves: `cond_eps, uncond_eps`.
- Guided noise: `guided = uncond_eps + cfg_scale * (cond_eps - uncond_eps)`.
- Duplicate guided to match the batch with `torch.cat([guided, guided], dim=0)`; finally reattach `rest` → output batch shape matching `model_out` (either 2×4×32×32 or 2×8×32×32).
DDPM & DDIM (denoising diffusion probabilistic / implicit model)
Two categories by prediction target:

| | ε-prediction | x0-prediction |
|---|---|---|
| Model output | noise map $\epsilon$ | clean image $x_0$ |
| Denoising step | subtract the predicted noise | reconstruct intermediate steps from x0 |
| CFG applicability | directly applicable | formula needs adjustment |
| Numerical stability | standard | sometimes better (especially at high resolution) |
Workflow:

```
Step 50: pure noise      [██████████]  model predicts: "this is a cat + noise"
         ↓ subtract the predicted noise
Step 40: blurry outline  [░░▓▓░░▓▓░░]  model predicts: "this noise remains"
         ↓
Step 20: rough shape     [░▓▓▓▓▓▓░░]  model predicts: "noise remains in the details"
         ↓
Step 0:  clean image     [▓▓▓▓▓▓▓▓▓▓]  "generation complete"
```
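The ε- and x0-targets above are related through the standard DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, so one can always be converted to the other. A toy sketch (the `alpha_bar` value is made up):

```python
import torch

torch.manual_seed(0)
alpha_bar = torch.tensor(0.7)   # cumulative alpha-bar at some step t (made-up value)
x0 = torch.randn(1, 4, 8, 8)    # "clean" latent
eps = torch.randn_like(x0)      # true noise

# forward (noising) process: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps
x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps

# an eps-prediction can be turned into an x0 estimate by inverting the same equation
x0_pred = (x_t - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()

print(torch.allclose(x0_pred, x0, atol=1e-5))  # exact recovery up to float error
```

This is also why CFG applies directly to ε-prediction: the guidance formula operates on the noise term, while an x0-predicting model needs the conversion first.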
FID score (Fréchet Inception Distance)
FID measures the distance between the distributions of generated and real images in a "feature space"; lower is better (ideal: 0, excellent: <10, poor: >50).

Why is FID needed?
- Judging images by eye is subjective, slow, and impossible to quantify
- An automatic measure of both generation quality and diversity is needed
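A back-of-the-envelope sketch of the Fréchet distance itself, $\mathrm{FID} = \lVert\mu_1-\mu_2\rVert^2 + \mathrm{Tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\big)$; the real metric extracts Inception-v3 features first, while here random vectors stand in for features:

```python
import numpy as np


def frechet_distance(feat1, feat2):
    """Fréchet distance between two Gaussian fits of feature sets."""
    mu1, mu2 = feat1.mean(axis=0), feat2.mean(axis=0)
    s1 = np.cov(feat1, rowvar=False)
    s2 = np.cov(feat2, rowvar=False)
    # Tr(sqrtm(S1 @ S2)) via eigenvalues: S1 @ S2 is similar to a PSD matrix,
    # so its eigenvalues are (numerically close to) non-negative reals.
    eigvals = np.linalg.eigvals(s1 @ s2)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0, None)))
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1) + np.trace(s2) - 2 * tr_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in for real-image features
same = real.copy()
shifted = real + 3.0                          # a very different "generated" distribution

d_same = frechet_distance(real, same)
d_shift = frechet_distance(real, shifted)
print(f"same: {d_same:.6f}, shifted: {d_shift:.2f}")  # ~0 vs. a large distance
```

Identical distributions score ~0, a shifted distribution scores high, matching the "lower is better" reading above.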
Bash
Basic Usage of du command to check the directory information
```
du -sh /home/huangyangzhou/miniconda3/envs/deepcode/lib/python3.13/site-packages/torch/lib/*.so* 2>/dev/null | sort -hr | head -15
```
| Part | Purpose |
|---|---|
| `du -sh ...` | show the total size of each `.so` file (human-readable) |
| `*.so*` | match all shared library files (`.so`, `.so.1`, `.so.2`, etc.) |
| `2>/dev/null` | discard error messages (e.g. permission denied) |
| `sort -hr` | sort by human-readable size, descending (`-h` understands K/M/G, `-r` reverses) |
| `head -15` | show only the first 15 lines (the 15 largest entries) |
A simple variant I adapted for myself:

```
(base) huangyangzhou@huangyangzhou-MRGFG-XX:~$ du -ah --max-depth=1 2>/dev/null | sort -hr | head -5
190G	.
54G	./.docker
33G	./.cache
29G	./.local
22G	./Downloads
```
Pay attention to the difference between `-sh` and `-ah`:

```
du -sh file_or_dir     # total size of this file/directory (most common)
du -sh */              # size of each subdirectory of the current directory
du -ah --max-depth=1   # size of every file + directory here; --max-depth cannot be combined with -s
ls */                  # list the contents of every subdirectory
```
Basic Usage of tail command
```
tail -n 100 environment.yml   # show the last 100 lines
tail -f test.txt              # keep the output open; it updates dynamically as test.txt grows
```
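A self-contained demo (the `/tmp/notes.txt` file name is just for illustration):

```shell
# create a small sample file, then show only its last two lines
printf 'line1\nline2\nline3\nline4\n' > /tmp/notes.txt
tail -n 2 /tmp/notes.txt   # prints line3 and line4
```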
Basic Usage of rg (ripgrep) command
| Option | Short | Function |
|---|---|---|
| `--line-number` | `-n` | show line numbers |
| `--ignore-case` | `-i` | case-insensitive |
| `--word-regexp` | `-w` | whole-word match (e.g. `error` will not match `errors`) |
| `--fixed-strings` | `-F` | literal search (regex metacharacters like `\|` are not interpreted) |
| `--glob` | `-g` | filter by filename pattern |
| `--after-context` / `--before-context` / `--context` | `-A` / `-B` / `-C` | show N lines after / before / around each match |
| `--count` | `-c` | show only the number of matches |
| `--hidden` | `-.` | search hidden files |
| Example | Meaning |
|---|---|
| `rg "TODO" -g "*.py"` | search only in `.py` files |
| `rg "TODO" -g "*.md"` | search only in Markdown files |
| `rg "class" -g "model*"` | search only in files whose names start with `model` |
Example:

```
(DiT) huangyangzhou@huangyangzhou-MRGFG-XX:~/github/DiT$ rg -n "AutoencoderKL" -g "*.py"
train.py
32:from diffusers.models import AutoencoderKL
151:    vae = AutoencoderKL.from_pretrained(f"stabilityai/sd-vae-ft-{args.vae}").to(device)

sample_ddp.py
19:from diffusers.models import AutoencoderKL
79:    vae = AutoencoderKL.from_pretrained(f"stabilityai/sd-vae-ft-{args.vae}").to(device)
```
More advanced: search within subpaths of the current directory:

```
rg -t 'txt' "avg_success_rate" ./sub/path
```
Basic Usage of ffmpeg command
video file : .mp4 / .webm
image file : .jpg / .png / .avif
sound file : .mp3 / .wav
Basic Usage of Environment management by Conda
Transferring and copying environments:
```
conda env create -f environment.yml   # create an environment from the config file
conda env export > environment.yml    # export the environment info into a config file
```
Conda configuration in `~/.bashrc` to manage Conda's bash behavior
Basic Usage of git command & python code
Sparse checkout & pulling only the latest revision
```
# 1. Clone the repo skeleton (without downloading file contents)
# git clone --depth 1 --filter=blob:none --sparse https://github.com/madebyollin/taesd.git
git clone --depth 1 --filter=blob:none --no-checkout https://github.com/madebyollin/taesd.git

# 2. Enter the directory
cd taesd

# 3. Initialize non-cone mode (supports file-level patterns)
git sparse-checkout init --no-cone

# 4. Set the rules: include everything, exclude weight files
git sparse-checkout set "/*" "!*.pt" "!*.safetensors"

# 5. Check out the files
git checkout

# 6. Inspect the result
ls -lh
```