Keyu Tao
Background:
Some applications, especially legacy applications or applications which monitor network traffic, expect to be directly connected to the physical network. In this type of situation, you can use the macvlan network driver to assign a MAC address to each container’s virtual network interface, making it appear to be a physical network interface directly connected to the physical network.
https://docs.docker.com/network/macvlan/
TL;DR:
2020/4: @iBug runs apt upgrade.
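Then something strange happened:
At first we suspected tinc, but digging into it turned up nothing.
Since, once the container pings an external machine, that machine and the container can reach each other in both directions…
why not set up a crontab/systemd timer, or just keep pinging forever?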
[Unit]
Description=Docker pingd service %I
Documentation=man:ping(8)
After=network.target
StartLimitIntervalSec=0

[Service]
Type=simple
User=root
Group=root
ExecStart=/bin/sh -c 'IVAR="%i"; exec /usr/bin/docker exec "$${IVAR%%:*}" ping -q -s 32 "$${IVAR#*:}"'
ExecStop=/bin/kill -s INT $MAINPID
Restart=on-failure
RestartSec=3

[Install]
WantedBy=multi-user.target
Alias=docker-ping@.service
systemctl enable docker-pingd@container:host.service
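The instance name packs two values, container and host, separated by a colon. Inside ExecStart, ${IVAR%:*} removes the shortest suffix matching :* (from the last colon on), leaving the container; ${IVAR#*:} removes the shortest prefix matching *: (up to the first colon), leaving the host. A small C sketch of the same two expansions; container_part() and host_part() are illustrative names only:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same result as ${IVAR%:*}: drop everything from the LAST ':' onwards.
 * With no ':' the string is kept unchanged, as the shell does. */
void container_part(const char *ivar, char *out, size_t outlen)
{
    assert(ivar != NULL && out != NULL && outlen > 0);
    const char *colon = strrchr(ivar, ':');
    size_t n = colon ? (size_t)(colon - ivar) : strlen(ivar);
    snprintf(out, outlen, "%.*s", (int)n, ivar);
}

/* Same result as ${IVAR#*:}: drop everything up to and including the FIRST ':'. */
void host_part(const char *ivar, char *out, size_t outlen)
{
    assert(ivar != NULL && out != NULL && outlen > 0);
    const char *colon = strchr(ivar, ':');
    snprintf(out, outlen, "%s", colon ? colon + 1 : ivar);
}
```

With exactly one colon (container:host) both halves come out as expected. Because one expansion looks at the last colon and the other at the first, an IPv6 host such as web:fe80::1 would yield the host fe80::1 but the container web:fe80:.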
docker-ping@.timer:
[Unit]
Description=Docker pingd timer %I
Documentation=man:ping(8)
After=network.target
StartLimitIntervalSec=0

[Timer]
OnCalendar=*:0/10
RandomizedDelaySec=60s
Persistent=true
Unit=docker-ping@%i.service

[Install]
WantedBy=timers.target
docker-ping@.service:
[Unit]
Description=Docker ping service (oneshot) %I
Documentation=man:ping(8)
After=network.target
StartLimitIntervalSec=0

[Service]
Type=simple
User=root
Group=root
ExecStart=/bin/sh -c 'IVAR="%i"; exec /usr/bin/docker exec "$${IVAR%%:*}" ping -q -s 32 -c 8 "$${IVAR#*:}"'
Restart=on-failure
RestartSec=3

[Install]
WantedBy=multi-user.target
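In the timer variant, OnCalendar=*:0/10 fires at minute 0 of every hour and every 10 minutes after that (second 0), and RandomizedDelaySec=60s adds a random delay of up to one minute. A simplified C model of the calendar part only (it ignores the random delay, Persistent=true and time zones); next_elapse() is an illustrative helper, not a systemd API:

```c
#include <assert.h>

#define PERIOD_SEC (10 * 60)      /* "*:0/10": every 10 minutes, at :00 seconds */
#define DAY_SEC    (24 * 60 * 60)

/* Given seconds since midnight, return the next trigger time (seconds since
 * midnight, wrapping at 24h). RandomizedDelaySec=60s would add 0..60 s. */
long next_elapse(long now_sec)
{
    assert(now_sec >= 0 && now_sec < DAY_SEC);
    long next = (now_sec / PERIOD_SEC + 1) * PERIOD_SEC;
    return next % DAY_SEC;
}
```

Each run then starts the oneshot service, which sends 8 pings (-c 8) and exits, instead of keeping a ping process alive forever.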
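arping
Machine A: "pingd" not configured.
Machine A cannot arping the container (even while two-way connectivity is up); the reverse direction works.
While tcpdump is running, arping works normally.
tcpdump turns on promiscuous mode. Workaround: keep promiscuous mode on permanently.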
A reasonable suspicion: is something wrong with the kernel?
Google: macvlan site:bugs.debian.org
#952660 - src:linux: macvlan multicast/broadcast regression in stretch
A bug from 2020/02:
Linux 4.9.209 included:
    macvlan: do not assume mac_header is set in macvlan_broadcast()
which fixed some TX cases but broke the RX case. When handling a received multicast or broadcast packet, macvlan_broadcast() now reads the destination address from the wrong place. The packets may then fail to match the multicast filters that they should. This is fixed in 4.9.211 by:
    macvlan: use skb_reset_mac_header() in macvlan_queue_xmit()
This is a major regression for VM hosts using macvlan/macvtap as ARP and IPv6 neighbour discovery became quite unreliable.
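Going by the bug report, the affected window in the 4.9 stable series is from 4.9.209 (broken) up to, but not including, 4.9.211 (fixed). A C sketch of that check; macvlan_rx_regression() is a hypothetical helper that only knows about the 4.9 series described in the bug:

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if an upstream 4.9.y version string falls in the window from
 * Debian bug #952660 (broken by 4.9.209, fixed in 4.9.211), else 0.
 * Trailing text such as a Debian revision ("-1") is ignored by sscanf. */
int macvlan_rx_regression(const char *version)
{
    int major, minor, patch;
    assert(version != NULL);
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return 0;
    return major == 4 && minor == 9 && patch >= 209 && patch < 211;
}
```

Feed it the upstream version (for example a package version of the form 4.9.210-1). On Debian, uname -r prints the ABI name (4.9.0-N-amd64), which does not carry the patch level.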
Run apt upgrade to upgrade the kernel, then reboot.
Lesson:
apt-listbugs!
Last year we got a new mirror server to get away from the old machine's poor IO performance:
lvmcache(7)
lvmcache is built on dm-cache.
With lvmcache, the cache data and the cache metadata are placed on separate LVs.
Create the cache pool:
lvconvert --type cache-pool --poolmetadata ssd/mcache_meta --cachemode writethrough -c 1M ssd/mcache
Attach the cache pool to the repository LV:
lvconvert --type cache --cachepool lug/mcache lug/repo
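The -c 1M chunk size is worth converting: dm-cache measures chunk_size and migration_threshold in 512-byte sectors, so 1 MiB is 2048 sectors. A small C sketch of that arithmetic (helper names are illustrative). The 4 MiB extent size is an assumption (LVM's default); it is consistent with LVM describing the 393216-extent mcache pool on mirrors as 1.5 Terabytes.

```c
#include <assert.h>

#define SECTOR_BYTES 512ULL
#define MiB (1024ULL * 1024ULL)

/* Convert a size in bytes to 512-byte sectors, the unit dm-cache uses for
 * chunk_size and migration_threshold. */
unsigned long long bytes_to_sectors(unsigned long long bytes)
{
    assert(bytes % SECTOR_BYTES == 0);
    return bytes / SECTOR_BYTES;
}

/* Number of cache chunks in a pool of `extents` LVM extents.
 * extent_bytes is an assumption here (4 MiB is LVM's default). */
unsigned long long pool_chunks(unsigned long long extents,
                               unsigned long long extent_bytes,
                               unsigned long long chunk_bytes)
{
    assert(chunk_bytes > 0);
    return extents * extent_bytes / chunk_bytes;
}
```

So one 1 MiB chunk is 2048 sectors, exactly the default migration_threshold of 2048, and a 1.5 TiB pool holds 1572864 such chunks.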
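In passthrough mode, both reads and writes bypass the cache; its only effect is that a write hit invalidates the corresponding cache block.
At first we used writeback mode: the repository data can always be re-synced if lost, so writeback's performance gain seemed the better trade-off.
For stability, we later switched to writethrough mode. (Our cache is very large: writeback could corrupt a lot of data, and a damaged metadata volume would be even more trouble.)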
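The three modes differ in what a write to an already-cached block does. A toy model in C (the names are illustrative, not dm-cache internals) makes this explicit; only writeback ever leaves dirty blocks behind, which is why moving away from it requires flushing them first.

```c
#include <assert.h>
#include <stdbool.h>

enum cache_mode { WRITETHROUGH, WRITEBACK, PASSTHROUGH };

/* Effect of one write that hits a cached block (simplified model). */
struct write_effect {
    bool writes_origin;   /* data goes to the slow origin device */
    bool writes_cache;    /* data goes to the fast cache device */
    bool marks_dirty;     /* cached copy now differs from origin */
    bool invalidates;     /* cached copy is dropped */
};

struct write_effect write_hit(enum cache_mode mode)
{
    struct write_effect e = {false, false, false, false};
    switch (mode) {
    case WRITETHROUGH: /* both copies updated, never dirty */
        e.writes_origin = e.writes_cache = true;
        break;
    case WRITEBACK:    /* only the cache is updated; origin catches up later */
        e.writes_cache = e.marks_dirty = true;
        break;
    case PASSTHROUGH:  /* bypass the cache; the stale cached block is invalidated */
        e.writes_origin = e.invalidates = true;
        break;
    }
    return e;
}
```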
Switching modes requires writing the dirty data in the cache back to disk.
Cache policy:
But… after switching mirrors to the cleaner policy, the cache was not written back.
https://bugzilla.redhat.com/show_bug.cgi?id=1668163
The assumption could be - the cache chunksize is >= 1MiB and there is unspecified bigger migration_threshold and remained at defaul value 2048 (1MiB).
This prevents kernel from flush blockes.
There are several bugs about this.
The quick workaround solution is to set higher threshold:
lvchange --cachesettings migration_threshold=16384 vg/cacheLV
https://elixir.bootlin.com/linux/latest/source/drivers/md/dm-cache-target.c#L1649
static enum busy spare_migration_bandwidth(struct cache *cache)
{
	bool idle = iot_idle_for(&cache->tracker, HZ);
	sector_t current_volume = (atomic_read(&cache->nr_io_migrations) + 1) *
		cache->sectors_per_block;

	if (idle && current_volume <= cache->migration_threshold)
		return IDLE;
	else
		return BUSY;
}
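To see why the threshold matters, the function above can be restated in user space with the kernel state passed in as plain parameters. This is a sketch: spare_bandwidth() and its parameters are illustrative names, not kernel API. Both sizes are in 512-byte sectors, so a 1 MiB chunk is 2048 sectors.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;  /* 512-byte sectors */

enum busy { IDLE, BUSY };

/* Same decision as spare_migration_bandwidth(): allow one more migration only
 * if I/O has been idle and the in-flight volume (including the new migration)
 * stays within migration_threshold. */
enum busy spare_bandwidth(bool idle, int nr_io_migrations,
                          sector_t sectors_per_block,
                          sector_t migration_threshold)
{
    sector_t current_volume =
        (sector_t)(nr_io_migrations + 1) * sectors_per_block;
    return (idle && current_volume <= migration_threshold) ? IDLE : BUSY;
}
```

With the default threshold of 2048 sectors and 1 MiB chunks, at most one migration can be in flight, and only after I/O has been idle for HZ (one second); with chunks larger than 1 MiB the check never passes. Raising the threshold to 16384 allows up to eight concurrent 1 MiB migrations.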
# dirty hack
sudo lvchange --cachepolicy cleaner lug/repo
for i in $(seq 1 1500); do
    sudo lvchange --cachesettings migration_threshold=2113536 lug/repo && \
    sudo lvchange --cachesettings migration_threshold=16384 lug/repo && \
    echo "$i" && sleep 15
done
# Make sure no dirty blocks remain. If some do, run the loop again (with fewer iterations).
# If switching from writeback, first switch the cache mode to writethrough,
# then change the cache policy back to smq.
sudo lvchange --cachepolicy smq lug/repo
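For the "make sure no dirty blocks remain" step, lvs -o +cache_dirty_blocks lug/repo should report the count directly. The same number is the <#dirty> field of dmsetup status for the dm-cache target, whose layout is described in the kernel's device-mapper cache documentation. A C sketch of a parser for that line; cache_dirty_blocks() is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>

/* Extract <#dirty> from a `dmsetup status` line of a dm-cache target.
 * Field order (per the kernel's device-mapper cache documentation):
 *   <start> <len> cache <md block size> <used>/<total md blocks>
 *   <cache block size> <used>/<total cache blocks>
 *   <read hits> <read misses> <write hits> <write misses>
 *   <demotions> <promotions> <dirty> ...
 * Returns the dirty-block count, or -1 if the line does not parse. */
long long cache_dirty_blocks(const char *status)
{
    unsigned long long start, len, mdbs, mdused, mdtotal, cbs, used, total;
    unsigned long long rh, rm, wh, wm, demote, promote, dirty;
    assert(status != NULL);
    int n = sscanf(status,
                   "%llu %llu cache %llu %llu/%llu %llu %llu/%llu "
                   "%llu %llu %llu %llu %llu %llu %llu",
                   &start, &len, &mdbs, &mdused, &mdtotal, &cbs, &used, &total,
                   &rh, &rm, &wh, &wm, &demote, &promote, &dirty);
    return n == 15 ? (long long)dirty : -1;
}
```

Once this reaches 0 under the cleaner policy, it should be safe to switch the policy back.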
2021/03/26: "The OpenSSL project fixed a high-severity vulnerability, CVE-2021-3450"
@taoky: Folks, it's CVE season again.
@iBug: apt upgrade
-> finds that the GRUB update on the new mirrors machine fails.
Is there a command that reproduces this problem easily?
Run /usr/sbin/grub-mkconfig to generate the config.
grub-mkconfig is actually a shell script. With set -x we can see that grub-mkconfig calls some external commands:
grub-probe
grub-file
grub-editenv
grub-script-check
Ideally we want a command that has no side effects and still reproduces the bug!
sudo grub-probe --verbose --device /dev/mapper/lug-root --target=fs_uuid
grub-probe: error: disk `lvmid/<redacted>/<redacted>' not found.
grub-probe: probe device information for GRUB
Build with CFLAGS="-g", set breakpoints at two locations, run, and use print to inspect variables while debugging.
lvmirror {
    id = "8hdui8-GZGs-2wG6-ayeO-BQo4-z1kM-0fvhvg"
    status = ["READ", "WRITE", "VISIBLE"]
    flags = []
    creation_time = 1618074879    # 2021-04-11 01:14:39 +0800
    creation_host = "taoky-debianvm"
    segment_count = 1

    segment1 {
        start_extent = 0
        extent_count = 13    # 52 Megabytes

        type = "mirror"
        mirror_count = 2
        mirror_log = "lvmirror_mlog"
        region_size = 4096

        mirrors = [
            "lvmirror_mimage_0", 0,
            "lvmirror_mimage_1", 0
        ]
    }
}
grub_lvm_detect(): parses the PV information first, then the LV information.
In the loop:
p points into the metadata string; the pointer moves forward as parsing progresses.
Move p to the string "segment" (p = grub_strstr (p, "segment");)
If the segment type at p's position is not recognized, set skip_lv = 1;
Move p to the next } (p = grub_strchr (p, '}');)
p += 3 (so that p leaves this LV block)
What happens if the segment contains a nested }?
mcache {
    id = "BJvm0E-9uCj-Ji6a-N37P-nEuL-evA4-ZDUxdt"
    status = ["READ", "WRITE"]
    flags = []
    creation_time = 1594104910    # 2020-07-07 14:55:10 +0800
    creation_host = "mirrors4"
    segment_count = 1

    segment1 {
        start_extent = 0
        extent_count = 393216    # 1.5 Terabytes

        type = "cache-pool+METADATA_FORMAT"
        data = "mcache_cdata"
        metadata = "mcache_cmeta"
        chunk_size = 2048
        metadata_format = 2
        cache_mode = "writethrough"
        policy = "smq"

        policy_settings {
            migration_threshold=16384
        }
    }
}
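On mirrors:
validate_node() fails. Boom!
A crude but simple fix: when an unrecognized segment is encountered, make sure all of its characters have been consumed before continuing.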
https://github.com/taoky/grub/commit/c173a3cb7566230093a24c0a1f31aa032678f1f7
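The failure mode and the idea of the fix can be shown on a flattened string. This is a simplified C sketch, not the code from the commit above: the naive version stops at the first } after "segment", while the balanced version counts brace depth so nested blocks such as policy_settings { … } are skipped as well. Both function names are illustrative, and braces inside quoted strings are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of the old behaviour: treat the first '}' after
 * "segment" as the end of the segment. */
const char *skip_segment_naive(const char *p)
{
    assert(p != NULL);
    p = strstr(p, "segment");
    if (p == NULL)
        return NULL;
    p = strchr(p, '}');
    return p ? p + 1 : NULL;
}

/* Depth-aware skip: from the segment's opening '{', consume characters
 * until the matching '}' so that nested blocks are skipped too. */
const char *skip_segment_balanced(const char *p)
{
    int depth = 0;
    assert(p != NULL);
    p = strstr(p, "segment");
    if (p == NULL)
        return NULL;
    p = strchr(p, '{');
    if (p == NULL)
        return NULL;
    for (; *p != '\0'; p++) {
        if (*p == '{')
            depth++;
        else if (*p == '}' && --depth == 0)
            return p + 1;          /* just past the segment's closing brace */
    }
    return NULL;                   /* unbalanced input */
}
```

On the mcache LV, the naive skip stops at the } closing policy_settings, so the parser resumes inside the segment instead of after the LV block.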
Patched the code on the buster branch of Debian's GRUB and rebuilt the package.
gbp clone --pristine-tar https://github.com/taoky/grub.git
cd grub
apt build-dep grub2
dch --local taoky
dpkg-buildpackage -b -rfakeroot -us -uc
This was my first attempt at building a Debian package, so some of these commands may not be quite right.
Bug reports:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=985974
https://savannah.gnu.org/bugs/?60385
Not that anyone has replied.