🧨 I pulled one OSD and recovery never finishes… What to check when scrub won't run

2–3 min read

Not long ago, on a Ceph cluster running in production, we removed one OSD at a customer's request, and that's when things started to go sideways.

At first I didn't think much of it. A few PGs briefly showed up as degraded, and I figured recovery would wrap up in no time, but…

The HEALTH_WARN just wouldn't go away.

HEALTH_WARN mon.b is low on available space; 1 pgs not deep-scrubbed in time; 1 pgs not scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed since ...
[WRN] PG_NOT_SCRUBBED: 1 pgs not scrubbed since ...
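
To see exactly which PGs are behind those warnings, ceph health detail lists them one per line. Here is a rough sketch for pulling out the PG IDs. It assumes the detail lines look like "pg <pgid> not deep-scrubbed since <timestamp>", which may differ between Ceph releases, and overdue_pgs is just a helper name made up for this post.

```shell
# Print the IDs of PGs reported as overdue for scrub or deep-scrub.
# Reads `ceph health detail` output on stdin.
# Assumption: detail lines start with "pg <pgid> not (deep-)scrubbed since ...".
overdue_pgs() {
    awk '$1 == "pg" && $3 == "not" && $4 ~ /scrubbed$/ { print $2 }' | sort -u
}

# Usage on a live cluster:
#   ceph health detail | overdue_pgs
```

Each ID it prints can then be fed into the per-PG checks in the rest of this post.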

I tried both ceph pg scrub and ceph pg deep-scrub. No reaction.

With scrub not running, the PGs never made it back to clean either. They just sat there acting like they were 'recovering'.

🎯 The one key fact I learned from this!

If scrub isn't allowed to run alongside recovery, Ceph's recovery never really finishes.

We always assume recovery means the cluster is back to normal, right? But recovery without scrub is really only half the job.

Not knowing this, I spent days staring at logs wondering, “Why isn't this finishing?”


🧩 Summary of the problem

  • Removed osd.2 (and yes, removed it properly)
  • Some PGs moved to other OSDs and recovery started
  • A few PGs stayed stuck in degraded / recovering
  • Manually issued scrub commands got no response
  • Warning messages kept piling up

🧪 Root cause analysis: why scrub won't run

1. Start by checking the PG state

ceph pg 1.0 query

For scrub to run, the PG has to be active+clean.

If it's recovering, backfilling, or degraded, the scrub is either queued or ignored.
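
If you have more than a handful of PGs to look at, this check is easy to script. A minimal sketch, assuming the state is the usual "+"-joined flag string from the query output. pg_is_scrubbable is a hypothetical helper, and the jq call in the usage comment assumes jq is installed.

```shell
# Return 0 when a PG state string contains both "active" and "clean".
# The state is a "+"-joined list of flags, e.g. "active+clean" or
# "active+recovering+degraded".
pg_is_scrubbable() {
    case "+$1+" in
        *+active+*) ;;
        *) return 1 ;;
    esac
    case "+$1+" in
        *+clean+*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live cluster (assumes jq is installed):
#   state=$(ceph pg 1.0 query | jq -r .state)
#   pg_is_scrubbable "$state" && echo "1.0 can be scrubbed"
```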

“So when am I supposed to scrub, then?” → That's exactly the second checkpoint.


2. Check whether a scrub time window is configured

ceph config get osd osd_scrub_begin_hour
ceph config get osd osd_scrub_end_hour

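For example, if the window is set to 2 a.m. to 6 a.m., automatic scrubs are only scheduled inside those hours (unless a PG is already past osd_scrub_max_interval), and in our case manual scrub requests outside the window went nowhere either.

This trips people up more often than you'd think.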
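
To check quickly whether "right now" falls inside the window, something like the sketch below works. It assumes the window is begin-inclusive and end-exclusive, that a window with begin greater than end wraps past midnight, and that begin equal to end means the whole day. It also assumes your shell's clock and timezone match the OSD host's. in_scrub_window is a made-up helper name.

```shell
# Return 0 if hour $3 (0-23) lies inside the window [begin=$1, end=$2).
# begin > end is treated as a window wrapping past midnight;
# begin == end is treated as "the whole day".
in_scrub_window() {
    begin=$1 end=$2 hour=$3
    if [ "$begin" -lt "$end" ]; then
        [ "$hour" -ge "$begin" ] && [ "$hour" -lt "$end" ]
    else
        [ "$hour" -ge "$begin" ] || [ "$hour" -lt "$end" ]
    fi
}

# Usage on a live cluster:
#   begin=$(ceph config get osd osd_scrub_begin_hour)
#   end=$(ceph config get osd osd_scrub_end_hour)
#   hour=$(date +%H); hour=${hour#0}   # strip a leading zero
#   in_scrub_window "$begin" "$end" "$hour" || echo "outside the scrub window"
```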


3. Scrub may be blocked during recovery

ceph config get osd osd_scrub_during_recovery

The default is false.

In other words, while recovery is in progress, scrub is blocked entirely, whether automatic or manual.

→ Fix:

ceph config set osd osd_scrub_during_recovery true

The moment I changed this setting, the PGs started flipping to clean one by one.

This was the real key.
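
Scrubbing during recovery adds I/O on OSDs that are already busy, which is presumably why the default is false. So it may be worth treating this as a temporary override and dropping it once the cluster is healthy again. A small sketch of that idea follows. The two function names and the CEPH variable are made up here. ceph config rm removes the override so the option falls back to its default.

```shell
# CEPH can be overridden (e.g. CEPH=echo) to dry-run the commands.
CEPH=${CEPH:-ceph}

# Allow scrubs to be scheduled while recovery is active.
allow_scrub_during_recovery() {
    $CEPH config set osd osd_scrub_during_recovery true
}

# Drop the override so the option falls back to its default (false).
restore_scrub_during_recovery() {
    $CEPH config rm osd osd_scrub_during_recovery
}

# Usage:
#   allow_scrub_during_recovery
#   ... wait until `ceph health detail` no longer lists overdue PGs ...
#   restore_scrub_during_recovery
```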


4. Check the OSDs responsible for the PG as well

ceph pg map 1.0

Make sure the OSDs serving that PG are all up + in,

and take a look at ceph osd perf and ceph osd df as well.

  • If an OSD is nearfull, slow, or down, scrub won't run.
  • High latency delays scrub

🔧 There's also a forced scrub for debugging

If nothing else works, you can force it: (for the scrub to actually run, osd_scrub_during_recovery has to be set to true)

ceph tell pg 1.0 scrub
ceph tell pg 1.0 deep-scrub

That said, be really, really careful with this in production. Try it in a test environment first.


✅ Key takeaways

Check item                      What to verify
PG state                        Is it active+clean?
Scrub time window               Scrubs are ignored outside the configured hours
Scrub allowed during recovery   osd_scrub_during_recovery must be set to true
OSD state                       Check up+in and whether any OSD is slow/nearfull
MON disk space                  Running low causes scrub delays
Forced scrub                    Only for debugging, and use it with care

🧠 Don't just trust the word “recovering”

Just because Ceph says it's “recovering” doesn't guarantee that recovery is actually making progress.

If scrub is stalled, recovery just spins its wheels and the PG never returns to clean.

If you've just added or removed an OSD in production, you thought recovery was done, and HEALTH_WARN is still hanging around?

👉 Make sure to check whether scrub is blocked.