word-break:break-all -- Ethiopic BA class line breaking

#40886057 Filed 2023-01-02 Status: Accepted

Summary

Problem

With word-break: break-all, Chromium incorrectly allowed line breaks before BA (Break After) class characters such as U+1361 Ethiopic Wordspace. This violates rule LB21 of UAX#14, which prohibits a line break before a BA class character. As a result, the Ethiopic wordspace could be pushed to the start of a line, detached from the letter it follows.

Fix

The ShouldBreakAfterBreakAll function now enforces LB21 by suppressing break opportunities before BA class characters, unless line-break: loose is set. U+007C (Vertical Line, |) is excluded from this restriction because it is not relevant to the Ethiopic punctuation use case, so breaks before it remain allowed.
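
The rule described above can be sketched as a small predicate. This is an illustrative Python model, not Chromium's C++ code: the function name allows_break_before, the line_break_loose parameter, and the hand-picked BA_CLASS subset are assumptions made for this sketch. Only the behavior comes from the description: suppress the break before a BA character unless line-break: loose is set, with U+007C exempt.

```python
# Illustrative model of the rule in the Fix section (hypothetical names).
# BA_CLASS is a small hand-picked subset of characters whose UAX#14
# line-break class is BA; the real set is much larger.
BA_CLASS = {
    "\u1361",  # ETHIOPIC WORDSPACE
    "\u2010",  # HYPHEN
    "\u007C",  # VERTICAL LINE
}

# BA characters that the fix exempts from the restriction.
EXCLUDED = {"\u007C"}


def allows_break_before(ch: str, line_break_loose: bool = False) -> bool:
    """Return True if break-all may place a line break before `ch`.

    Models only the LB21 check described above; every other line-breaking
    rule is ignored.
    """
    if line_break_loose:
        # line-break: loose relaxes the restriction entirely.
        return True
    if ch in BA_CLASS and ch not in EXCLUDED:
        # LB21: no break before a BA class character.
        return False
    return True
```

Under these assumptions, allows_break_before("\u1361") is False, while the same call with line_break_loose=True, or with "|", is True.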

Test Case 1: Ethiopic Wordspace (U+1361)

With word-break: break-all in a narrow container, the Ethiopic wordspace must stay attached to the preceding letter (no break before it).

Ethiopic break-all -- U+1361 must not start a line

word-break: break-all (width: 1.5em)
በ፡በ፡በ፡በ
Unpatched: No difference -- stable Chrome already handles this case correctly.
Patched: Same result. Wordspace stays attached to preceding letter.
Expected rendering
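
What this test checks can be modelled with a toy line wrapper. The sketch below is a simplification under loud assumptions: every character is treated as one unit wide, max_chars stands in for the container width, and wrap_break_all is a hypothetical helper, not a browser API. It glues each Ethiopic wordspace to the character before it, then packs the resulting units greedily.

```python
def wrap_break_all(text: str, max_chars: int,
                   line_break_loose: bool = False) -> list[str]:
    """Toy break-all wrapper: any character boundary is a break candidate,
    except before U+1361 (unless line-break: loose is modelled)."""
    wordspace = "\u1361"  # ETHIOPIC WORDSPACE, line-break class BA

    # Step 1: split the text into unbreakable units.
    units: list[str] = []
    for ch in text:
        if units and ch == wordspace and not line_break_loose:
            units[-1] += ch  # no break before the wordspace
        else:
            units.append(ch)

    # Step 2: greedily pack units into lines of at most max_chars.
    # A unit longer than max_chars overflows on a line of its own.
    lines: list[str] = []
    current = ""
    for unit in units:
        if current and len(current) + len(unit) > max_chars:
            lines.append(current)
            current = ""
        current += unit
    if current:
        lines.append(current)
    return lines


sample = "በ፡በ፡በ፡በ"
strict = wrap_break_all(sample, max_chars=2)
loose = wrap_break_all(sample, max_chars=1, line_break_loose=True)
```

In this model no line in strict starts with the wordspace, whereas loose puts every character, wordspace included, on its own line.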

Test Case 2: Ethiopic normal word-break

With default word-break: normal, Ethiopic text wraps at wordspace boundaries. This path does not call ShouldBreakAfterBreakAll and is unaffected by the fix -- shown here as a baseline reference.

Ethiopic normal -- baseline (unaffected by fix)

word-break: normal (width: 3em)
በ፡በ፡በ፡በ
Unpatched/Patched: No difference. Text wraps at wordspace boundaries in both builds.
Expected rendering

Test Case 3: ASCII with BA class -- Vertical Line (U+007C)

U+007C | is a BA class character but is excluded from the LB21 restriction in break-all mode. Breaking before | remains allowed.

ASCII break-all -- U+007C (|) allows break before

word-break: break-all (width: 4em)
abc|def|ghi|jkl
Unpatched: No difference.
Patched: Same result. U+007C is excluded from the BA restriction.
Expected rendering

Test Case 4: line-break:loose with BA class

When line-break: loose is set, the LB21 restriction is relaxed, allowing breaks before BA characters (including hyphens and Ethiopic wordspace).

Ethiopic break-all + line-break:loose

word-break: break-all; line-break: loose (width: 1.5em)
በ፡በ፡በ
Unpatched: No difference.
Patched: Same result. line-break: loose relaxes LB21, so breaks before the wordspace are allowed with or without the fix.
Expected rendering

References

csswg-drafts#4765 -- Original W3C issue about Ethiopic word-break behavior

CSS Text 3 -- word-break property

UAX#14 -- Unicode Line Breaking Algorithm

All characters with lb=BA