feat: add ReadabilityWithOptions for DOM cleanup #61
Summary
- `ReadabilityOptions` struct with a `RemoveSelectors []string` field for specifying CSS selectors of elements to remove before readability extraction
- `ReadabilityWithOptions()` function that applies DOM cleanup before parsing
- `Readability()` delegates to `ReadabilityWithOptions` with zero-value options (fully backward compatible)

Motivation
Sites like The Verge use infinite scroll that loads additional full articles below the current one. When `Readability()` extracts content, these extra articles pollute the result. This change lets callers specify selectors to remove before extraction, eliminating the need to reimplement the readability pipeline downstream.

Closes #60
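
For reviewers, the delegation and cleanup order described above can be sketched roughly as follows. This is a toy model, not the code in this PR: the `Node` type, `matches`, `extractText`, and `samplePage` are hypothetical stand-ins, the functions take a pre-built tree rather than HTML, and only bare tag and `.class` selectors are handled. The real implementation presumably parses HTML and uses a full CSS selector engine.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a toy stand-in for a parsed DOM element (hypothetical type).
type Node struct {
	Tag      string
	Class    string
	Text     string
	Children []*Node
}

// ReadabilityOptions mirrors the struct named in the summary.
type ReadabilityOptions struct {
	RemoveSelectors []string
}

// matches supports only "tag" and ".class" selectors in this sketch.
func matches(n *Node, sel string) bool {
	if strings.HasPrefix(sel, ".") {
		return n.Class != "" && n.Class == sel[1:]
	}
	return n.Tag == sel
}

// removeSelectors drops every descendant that matches any selector.
// It mutates the tree in place.
func removeSelectors(n *Node, selectors []string) {
	kept := n.Children[:0]
	for _, c := range n.Children {
		drop := false
		for _, sel := range selectors {
			if matches(c, sel) {
				drop = true
				break
			}
		}
		if !drop {
			removeSelectors(c, selectors)
			kept = append(kept, c)
		}
	}
	n.Children = kept
}

// extractText stands in for the readability extraction step: it simply
// joins all text in document order.
func extractText(n *Node) string {
	var parts []string
	if n.Text != "" {
		parts = append(parts, n.Text)
	}
	for _, c := range n.Children {
		if t := extractText(c); t != "" {
			parts = append(parts, t)
		}
	}
	return strings.Join(parts, " ")
}

// ReadabilityWithOptions cleans the tree first, then extracts.
func ReadabilityWithOptions(root *Node, opts ReadabilityOptions) string {
	if len(opts.RemoveSelectors) > 0 {
		removeSelectors(root, opts.RemoveSelectors)
	}
	return extractText(root)
}

// Readability delegates with zero-value options, so existing callers
// see unchanged behavior.
func Readability(root *Node) string {
	return ReadabilityWithOptions(root, ReadabilityOptions{})
}

// samplePage models an article followed by an infinite-scroll extra.
func samplePage() *Node {
	return &Node{Tag: "article", Children: []*Node{
		{Tag: "div", Class: "story", Text: "Main story"},
		{Tag: "div", Class: "next-article", Text: "Unrelated story"},
	}}
}

func main() {
	fmt.Println(Readability(samplePage()))
	opts := ReadabilityOptions{RemoveSelectors: []string{".next-article"}}
	fmt.Println(ReadabilityWithOptions(samplePage(), opts))
}
```

In this sketch the zero-value path skips cleanup entirely, which is what keeps `Readability()` backward compatible.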
Test plan
- `TestReadabilityWithOptions_RemoveSelectors`: verifies removed elements are excluded from extraction
- `TestReadabilityWithOptions_NoSelectors`: verifies empty options behave like `Readability()`
- `TestRemoveSelectors`: unit test for the HTML cleaning function
- `TestRemoveSelectors_MultipleSelectors`: verifies multiple selectors work together