-
-
Notifications
You must be signed in to change notification settings - Fork 9k
refactor(docusaurus-plugin-content-blog): Replace reading-time
npm with Intl.Segmenter
API
#11091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor(docusaurus-plugin-content-blog): Replace reading-time
npm with Intl.Segmenter
API
#11091
Conversation
reading-time
npm with Intl.Segmenter
reading-time
npm with Intl.Segmenter
API
✅ [V2]Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site configuration. |
⚡️ Lighthouse report for the deploy preview of this PR
|
Hi @slorber, As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API. While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Really appreciate your guidance—thank you! |
@Josh-Cena & @slorber quick reminder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.
Thanks 👍
While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?
Can you make it so that we can easily see those differences between the review?
An idea would be to split this PR in 2:
- first PR only writes unit tests for the original package, and refactor a bit the code (exact same behavior, so easy to review and merge for me)
- second PR makes it easy to see the tests being different with the new implementation
I'll be unavailable in the next days so I'll only be able to review/merge later in 2 weeks.
👋
const segmenter = new Intl.Segmenter(locale, {granularity: 'word'}); | ||
const segments = segmenter.segment(contentWithoutFrontmatter); | ||
|
||
let wordCount = 0; | ||
for (const segment of segments) { | ||
if (segment.isWordLike) { | ||
wordCount += 1; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you extract this as a "countWords" function that we can unit test independently?
interface ReadingTimeResult { | ||
text: string; | ||
minutes: number; | ||
time: number; | ||
words: number; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need the number of minutes as an output
*/ | ||
interface ReadingTimeOptions { | ||
wordsPerMinute?: number; | ||
locale?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The locale should always be provided, a Docusaurus site always has one
): ReadingTimeResult { | ||
const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE; | ||
const locale = options.locale ?? DEFAULT_LOCALE; | ||
const contentWithoutFrontmatter = content.replace(/^---[\s\S]*?---\n/, ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't have that before so I'd prefer to not do that.
The called should be responsible from providing text content, and this function shouldn't assume it's called in a markdown/mdx context
Thanks for the suggestions, @slorber. I will work on them. |
For some context: I'm co-maintaining reading-time, and it's indeed extremely unclear what the value the project offers with Intl.Segmenter. One major difference is that reading-time splits CJK languages by characters instead of words, so you may get a smaller reading time estimate when using Intl.Segmenter, which arguably is more correct. So I'm +1 on this change. |
Pre-flight checklist
reading-time
npm package byIntl.Segmenter
API #11086) and the maintainers have approved on my working plan.Motivation
Test Plan
Test links
Deploy preview: https://deploy-preview-_____--docusaurus-2.netlify.app/
Related issues/PRs
#11086