Skip to content

refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API #11091

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shreedharbhat98
Copy link

@shreedharbhat98 shreedharbhat98 commented Apr 12, 2025

Pre-flight checklist

Motivation

Test Plan

Test links

Deploy preview: https://deploy-preview-_____--docusaurus-2.netlify.app/

Related issues/PRs

#11086

@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Apr 12, 2025
@shreedharbhat98 shreedharbhat98 changed the title refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API Apr 12, 2025
Copy link

netlify bot commented Apr 12, 2025

[V2]

Built without sensitive environment variables

Name Link
🔨 Latest commit be88a80
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/67fa84a5764969000823f3af
😎 Deploy Preview https://deploy-preview-11091--docusaurus-2.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

⚡️ Lighthouse report for the deploy preview of this PR

URL Performance Accessibility Best Practices SEO Report
/ 🟠 67 🟢 98 🟢 100 🟢 100 Report
/docs/installation 🟠 51 🟢 97 🟢 100 🟢 100 Report
/docs/category/getting-started 🟠 72 🟢 100 🟢 100 🟠 86 Report
/blog 🟠 61 🟢 96 🟢 100 🟠 86 Report
/blog/preparing-your-site-for-docusaurus-v3 🔴 45 🟢 92 🟢 100 🟢 100 Report
/blog/tags/release 🟠 61 🟢 96 🟢 100 🟠 86 Report
/blog/tags 🟠 70 🟢 100 🟢 100 🟠 86 Report

@shreedharbhat98
Copy link
Author

Hi @slorber,

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation.
Could you please advise on how you’d like to proceed in light of these differences?

Really appreciate your guidance—thank you!

@shreedharbhat98
Copy link
Author

@Josh-Cena & @slorber quick reminder

Copy link
Collaborator

@slorber slorber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

Thanks 👍

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?

Can you make it so that we can easily see those differences between the review?

An idea would be to split this PR in 2:

  • first PR only writes unit tests for the original package, and refactor a bit the code (exact same behavior, so easy to review and merge for me)
  • second PR makes it easy to see the tests being different with the new implementation

I'll be unavailable in the next days so I'll only be able to review/merge later in 2 weeks.

👋

Comment on lines +30 to +38
const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
const segments = segmenter.segment(contentWithoutFrontmatter);

let wordCount = 0;
for (const segment of segments) {
if (segment.isWordLike) {
wordCount += 1;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you extract this as a "countWords" function that we can unit test independently?

Comment on lines +12 to +17
interface ReadingTimeResult {
text: string;
minutes: number;
time: number;
words: number;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need the number of minutes as an output

*/
interface ReadingTimeOptions {
wordsPerMinute?: number;
locale?: string;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The locale should always be provided, a Docusaurus site always has one

): ReadingTimeResult {
const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE;
const locale = options.locale ?? DEFAULT_LOCALE;
const contentWithoutFrontmatter = content.replace(/^---[\s\S]*?---\n/, '');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't have that before so I'd prefer to not do that.

The called should be responsible from providing text content, and this function shouldn't assume it's called in a markdown/mdx context

@shreedharbhat98
Copy link
Author

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

Thanks 👍

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?

Can you make it so that we can easily see those differences between the review?

An idea would be to split this PR in 2:

  • first PR only writes unit tests for the original package, and refactor a bit the code (exact same behavior, so easy to review and merge for me)
  • second PR makes it easy to see the tests being different with the new implementation

I'll be unavailable in the next days so I'll only be able to review/merge later in 2 weeks.

👋

Thanks for the suggestions, @slorber. I will work on them.

@Josh-Cena
Copy link
Collaborator

For some context: I'm co-maintaining reading-time, and it's indeed extremely unclear what the value the project offers with Intl.Segmenter. One major difference is that reading-time splits CJK languages by characters instead of words, so you may get a smaller reading time estimate when using Intl.Segmenter, which arguably is more correct. So I'm +1 on this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Signed Facebook CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace reading-time npm package by Intl.Segmenter API
5 participants