refactor(docusaurus-plugin-content-blog): Replace `reading-time` npm with `Intl.Segmenter` API #11091

shreedharbhat98 · 2025-04-12T15:20:03Z

Pre-flight checklist

I have read the Contributing Guidelines on pull requests.
If this is a code change: I have written unit tests and/or added dogfooding pages to fully verify the new behavior.
If this is a new API or substantial change: the PR has an accompanying issue (closes #11086) and the maintainers have approved on my working plan.

Motivation

Test Plan

Test links

Deploy preview: https://deploy-preview-_____--docusaurus-2.netlify.app/

Related issues/PRs

#11086

netlify · 2025-04-12T15:22:32Z

✅ [V2]

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`be88a80`
🔍 Latest deploy log	https://app.netlify.com/sites/docusaurus-2/deploys/67fa84a5764969000823f3af
😎 Deploy Preview	https://deploy-preview-11091--docusaurus-2.netlify.app

github-actions · 2025-04-12T15:25:34Z

⚡️ Lighthouse report for the deploy preview of this PR

URL	Performance	Accessibility	Best Practices	SEO	Report
/	🟠 67	🟢 98	🟢 100	🟢 100	Report
/docs/installation	🟠 51	🟢 97	🟢 100	🟢 100	Report
/docs/category/getting-started	🟠 72	🟢 100	🟢 100	🟠 86	Report
/blog	🟠 61	🟢 96	🟢 100	🟠 86	Report
/blog/preparing-your-site-for-docusaurus-v3	🔴 45	🟢 92	🟢 100	🟢 100	Report
/blog/tags/release	🟠 61	🟢 96	🟢 100	🟠 86	Report
/blog/tags	🟠 70	🟢 100	🟢 100	🟠 86	Report

shreedharbhat98 · 2025-04-12T15:29:11Z

Hi @slorber,

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation.
Could you please advise on how you’d like to proceed in light of these differences?

Really appreciate your guidance—thank you!

shreedharbhat98 · 2025-04-16T13:16:41Z

@Josh-Cena & @slorber quick reminder

slorber

> As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

Thanks 👍

> While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?

Can you make those differences easy to see during the review?

One idea would be to split this PR in 2:

The first PR only writes unit tests for the original package and refactors the code a bit (exact same behavior, so easy for me to review and merge).
The second PR makes it easy to see which tests differ with the new implementation.

I'll be unavailable for the next few days, so I'll only be able to review/merge in 2 weeks.

👋

slorber · 2025-04-18T14:04:04Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+  const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
+  const segments = segmenter.segment(contentWithoutFrontmatter);
+
+  let wordCount = 0;
+  for (const segment of segments) {
+    if (segment.isWordLike) {
+      wordCount += 1;
+    }
+  }


Could you extract this as a "countWords" function that we can unit test independently?
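
A minimal sketch of what such an extraction could look like. The name `countWords` comes from the comment above; the signature (plain text plus a locale string) is an assumption and not necessarily what the PR ends up with.

```typescript
// Hypothetical helper: the signature is an assumption for illustration.
function countWords(content: string, locale: string): number {
  const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
  let wordCount = 0;
  for (const segment of segmenter.segment(content)) {
    // isWordLike is false for whitespace and punctuation segments,
    // so only word-like segments are counted.
    if (segment.isWordLike) {
      wordCount += 1;
    }
  }
  return wordCount;
}

console.log(countWords('Hello, world!', 'en'));
```

Keeping the counting logic in its own function means it can be unit tested with short strings, independently of the words-per-minute calculation.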

slorber · 2025-04-18T14:06:14Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+interface ReadingTimeResult {
+  text: string;
+  minutes: number;
+  time: number;
+  words: number;
+}


We only need the number of minutes as an output.

slorber · 2025-04-18T14:06:34Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+ */
+interface ReadingTimeOptions {
+  wordsPerMinute?: number;
+  locale?: string;


The locale should always be provided; a Docusaurus site always has one.

slorber · 2025-04-18T14:08:08Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+): ReadingTimeResult {
+  const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE;
+  const locale = options.locale ?? DEFAULT_LOCALE;
+  const contentWithoutFrontmatter = content.replace(/^---[\s\S]*?---\n/, '');


We didn't have that before, so I'd prefer not to do that.

The caller should be responsible for providing text content, and this function shouldn't assume it's called in a markdown/mdx context.
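
Taken together, the review comments on this file suggest a narrower function: a required locale, a single number of minutes as output, and no front matter handling. The sketch below is one way that could look. The function name `calculateReadingTime`, the option shape, and the fallback value of 200 words per minute are all assumptions for illustration; the PR's actual `DEFAULT_WORDS_PER_MINUTE` value is not shown in this thread.

```typescript
// Hypothetical option shape: locale is required, wordsPerMinute is optional.
interface ReadingTimeOptions {
  locale: string;
  wordsPerMinute?: number;
}

// Placeholder value, assumed for this sketch only.
const DEFAULT_WORDS_PER_MINUTE = 200;

// Returns the estimated reading time in minutes.
// The caller is expected to pass plain text content; nothing is stripped here.
function calculateReadingTime(
  content: string,
  options: ReadingTimeOptions,
): number {
  const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE;
  const segmenter = new Intl.Segmenter(options.locale, {granularity: 'word'});
  let words = 0;
  for (const segment of segmenter.segment(content)) {
    if (segment.isWordLike) {
      words += 1;
    }
  }
  return words / wordsPerMinute;
}

console.log(
  calculateReadingTime('one two three four', {locale: 'en', wordsPerMinute: 2}),
);
```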

shreedharbhat98 · 2025-04-18T14:24:27Z

Thanks for the suggestions, @slorber. I will work on them.

Josh-Cena · 2025-04-18T16:41:58Z

For some context: I'm co-maintaining reading-time, and it's indeed extremely unclear what value the project offers now that Intl.Segmenter exists. One major difference is that reading-time splits CJK languages by characters instead of words, so you may get a smaller reading time estimate when using Intl.Segmenter, which arguably is more correct. So I'm +1 on this change.
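
To make the CJK difference concrete, here is a rough comparison sketch. `countCharacters` is only a stand-in for a character-based count and is not the reading-time package's actual code; the sample sentence and the `zh` locale are hypothetical choices. The exact word count depends on the runtime's ICU data, so no specific number is claimed for it.

```typescript
// Stand-in for a character-based count (assumption, not reading-time's code).
function countCharacters(text: string): number {
  // Whitespace is dropped, then Array.from iterates by code point.
  return Array.from(text.replace(/\s/g, '')).length;
}

// Word-based count using Intl.Segmenter.
function countWordLikeSegments(text: string, locale: string): number {
  const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) {
      count += 1;
    }
  }
  return count;
}

// Hypothetical sample sentence ("I like reading technical documentation").
const sample = '我喜欢阅读技术文档';
const characterCount = countCharacters(sample);
const segmentedWordCount = countWordLikeSegments(sample, 'zh');

// The segmented count is never larger than the character count, because
// each word-like segment covers at least one character.
console.log({characterCount, segmentedWordCount});
```

With the same words-per-minute setting, a smaller count translates directly into a smaller reading time estimate.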

Replaced readingTime npm with Intl.Segmenter

be88a80

shreedharbhat98 requested review from slorber and Josh-Cena as code owners April 12, 2025 15:20

facebook-github-bot added the CLA Signed Signed Facebook CLA label Apr 12, 2025

shreedharbhat98 changed the title ~~refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter~~ refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API Apr 12, 2025

shreedharbhat98 mentioned this pull request Apr 12, 2025

Replace reading-time npm package by Intl.Segmenter API #11086

Open

2 tasks

slorber requested changes Apr 18, 2025
