overleaf-cep/services/web/scripts/learn/checkSanitize
Liangjun Song 2f87db9c0d Merge pull request #24790 from overleaf/ls-use-script-runner
Update some scripts to use Script Runner

GitOrigin-RevId: aaa11f94dcfd328c158bb02d1b9fb2adfb1bb146
2025-05-23 08:05:23 +00:00
checkSanitizeOptions.mjs [web] Add prefer-node-protocol ESLint rule (#21523) 2024-11-05 09:04:33 +00:00
index.mjs Merge pull request #24790 from overleaf/ls-use-script-runner 2025-05-23 08:05:23 +00:00
README.md Merge pull request #21282 from overleaf/ls-scripts-to-esm-5 2024-10-25 08:05:41 +00:00
scrape.mjs [web] Add prefer-node-protocol ESLint rule (#21523) 2024-11-05 09:04:33 +00:00

Usage

node scripts/learn/checkSanitize/index.mjs https://LEARN_WIKI

Bulk export

There is a bulk export for MediaWiki pages, but it escapes HTML differently from the regular parse API we use in web.

The bulk export does not escape all the placeholder HTML-like elements, like <project-id or <document goes here>.
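To make the escaping difference concrete, here is a minimal sketch of how one might spot placeholder-style elements that were left unescaped in a chunk of HTML. The `KNOWN_TAGS` allowlist and the `findUnknownTags` helper are hypothetical names invented for this illustration; they are not part of the script.

```javascript
// Hypothetical allowlist of real HTML elements, for illustration only.
const KNOWN_TAGS = new Set(['p', 'strong', 'em', 'table', 'tr', 'td', 'a'])

// Return every tag-like token whose element name is not in the allowlist.
// Escaped placeholders (&lt;...&gt;) contain no literal "<" and are skipped.
function findUnknownTags(html) {
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g
  const unknown = []
  for (const match of html.matchAll(tagPattern)) {
    if (!KNOWN_TAGS.has(match[1].toLowerCase())) {
      unknown.push(match[0])
    }
  }
  return unknown
}

const unescaped = '<p>Open <document goes here> in <strong>Overleaf</strong></p>'
const escaped = '<p>Open &lt;document goes here&gt; in <strong>Overleaf</strong></p>'

console.log(findUnknownTags(unescaped))
console.log(findUnknownTags(escaped))
```

The first call reports the raw placeholder, while the second reports nothing because the placeholder is already escaped.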

Example output

Here is how a missing tag gets flagged:

---
page           : MediaWiki markup for the Overleaf support team
title          : MediaWiki markup for the Overleaf support team
match          : false
toText         : false
text           : "Overleaf</strong></td>\n            </tr>\n           <tr><td>Kb/<strong>TITLE_SLUG</strong></td><td><nowiki>https://www.overleaf.com/learn/how-to/</nowiki><strong>TITLE_SLUG</strong></td>\n           </"
sanitized      : "Overleaf</strong></td>\n            </tr>\n           <tr><td>Kb/<strong>TITLE_SLUG</strong></td><td>&lt;nowiki&gt;https://www.overleaf.com/learn/how-to/&lt;/nowiki&gt;<strong>TITLE_SLUG</strong></td>\n "
textToText     : "    \n        \n        \n            \n                MediaWiki page\n                Maps to on Overleaf\n            \n           Kb/TITLE_SLUGhttps://www.overleaf.com/learn/how-to/TITLE_SLUG\n           "
sanitizedToText: "    \n        \n        \n            \n                MediaWiki page\n                Maps to on Overleaf\n            \n           Kb/TITLE_SLUG<nowiki>https://www.overleaf.com/learn/how-to/</nowiki>TITLE"

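Note the hidden/escaped <nowiki> element: in the sanitized HTML it appears as &lt;nowiki&gt;, so it only becomes visible as a literal <nowiki> in sanitizedToText. In addition to the side-by-side comparison of the HTML, you will see a plain-text diff.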
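The fields in the report above can be reproduced with a small sketch. It assumes that `match` compares the raw and sanitized HTML and that `toText` compares their plain-text renderings; that reading is inferred from the example output, not confirmed from the script. `toySanitize`, `toyToText`, `compare` and the allowlist are hypothetical stand-ins, not the sanitizer or text converter the script actually uses.

```javascript
// Hypothetical allowlist, for illustration only.
const ALLOWED_TAGS = new Set(['table', 'tr', 'td', 'strong'])

// Toy sanitizer: keep allowlisted tags, escape every other tag.
function toySanitize(html) {
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g, (tag, name) =>
    ALLOWED_TAGS.has(name.toLowerCase())
      ? tag
      : tag.replace(/</g, '&lt;').replace(/>/g, '&gt;')
  )
}

// Toy HTML-to-text: drop real tags, then decode the basic entities.
function toyToText(html) {
  return html
    .replace(/<[^>]*>/g, '')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')
}

// Build a report with the same field names as the example output.
function compare(text) {
  const sanitized = toySanitize(text)
  const textToText = toyToText(text)
  const sanitizedToText = toyToText(sanitized)
  return {
    match: text === sanitized,
    toText: textToText === sanitizedToText,
    text,
    sanitized,
    textToText,
    sanitizedToText,
  }
}

const report = compare(
  '<td><nowiki>https://www.overleaf.com/learn/how-to/</nowiki><strong>TITLE_SLUG</strong></td>'
)
console.log(report)
```

As in the real report, the escaped element disappears from the rendered HTML but resurfaces as literal text once the sanitized HTML is converted to plain text, which is why both `match` and `toText` come out false.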