2 years ago

#29367

jeremy

Why do I see escape sequence characters in the parsed output from cheerio?

Here is my code:

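const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");

const app = express();

app.get("/techcrunch", (req, res) => {
  axios("https://techcrunch.com/")
    .then((response) => {
      const html = response.data;
      const $ = cheerio.load(html, { decodeEntities: false });
      const newsItems = [];

      // Collect the title text and link of every headline block.
      $("h2.post-block__title").each(function () {
        const baseElement = $(this);
        const title = baseElement.text();
        const url = baseElement.find("a").attr("href");
        newsItems.push({ title, url });
      });
      res.send(newsItems);
    })
    .catch((err) => {
      // Log the failure and still answer the request so it does not hang.
      console.error(err);
      res.status(500).send("Failed to fetch TechCrunch");
    });
});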

I'm trying to parse the page source of TechCrunch and extract the text of each "h2.post-block__title" element, but the resulting strings contain escape sequences such as "\n" and "\t", as shown below:

 {
    "title": "\n\t\t\t\n\t\t\t\tThis Week in Apps: Instagram brings back the chronological feed, South Korea bans P2E games, Google looks for ecosystem integrations\t\t\t\n\t\t",
    "url": "https://www.bbc.comhttps://techcrunch.com/2022/01/08/this-week-in-apps-instagram-brings-back-the-chronological-feed-south-korea-bans-p2e-games-google-looks-for-ecosystem-integrations/"
  },

I tried passing {decodeEntities: false} to cheerio.load(), as seen above, but the escape sequences are still in the output.

One fix I considered was running unescape() on the title string returned by cheerio, but as far as I know unescape() is deprecated.

Any idea what I could do? Thanks in advance!
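A minimal sketch of one possible cleanup, assuming the "\n" and "\t" are ordinary whitespace characters carried over from the page's HTML indentation (the JSON response only displays them as escapes) rather than encoded entities. normalizeWhitespace is a hypothetical helper name, not a cheerio function, and the sample title is shortened from the output above:

```javascript
// Hypothetical helper (not part of cheerio): collapse every run of
// whitespace into a single space, then strip the leading/trailing space.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Sample value shaped like the title in the output above (shortened).
const rawTitle =
  "\n\t\t\t\n\t\t\t\tThis Week in Apps: Instagram brings back the chronological feed\t\t\t\n\t\t";

const cleanTitle = normalizeWhitespace(rawTitle);

// JSON.stringify makes any remaining whitespace escapes visible.
console.log(JSON.stringify(cleanTitle));
```

In the route above this would be applied as normalizeWhitespace(baseElement.text()). Calling baseElement.text().trim() alone would also strip the leading and trailing whitespace, though it would leave any whitespace runs inside the title untouched.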

javascript

web-scraping

cheerio

0 Answers
