2 years ago
#29367
jeremy
Why do I see escape sequence characters in Cheerio's parse output?
Here is my code:
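```javascript
const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");

const app = express();

app.get("/techcrunch", (req, res) => {
  axios("https://techcrunch.com/")
    .then((response) => {
      const html = response.data;
      const $ = cheerio.load(html, { decodeEntities: false });
      const newsItems = [];
      // Select every headline <h2> and read its text and link
      $("h2.post-block__title").each(function () {
        const baseElement = $(this);
        const title = baseElement.text();
        const url = baseElement.find("a").attr("href");
        newsItems.push({ title, url });
      });
      res.send(newsItems);
    })
    .catch((err) => {
      console.log(err);
      // Respond on failure too; otherwise the request never completes
      res.status(500).send("Failed to fetch TechCrunch");
    });
});
```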
I'm trying to parse the TechCrunch front page and extract the text of each "h2.post-block__title" element, but the resulting strings contain escape sequences such as "\n" and "\t", as shown below:
```
{
  "title": "\n\t\t\t\n\t\t\t\tThis Week in Apps: Instagram brings back the chronological feed, South Korea bans P2E games, Google looks for ecosystem integrations\t\t\t\n\t\t",
  "url": "https://www.bbc.comhttps://techcrunch.com/2022/01/08/this-week-in-apps-instagram-brings-back-the-chronological-feed-south-korea-bans-p2e-games-google-looks-for-ecosystem-integrations/"
},
```
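The "\n" and "\t" in that output are not backslash characters stored in the string. `.text()` returns the real newline and tab characters from the indented HTML inside the `<h2>`, and JSON serialization (which Express's `res.send` performs when given an array or object) writes those control characters back out as the escape sequences `\n` and `\t`. A small sketch using a shortened copy of the title above, assuming that is the exact string `.text()` returned:

```javascript
// Shortened copy of the title from the output above.
// Assumption: this is what Cheerio's .text() returned for one headline.
const title = "\n\t\t\t\n\t\t\t\tThis Week in Apps\t\t\t\n\t\t";

// The string holds real newline (U+000A) and tab (U+0009) characters,
// not a backslash followed by "n" or "t".
console.log(title.includes("\n")); // true
console.log(title.includes("\\n")); // false
console.log(title.charCodeAt(0)); // 10

// JSON.stringify (used by res.send for arrays/objects) escapes those
// control characters, which is why they show up as \n and \t.
const json = JSON.stringify({ title });
console.log(json);
// {"title":"\n\t\t\t\n\t\t\t\tThis Week in Apps\t\t\t\n\t\t"}
```

Parsing the JSON back (`JSON.parse(json).title`) yields the original string with real whitespace, so nothing needs to be "unescaped"; the whitespace itself is what has to go.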
I tried passing {decodeEntities: false}, as shown above, but the escape sequences are still there.
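`decodeEntities` only controls how HTML entities such as `&amp;` are handled; tabs and newlines are ordinary whitespace, not entities, so the option cannot remove them. They have to be stripped after extraction. A minimal sketch, assuming a single-line title is wanted (the helper name `cleanText` is hypothetical):

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// newlines) into a single space, then remove it from both ends.
function cleanText(raw) {
  return raw.replace(/\s+/g, " ").trim();
}

// Inside the .each() callback it would be used as:
//   const title = cleanText(baseElement.text());
// If only leading/trailing whitespace matters, baseElement.text().trim() is enough.

const raw =
  "\n\t\t\t\n\t\t\t\tThis Week in Apps: Instagram brings back the chronological feed\t\t\t\n\t\t";
console.log(JSON.stringify(cleanText(raw)));
// "This Week in Apps: Instagram brings back the chronological feed"
```

Collapsing internal runs as well as trimming is a design choice: it also handles titles whose markup breaks across lines in the middle of the text, at the cost of flattening any intentional line breaks.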
One fix I considered was running unescape() on the title string returned by Cheerio, but as far as I can tell unescape() is deprecated.
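`unescape()` would not help here even if it were not deprecated: it decodes percent-style sequences such as `%E9` and `%u00E9`, and the title contains none of those, only literal whitespace. Its standard counterpart for URI-encoded text is `decodeURIComponent()`. A short sketch showing that neither function alters the title:

```javascript
// Shortened title containing only literal whitespace around the text.
const title = "\n\t\tThis Week in Apps\t\n";

// unescape() (deprecated, kept for legacy code) decodes %XX and %uXXXX sequences.
console.log(unescape("caf%E9")); // café

// decodeURIComponent() decodes UTF-8 percent-encoding and is the standard API.
console.log(decodeURIComponent("caf%C3%A9")); // café

// Neither one changes a string that contains no percent sequences.
console.log(unescape(title) === title); // true
console.log(decodeURIComponent(title) === title); // true
```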
Any ideas on what I could do? Thanks in advance!
javascript
web-scraping
cheerio