Web crawling and web scraping

ECOPER has incorporated two new artificial intelligence tools for big data analysis into its analysis and evaluation work: web crawling and web scraping.

Web crawling is a process by which a ‘robot’ (in this case, an algorithm) called a crawler systematically navigates the internet. The crawler is given an initial set of URLs, known as seeds, downloads the web pages associated with those seeds, and searches within them, in turn, for further URLs. These new URLs are added to the list of links the crawler must visit, and the process continues until a maximum number of iterations (the crawl depth) established by the programmer is reached.
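The seed-and-iterate process described above can be sketched as a breadth-first traversal. The sketch below is illustrative only: the `fetch` callable, the `fake_web` dictionary (used here in place of real HTTP requests), and the depth limit are all assumptions, not ECOPER's actual implementation.

```python
import re
from collections import deque

def crawl(seeds, fetch, max_depth=2):
    """Breadth-first crawl: start from the seed URLs, download each page,
    extract the links it contains, and queue any new URL until the
    maximum depth set by the programmer is reached."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    pages = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)          # download the page (may fail → None)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue
        # Extract href targets and queue each URL not yet visited.
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages

# A tiny in-memory "web" stands in for real HTTP requests in this sketch.
fake_web = {
    "https://a.example": '<a href="https://b.example">b</a>',
    "https://b.example": '<a href="https://a.example">a</a>',
}
pages = crawl(["https://a.example"], fake_web.get, max_depth=2)
```

A production crawler would add politeness rules (robots.txt, rate limiting) and more robust link parsing; the `seen` set is what keeps the crawler from revisiting pages it has already downloaded.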

In parallel, the process of web scraping extracts relevant information from the pages the crawler collects. As explained in a previous post, ECOPER has a general dictionary used to monitor how the 2030 Agenda has permeated general policy. Web scraping allows key terms from the Agenda to be searched for in the pages gathered by the crawler.
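A minimal sketch of this dictionary-based search might count occurrences of each key term in a page's text. The `AGENDA_TERMS` list below is a hypothetical excerpt, not ECOPER's actual dictionary, and the matching is deliberately simple (case-insensitive literal matching).

```python
import re

# Hypothetical excerpt of a term dictionary; the real one is far larger.
AGENDA_TERMS = ["2030 Agenda", "sustainable development", "SDG"]

def count_terms(text, terms):
    """Count case-insensitive occurrences of each key term in a page's text."""
    return {
        term: len(re.findall(re.escape(term), text, flags=re.IGNORECASE))
        for term in terms
    }

page_text = "The 2030 Agenda sets out SDG targets; sustainable development is central."
counts = count_terms(page_text, AGENDA_TERMS)
```

In practice the HTML would first be stripped to plain text, and matching might use stemming or word boundaries to avoid false positives (e.g. "SDG" inside an unrelated token).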