![]() If the web page doesn’t load completely, Octoparse may have problems in scraping data or executing the next step in the workflow. Otherwise, Octoparse will stop reloading the web page by the AJAX timeout that you set up, which may result in incomplete page loading. When you scrape from a web page that needs reloading after the update, please don’t set up AJAX timeout. Usually, we recommend you to select 2-4 seconds.ģ) Do not set up AJAX timeout when there’s no AJAX. To set up AJAX timeout, you can go to "AJAX Load" and select "Load the page with AJAX" in Customize Action.Īfter ticking the "Load the page with AJAX" box, you can set up AJAX timeout. If you find there's no reloading sign like after clicking on an element to update the page, then you can be sure that the element is using AJAX. This event is not currently available online. In this case, Octoparse has no need to wait for the reloading signal to act.Ģ) When and how to set up AJAX timeout in Octoparse?īecause websites usually apply AJAX technique on elements that require being clicked, such as "Load more" and "View reviews", AJAX timeout set up is very necessary for steps like "Click to paginate" or "Click item".įirst, we need to identify whether there’s AJAX or not. Sponsored by HML Recycling Supported by Hyndburn Borough Council. For example, if you set up 2 second AJAX timeout for "Click to paginate" action, Octoparse will wait for 2 seconds, and then execute the action. So when you want to scrape data from a web page using AJAX, you need to set up AJAX timeout to avoid Octoparse from being stuck. Q: Why it takes so long to extract the next page information when extracting websites with pagination The websites extracted may use AJAX and Octoparse may not. As a result, we may get zero, or much fewer extracted data than we expect. As there is no reloading, Octoparse doesn't receive the signal to act and would be stuck in the last step. For the web page using AJAX, it updates new contents without reloading. While scraping data from the web, Octoparse takes the reloading as the signal to execute the action, such as "Click item" and "Click to paginate". When you update a web page that applies AJAX by clicking, no reloading sign like will be displayed.ġ) Why do I need to deal with AJAX when using Octoparse? ![]() It is a set of web development techniques that allows a web page to update portions of contents without having to refresh the page. In this tutorial, you will learn how to deal with AJAX with Octoparse in data scraping.ĪJAX stands for Asynchronous JavaScript and XML. Almost all the websites do not provide users with the functionality to save a copy of the data displayed on the web. The latest version for this tutorial is available here.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |