Octoparse can easily deal with pages that use AJAX. In this article, I will show you how to handle AJAX in Octoparse.

What Is AJAX?

AJAX stands for "Asynchronous JavaScript and XML". It allows a web page to request and receive data after the page has loaded, and to update information without reloading the entire page. When AJAX is used, only part of the page gets updated when you click buttons such as "next page" or "show more".

How do I know if a web page loads content using AJAX?

When you have a click action that loads web data, it is rather straightforward to tell whether AJAX is being used. With AJAX, the web page loads the additional content without reloading the page, so the browser's reloading icon is a good indicator. When AJAX is involved, the page does not reload when additional content gets loaded, so there should be NO reloading sign. When no AJAX is involved, you should see the page reload, with the reloading icon running, when you click to load more information.

Octoparse uses reloading as a signal when executing a click action. If the page reloads after an element is clicked, Octoparse executes the next action once the reload finishes. But pages with AJAX do not reload, so Octoparse never receives the signal to act and would get stuck. We therefore need to set an AJAX timeout on the "Click Item" or "Click to Paginate" action to tell Octoparse to move on to the next action when the timeout is reached.

There are two ways AJAX can be taken care of in Octoparse.

1. Octoparse sets up the AJAX timeout automatically when AJAX is detected for the page. For example, Walmart's website uses AJAX to load the next page, so when we choose to click the next page button, Octoparse automatically sets up an AJAX timeout for the action. If you need a longer or shorter timeout, simply click on the dropdown menu and choose the one you'd like.
2. When a task is built manually, or if Octoparse fails to detect AJAX, you can set the timeout up manually by clicking the "Click Item" action or the "Click to Paginate" action.

Some websites require users to log in with an account and password before they show content, which means our target data is behind authentication. Fortunately, it is still possible to access the data with Octoparse. We can either tell Octoparse to input the login information (username and password) for us, or log in ourselves in browse mode and use cookies to optimize the workflow.

Let's say we need to scrape data from Facebook.

1. Create a new task in Octoparse with a Facebook-related URL.
2. Click on the username textbox (Email or phone number).
3. Select "Enter text" from the Tips panel.
4. Enter your username into the textbox and click "Confirm". The username you entered is automatically populated into the username textbox on the web page.
5. Repeat the above steps for the "Password" textbox.
6. Click the "Log In" button and select "Click button" in the Tips panel.

Most of the time, you can optimize the workflow by saving the cookies in the task after login. This way, Octoparse sends the saved cookies to the website during loading, and there's a good chance the website will remember "you" and skip the login steps.

1. Toggle on browse mode at the top right.
2. Log in to the website just as you would in a regular browser.
3. After login, go to the "Options" settings of the "Go to web page" action, tick "Use Cookie" and click "Use cookie from the current page".

Now the web page is supposed to "remember" the login and skip the login steps the next time the crawler runs. Problem solved! Now you can move on to build your task workflow.

1. A saved cookie is only effective until it expires
Cookies come in many different forms. Some have a specific expiration time; others expire as soon as the browser is closed. In Octoparse, a saved cookie no longer works once it has expired. To resolve this, go through the login steps again in browse mode to obtain and save the updated cookie.

2. In Octoparse, a password you enter is accessible only from your own account. Any saved login information is removed from your account permanently as soon as the task is deleted.

3. When a task is exported, the password saved in the task is removed automatically.

Entering captcha manually while running local extraction
If you encounter a captcha, you can enter it manually when running the task locally. Cloud Extraction doesn't support dealing with captchas.
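To make the cookie approach described above more concrete, here is a minimal Python sketch of the general idea: keep the cookies captured after a manual login, reuse them on later runs, and fall back to a fresh login once they have expired. This is not Octoparse's implementation or API. The `SavedCookie` class, the function names, the JSON file and the cookie names are all hypothetical, and the sketch assumes that session cookies cannot be reused once the browser has closed.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional


@dataclass
class SavedCookie:
    """Hypothetical record of one cookie captured after a manual login."""
    name: str
    value: str
    expires_at: Optional[float]  # Unix timestamp; None marks a session cookie


def save_cookies(cookies: Iterable[SavedCookie], path: str) -> None:
    """Write the cookies captured after login to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([asdict(cookie) for cookie in cookies], handle)


def load_usable_cookies(path: str, now: Optional[float] = None) -> List[SavedCookie]:
    """Return only the saved cookies that should still work in a new run."""
    current = time.time() if now is None else now
    with open(path, encoding="utf-8") as handle:
        saved = [SavedCookie(**item) for item in json.load(handle)]
    # Assumption: session cookies ended with the browser, so they are skipped.
    return [c for c in saved if c.expires_at is not None and c.expires_at > current]


def needs_fresh_login(path: str, required: Iterable[str],
                      now: Optional[float] = None) -> bool:
    """True when any cookie the site relies on is missing or expired."""
    usable = {cookie.name for cookie in load_usable_cookies(path, now)}
    return not set(required) <= usable
```

A crawler built along these lines would call `needs_fresh_login` before each run and repeat the manual login only when it returns `True`, which mirrors the advice to log in again in browse mode once a saved cookie has expired.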