Download the entire site into one CSV with Scraping Camel

Lukáš Horák
14. 4. 2021
4 minutes read
Do you want to get data from web pages or online stores that is not contained in the XML feed? You can easily access this valuable information with the new Scraping Camel app. Use it to create PPC ads more efficiently or to improve your SEO. We’ll show you how.

Keep all the necessary information in one file

Scraping Camel is developed by Shopitak, which focuses on developing applications for the Mergado ecosystem. The app crawls the HTML pages of a website, extracts the information you choose, saves it, and generates a single output CSV file. This makes it well suited to detailed data analysis of products and categories.

What data can you get from the site? With the app, you can get any information contained in the HTML of the website, such as the title, meta description, H1 and H2 headings, or the Google Analytics or Google Tag Manager ID.
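As a rough illustration of what pulling such fields out of raw HTML can involve, here is a minimal Python sketch. It is not Scraping Camel's implementation: the sample page, the function names, and the ID patterns (G-…, UA-…, GTM-…) are assumptions made for the example.

```python
import re

# Hypothetical sample page; the IDs below are made up for illustration.
SAMPLE_HTML = """<html><head>
<title>Blue Running Shoes | Test Store</title>
<meta name="description" content="Lightweight running shoes for daily training.">
<script async src="https://www.googletagmanager.com/gtag/js?id=G-ABC123DEF4"></script>
<script>var containerId = 'GTM-XYZ789';</script>
</head><body><h1>Blue Running Shoes</h1><h2>Details</h2></body></html>"""


def first_match(pattern, html, flags=re.IGNORECASE | re.DOTALL):
    """Return the first captured group of `pattern`, or "" when nothing matches."""
    match = re.search(pattern, html, flags=flags)
    return match.group(1).strip() if match else ""


def extract_seo_fields(html):
    """Collect a few common SEO fields from raw HTML text (no JavaScript is run)."""
    return {
        "title": first_match(r"<title[^>]*>(.*?)</title>", html),
        "meta_description": first_match(
            r'<meta\s+name="description"\s+content="(.*?)"', html
        ),
        "h1": first_match(r"<h1[^>]*>(.*?)</h1>", html),
        "h2": first_match(r"<h2[^>]*>(.*?)</h2>", html),
        # ID patterns are assumptions; they are matched case-sensitively.
        "ga_id": first_match(r"\b(G-[A-Z0-9]{4,}|UA-\d+-\d+)\b", html, flags=0),
        "gtm_id": first_match(r"\b(GTM-[A-Z0-9]+)\b", html, flags=0),
    }


fields = extract_seo_fields(SAMPLE_HTML)
```

Each key of the returned dictionary could then become one column of a CSV row for that page.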

The application can also process websites that are not online stores, for example various catalogs (fashion, travel tickets, etc.) or web presentations. You can then edit the data in Mergado for PPC advertising on Google Ads and process it further with the usual online-store workflows. If the user’s shop system does not generate XML (or other) feeds, the app can obtain the necessary information so that you can work with it further in Mergado.

With Scraping Camel, you can apply the feed marketing workflows of online stores with an XML feed to websites without a cart. Data collection is automated and runs continuously. Outputs are available online for other applications or data connections.

How Scraping Camel works

  1. Define the domain that the app should crawl.
  2. Verify it. The process is similar to site verification at Google: you can choose between uploading a file to the website, adding META tags to the pages, or creating a DNS record. The goal is to prove that the website is yours and not a third party’s.
  3. Insert the sitemap.xml, which is required for the app to work. Scraping Camel takes the URLs of the website’s pages from it.
  4. Then set the frequency of web crawling. Too many requests can overload the website and slow down the processing of the whole site.
  5. Next, choose which elements you want to retrieve from the target HTML pages. The defaults are the title and meta description, or you can define your own elements (via a regular expression or by entering the text that comes before and after the information you are looking for).
  6. Set how the elements with the obtained information should be named in the output CSV.
  7. Finally, the app starts crawling the destination site. Once the whole site has been processed, the app generates the output CSV and shows its address in the administration.
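
The steps above can be sketched in a few lines of Python. This is only an illustration of the general idea (read the sitemap, fetch each page with a pause between requests, extract the chosen elements, write one CSV), not the app's actual code. The function names, the column name, and the one-second default delay are assumptions.

```python
import csv
import io
import re
import time
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the page URLs listed in the <loc> elements of a sitemap.xml."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def fetch_html(url):
    """Download one page as text; no JavaScript is executed."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract(html, patterns):
    """Apply one regular expression per output column; "" when nothing matches."""
    row = {}
    for column, pattern in patterns.items():
        match = re.search(pattern, html, flags=re.IGNORECASE | re.DOTALL)
        row[column] = match.group(1).strip() if match else ""
    return row


def crawl_to_csv(sitemap_xml, patterns, fetch=fetch_html, delay_seconds=1.0):
    """Crawl every sitemap URL with a pause between requests; return CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url", *patterns])
    writer.writeheader()
    for url in urls_from_sitemap(sitemap_xml):
        writer.writerow({"url": url, **extract(fetch(url), patterns)})
        time.sleep(delay_seconds)  # throttle so the target site is not overloaded
    return out.getvalue()
```

The keys of `patterns` play the role of the column names from step 6, for example `{"page_title": r"<title[^>]*>(.*?)</title>"}`, and `delay_seconds` corresponds loosely to the crawl frequency from step 4.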

How to set up Scraping Camel step by step? You will find a detailed method in this documentation.

How to use Scraping Camel?

Using a test store, we will show you how easy it is to get SEO data and a product description.

  • Open the source code of the page from which you want to define the elements, either with your browser’s keyboard shortcut for viewing the page source or by right-clicking the page and choosing to view its source code.
  • Press CTRL + F (search within the page) and enter the element you want to get. In this case, we are looking for the product description, i.e.: <h3> Detailed product description </h3>.

  • In “Values before” enter: <h3> Detailed product description </h3> and in “Values below” enter </div>.

  • The application is not primarily intended for viewing data; we recommend doing that in another program, such as Mergado or Google Sheets. Apply the same procedure to other elements that you want to get from the site.
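
The "text before / text after" approach from the list above can be illustrated with a short Python sketch. It shows character-based extraction in general, not the app's own code. The function name `extract_between` and the sample HTML are hypothetical.

```python
def extract_between(html, before, after):
    """Return the text between the first `before` and the next `after`.

    Works purely on characters: the HTML is never parsed into elements.
    Returns "" when either marker is missing.
    """
    start = html.find(before)
    if start == -1:
        return ""
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        return ""
    return html[start:end].strip()


# Hypothetical product page fragment.
page = ('<div class="desc"><h3> Detailed product description </h3>'
        '<p>Soft cotton T-shirt.</p></div>')

description = extract_between(
    page, "<h3> Detailed product description </h3>", "</div>"
)
```

Because the match is on raw characters, the "before" text has to appear in the page source exactly as entered, including spaces.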

Scraping Camel regularly and automatically checks the destination site. If it finds a new page, it processes it immediately and reflects any changes in the output CSV file.

The app is not just for online store operators. Marketers and SEO or PPC specialists can also use it to load product or service data from a site without a feed into the CSV file.

What are the differences between the application and other tools? Programs such as Screaming Frog or Xenu work on a one-time basis and run on a local device. Scraping Camel works the opposite way: it runs non-stop on a server. It provides outputs in machine-readable form, which you can process further. You can also use it for one-time analyses in which the data is automatically processed by other software.
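
To give an idea of what processing the machine-readable output in other software might look like, here is a small Python sketch that lists pages with an empty value in a chosen column. The column names `url` and `meta_description` are assumptions; the real names are whatever you set for the output CSV.

```python
import csv
import io


def pages_missing(csv_text, column):
    """Return the URLs of rows whose value in `column` is empty or absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"] for row in reader if not (row.get(column) or "").strip()]


# Hypothetical output CSV with assumed column names.
sample_csv = (
    "url,title,meta_description\n"
    "https://example.com/a,Page A,Description of page A\n"
    "https://example.com/b,Page B,\n"
)

missing = pages_missing(sample_csv, "meta_description")
```

In practice, the CSV text would come from the output address shown in the administration rather than from an inline string.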

Summary

Benefits of Scraping Camel:

  • continuous monitoring of changes
  • works on the server (non-stop)
  • the output can be loaded into Mergado as an input file for an export and processed in the usual way
  • unlimited number of sites per account

What you should know:

  • the app does not render JavaScript; it works only with the HTML
  • the principle of data extraction is based on characters, not on elements
  • the condition for using Scraping Camel is a functional sitemap file and a verified domain

Try the Scraping Camel features free for 30 days and gain the benefits of quality data.

Lukáš Horák

Lukáš takes care of most of the Czech and English communication in Mergado. Through blogs, e‑mail, and social networks, he regularly supplies readers with e‑commerce news as well as news and tips from Mergado. In his time off, he enjoys simple things like badminton, digging up the hidden gems of the ’80s, and seafood served with red wine.