Sometimes data is not wrapped up in a PDF or XLS file, but instead needs to be scraped from a URL and stored in a spreadsheet.
In this post, I will explain how to extract data from websites using Import.io. This service lets users download data as spreadsheets or raw JSON, or pull it directly into their own systems via its API.
As an example, I will use this URL about the UN Peacekeeping contribution by country. The data is displayed in an interactive table and can be sorted by different categories:
Step 1. Go to import.io and paste the URL
Go to the website and paste the URL you want to scrape into the white box:
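Under the hood, a tool like Import.io locates the tabular markup in the page. For readers curious about the scripted equivalent, here is a minimal sketch that pulls rows out of an HTML table with Python's standard library (the sample markup and figures are invented for illustration, not taken from the UN page):

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr>."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._in_cell = [], [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell, self._cell = True, []

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False
            self._row.append("".join(self._cell).strip())

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)

# Invented sample resembling the interactive contributions table.
sample = """
<table>
  <tr><th>Country</th><th>Troops</th></tr>
  <tr><td>Bangladesh</td><td>8432</td></tr>
  <tr><td>Ethiopia</td><td>7826</td></tr>
</table>
"""
parser = TableParser()
parser.feed(sample)
header, rows = parser.rows[0], parser.rows[1:]
```

The point-and-click tool does essentially this step for you, plus handling pagination and messier markup.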
Step 2. Click the ‘Try it out’ button
Step 3. Download the data as a CSV file…
The browser will open a new window with all the readable data from the website. This information can be downloaded as a CSV file and manipulated in Excel:
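Once the CSV is on disk, it can also be processed outside Excel. Below is a minimal sketch using Python's csv module; the column names and values are assumptions standing in for a real export, not the tool's actual output format:

```python
import csv
import io

# Stand-in for open("export.csv") -- the columns and figures here
# are invented for illustration.
exported = io.StringIO(
    "Country,Troops,Police\n"
    "Bangladesh,8432,1104\n"
    "Ethiopia,7826,32\n"
    "Rwanda,6146,954\n"
)

reader = csv.DictReader(exported)
rows = [
    {"Country": r["Country"], "Troops": int(r["Troops"]), "Police": int(r["Police"])}
    for r in reader
]

# Sort by troop contribution, largest first, like the interactive table.
by_troops = sorted(rows, key=lambda r: r["Troops"], reverse=True)
top_country = by_troops[0]["Country"]
```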
However, Import.io also has the option to save these figures as an API and reuse them in the future.
Step 4. … or save the API
Click the ‘Save API’ button and the next screen will display some settings to adjust. Data can be pulled from a single URL, a Bulk Extract, or URLs from another API. For this example, we need the first option:
After running the query, this data can also be downloaded in JSON format:
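That JSON can then be consumed programmatically. The sketch below parses a response of the shape I assume here (a list of row objects under a "results" key); check the actual payload before relying on this structure:

```python
import json

# Assumed response shape -- the real API's schema may differ.
payload = json.loads("""
{
  "results": [
    {"country": "Bangladesh", "troops": 8432},
    {"country": "Ethiopia", "troops": 7826}
  ]
}
""")

# Index the rows by country for easy lookup.
contributions = {row["country"]: row["troops"] for row in payload["results"]}
```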
Step 5. Export the data to spreadsheets or create a graph
This information can be exported to Google Sheets (the website will ask you to link your Gmail account), used as a Data Set, turned into a Plot.ly graph, or accessed via an API integration:
Clicking on ‘API integration’ displays code reflecting the parameters you just set:
This raw material can also be quickly visualised on Plot.ly. A new window opens with the data and options to adjust some features (X axis, Y axis, chart type…) before displaying the figures in a graph:
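If you only need a rough look at the proportions before committing to a Plot.ly chart, even a crude text bar chart will do. A sketch, with invented figures:

```python
# Invented sample values, not the actual UN data.
data = {"Bangladesh": 8432, "Ethiopia": 7826, "Rwanda": 6146}

scale = max(data.values()) / 40  # fit the longest bar into 40 characters
bars = {country: "#" * round(troops / scale) for country, troops in data.items()}

for country, bar in bars.items():
    print(f"{country:<12} {bar}")
```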
Import.io is an easy tool for gathering data from a website without coding or downloading any software.
Do you have more examples? Let me know in the comments.