How to: extract data from a website using Import.io

Sometimes data does not come wrapped up in a PDF or XLS file; instead, it has to be scraped from a web page and stored in a spreadsheet.

In this post, I will explain how to extract data from websites using Import.io. This tool lets the user download data as a spreadsheet, as raw JSON, or pull it directly into another system via its API.

As an example, I will use this URL, which lists contributions to UN Peacekeeping by country. The data is displayed in an interactive table and can be sorted by different categories:

[Screenshot: the interactive table of UN Peacekeeping contributions by country]

Step 1. Go to import.io and paste the URL

Find the page you want to scrape and paste its URL into the white box:

[Screenshot: the import.io home page with the URL box]

Step 2. Click the ‘Try it out’ button

Step 3. Download the data as a CSV file…

The browser will open a new window with all the readable data from the page. This data can be downloaded as a CSV file and manipulated in Excel:

[Screenshot: the extracted data with the CSV download option]
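
For readers who would rather work on the downloaded CSV in code than in Excel, here is a minimal Python sketch. It only assumes the file is a plain CSV with a header row. The column names and figures below are invented placeholders, not the real UN data, so swap in whatever headers appear in your own download.

```python
import csv
import io

# Invented placeholder data standing in for the downloaded file.
# The real column names depend on the table that import.io extracted.
SAMPLE_CSV = """country,troops
Country A,120
Country B,300
Country C,45
"""


def load_rows(csv_file):
    """Read an open CSV file object into a list of dicts keyed by header."""
    return list(csv.DictReader(csv_file))


def sort_by_column(rows, column, descending=True):
    """Sort rows numerically by one column, much like sorting the web table."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


rows = load_rows(io.StringIO(SAMPLE_CSV))
ranked = sort_by_column(rows, "troops")
for row in ranked:
    print(row["country"], row["troops"])
```

To run this on a real download, open the file with `open("your_file.csv", newline="", encoding="utf-8")` (the file name is hypothetical) and pass that file object to `load_rows` instead of the in-memory sample.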

Import.io can also save this extraction as an API, so the figures can be used again in the future.

Step 4. … or save the API

Click the ‘Save API’ button and the next screen will show some settings that can be adjusted. Data can be pulled from a single URL, from a Bulk Extract, or from URLs supplied by another API. For this example, we need the first option:

[Screenshot: the ‘Save API’ settings screen]

After running the query, the data can also be downloaded in JSON format:

[Screenshot: the query results in JSON format]
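
If you download the JSON, a few lines of Python can turn it back into rows for a spreadsheet. This is only a sketch: the `results` key and the field names are my assumptions for illustration, not the documented import.io format, and the figures are invented. Open your own file first to check how the rows are actually nested.

```python
import csv
import io
import json

# Invented payload. The "results" key and the field names are assumptions,
# not the documented import.io response format.
SAMPLE_JSON = """
{"results": [
  {"country": "Country A", "troops": 120},
  {"country": "Country B", "troops": 300}
]}
"""


def extract_rows(json_text, rows_key="results"):
    """Parse JSON text and return the list of row objects under rows_key."""
    payload = json.loads(json_text)
    rows = payload.get(rows_key, [])
    if not isinstance(rows, list):
        raise ValueError(f"expected a list under {rows_key!r}")
    return rows


def rows_to_csv(rows, fieldnames):
    """Write the chosen fields of each row as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_JSON)
csv_text = rows_to_csv(rows, ["country", "troops"])
print(csv_text)
```

The `extrasaction="ignore"` setting simply skips any fields you did not list, which is handy when the JSON carries more columns than you need.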

Step 5. Export the data to spreadsheets or create a graph

This information can be exported to Google Sheets (the website will ask you to link your Gmail account), used in a Data Set, turned into a Plot.ly graph or wired into an API integration:

[Screenshot: the export options]

Clicking on the API integration option displays code that reflects the parameters you have just set:

[Screenshot: the API integration code]
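
To give an idea of what using such an integration from your own script might look like, here is a small Python sketch that builds a query URL from a set of parameters. Everything specific in it is hypothetical: the endpoint, the parameter names `url` and `apikey`, and the key itself are placeholders, so copy the real values from the code that import.io shows you.

```python
from urllib.parse import urlencode

# Placeholder values: the endpoint and the parameter names are hypothetical.
# Copy the real ones from the integration code shown by import.io.
API_ENDPOINT = "https://api.example.com/extractor/query"


def build_query_url(endpoint, page_url, api_key):
    """Return the endpoint with the page URL and key as an encoded query string."""
    query = urlencode({"url": page_url, "apikey": api_key})
    return f"{endpoint}?{query}"


request_url = build_query_url(
    API_ENDPOINT,
    "https://www.example.org/contributions",  # page to extract (placeholder)
    "YOUR_API_KEY",  # placeholder, never publish a real key
)
print(request_url)
```

`urlencode` escapes the page address so it can travel safely inside another URL. Fetching `request_url` with `urllib.request.urlopen` would then be the step that actually retrieves the data.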

This raw data can also be quickly visualised on Plot.ly. A new window will open with the data and the option to adjust some settings (X axis, Y axis, type of chart…) before displaying the figures in a graph:

[Screenshot: the data charted in Plot.ly]

Import.io is an easy tool for gathering data from a website without coding or downloading any software.

Do you have more examples? Let me know in the comments.
