
Inject: How machine learning can help journalists discover new angles on a story

In recent years, we’ve seen how machine learning is helping journalists with day-to-day tasks such as finding stories, doing photography work or fact-checking. Machine learning refers to algorithms that enable computers to learn patterns from data (Diakopoulos, 2019), and it can speed up the process of publishing.

However, we are starting to see initiatives that aim to help journalists with the ideation and creative steps. Could a tool that uses machine learning techniques generate new angles on existing or new stories?

Inject, a tool born from a collaboration between journalists and researchers across Europe, aims to help journalists by offering different points of view, concepts or angles on the topics they’re interested in, so that a story can be approached in a different way.

Inject is defined by its team in this paper (2018) as an idea generator, a suggestion machine and an inspiration tool rather than a search engine. Built on natural language processing, creativity search algorithms and interactive creative sparks, it aims to provide digital creative support inside the text editors journalists use most.

Inject’s idea generation support

Claus Hesseling, a German journalist who has conducted several hands-on workshops on creative strategies in journalism, is a member of the Inject team and shares the following about the creative side of the tool:

“Inject helps you to pop this filter bubble. When you search, you only do it for things you know, within your knowledge. Inject helps to bring new ideas.”

The discovery phase takes time – “time that journalists increasingly lack as news and media organisations are squeezed by reducing circulations, revenues and staff numbers” (Making the News: Digital Creativity Support for Journalists, 2018). Hesseling shares that the goal of Inject is to help with this lack of time.

Some people might see in Inject an approach similar to Google News. However, Hesseling highlights the main difference:

“Google tries to give you exactly what you want, so you go directly to it. With Inject you get randomness, or serendipity: something that fits you and that you didn’t know before.”

Inject’s user interaction layer: How does it work?

To support use by journalists, Inject was implemented as an add-on sidebar for the Google Docs text editor. However, it can also be used in WordPress, as a Google Chrome extension or in Adobe InCopy. There is also a simpler version as a web application, but it can be less helpful if you’re already used to writing your stories in a text editor. Ad-hoc customisations also allow newsrooms to plug Inject into their own archive and retrieve data from that source.

The first step is to decide where you’d like to use it, then install and open it; this will depend on the editor you’re using. For this post, I’ll use Google Docs, so I’d proceed as shown below:

[Screenshot: installing and opening the Inject add-on in Google Docs]

I can use INJECT in four different ways to get more creative content for my story:

  • Quantitative evidence that is relevant to my story, based on the selected keywords (Backing and Evidence)

[Screenshot: Backing and Evidence suggestions in the Inject sidebar]

  • People or entities that play a key role in my topic term (Individuals)

[Animation: Individuals suggestions in the Inject sidebar]

  • Causal: information about events associated with the background of the news story related to my topic term

[Animation: Causal background suggestions in the Inject sidebar]

  • Quirky: comical information (cartoons) associated with the topic

[Animation: Quirky cartoon suggestions in the Inject sidebar]

Features coming soon are:

  • Ramifications: information associated with future consequences of the story
  • Data visualisations: data sets and visualisations that are related to the story

Inject’s data layer: Where is the information coming from? 

For the first release, they added and tagged 1.6 million news stories that were discovered using RSS feeds from 150 sources.

The team curated these sources by hand to ensure quality, represent different political perspectives and “reduce the risk of echo chambers”. On top of that, they added a database of over 40,000 political cartoons. They are constantly adding new sources, and the next step would be to add a filter for the newly added ones.

Inject’s news extraction: machine learning algorithms

In order to collate and index these 150 news sources, the team followed the process below:

  1. Web crawling
  2. Manually curate sources
  3. Apply a Natural Language Processing (NLP) engine that performs entity extraction to detect people, events, places and organisations. In particular, Inject uses two Named Entity Recognition services: DBpedia Spotlight and Polyglot (a minimal sketch of this kind of extraction follows the list).
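
To make the entity-extraction step more concrete, here is a rough sketch that sends a snippet of article text to the public DBpedia Spotlight annotation service, one of the two NER services named above. The endpoint, parameters and response fields are assumptions based on Spotlight’s commonly documented REST interface, not a description of Inject’s internal setup.

```python
# A rough sketch (not Inject's actual code): call the public DBpedia Spotlight
# annotation service to pull named entities out of a piece of article text.
import requests

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

def extract_entities(text, confidence=0.5):
    """Return (surface form, DBpedia URI) pairs recognised in the text."""
    response = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()
    resources = response.json().get("Resources", [])
    return [(r.get("@surfaceForm"), r.get("@URI")) for r in resources]

if __name__ == "__main__":
    sample = "Angela Merkel met journalists in Berlin to discuss the European Union."
    for surface, uri in extract_entities(sample):
        print(surface, "->", uri)
```

Polyglot, the second service mentioned, plays a similar role but runs locally as a Python library rather than as a web service.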

Before explaining how these algorithms are applied, let’s define an algorithm as:

“A series of steps undertaken in order to solve a particular problem or to accomplish a defined outcome” (Diakopoulos, 2019)

Using the above definition, Inject’s particular problem was to carefully extract information from its 150 sources so that this data could be properly presented to journalists when they search with keywords (the defined outcome).

Once they had identified the problem, the team used different trained Natural Language Processing models to do the following:

  • Detect and extract people, location, organisation and event entities
  • Determine noun and verb phrases through advanced natural language parsing (a rough sketch of this follows the list)
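
Inject’s own parsing pipeline isn’t public, so the sketch below uses spaCy’s parser purely to illustrate what determining noun and verb phrases can look like in practice; the model name and phrase heuristics are my assumptions, not Inject’s.

```python
# Illustrative only: approximate noun and verb phrases with spaCy's parser
# (requires `pip install spacy` and `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")

def noun_and_verb_phrases(text):
    doc = nlp(text)
    # spaCy exposes noun phrases directly as "noun chunks".
    noun_phrases = [chunk.text for chunk in doc.noun_chunks]
    # Crude verb-phrase heuristic: each verb plus the heads of its right-hand dependents.
    verb_phrases = [
        " ".join([tok.text] + [child.text for child in tok.rights if not child.is_punct])
        for tok in doc if tok.pos_ == "VERB"
    ]
    return noun_phrases, verb_phrases

nouns, verbs = noun_and_verb_phrases(
    "The prime minister announced a new funding plan for local newsrooms.")
print(nouns)  # e.g. ['The prime minister', 'a new funding plan', 'local newsrooms']
print(verbs)  # e.g. ['announced plan']
```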

This pipeline pulled the information from the feeds every 30 minutes and stored it in a database as metadata, with the raw article text stored as strings and a URL link to the source.
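
As an illustration of that ingestion loop, here is a rough sketch that polls RSS feeds every 30 minutes with the feedparser library and stores each item’s title, raw text and source URL in SQLite. The feed list and table layout are placeholders, not Inject’s actual configuration.

```python
# Toy ingestion loop: poll RSS feeds every 30 minutes and store article metadata.
import sqlite3
import time

import feedparser  # pip install feedparser

FEEDS = ["https://example.com/news.rss"]  # placeholder; Inject crawls ~150 hand-curated sources

def ingest(db_path="articles.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT, fetched_at REAL)"
    )
    for feed_url in FEEDS:
        for entry in feedparser.parse(feed_url).entries:
            conn.execute(
                "INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?)",
                (entry.get("link"), entry.get("title"),
                 entry.get("summary", ""), time.time()),
            )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    while True:            # refresh the index every 30 minutes
        ingest()
        time.sleep(30 * 60)
```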

Once Inject’s tool was ready, they tested it with Norwegian and UK newsrooms. At run-time, the tool performs the following steps:

  1. Use context knowledge from other terms in the query (football, match, score rather than economics, crisis or dollars) and look for terms that have similar meaning (score, goal or touchdown).
  2. Return an unordered set of news articles or cartoons that achieved a threshold match score, i.e. the likelihood that a match is accurate (a toy illustration of both steps follows the list).
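
As a toy illustration of those two run-time steps, the sketch below expands a query with hand-written related terms and keeps only the articles whose overlap score clears a threshold. The synonym table and scoring rule are invented for this example; Inject’s actual creativity-search algorithms are more sophisticated.

```python
# Toy illustration: expand the query with related terms, then keep articles
# whose crude match score reaches a threshold.
RELATED = {  # hypothetical "context knowledge"
    "score": {"goal", "touchdown", "result"},
    "match": {"game", "fixture"},
}

def expand(query_terms):
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= RELATED.get(term, set())
    return expanded

def matching_articles(query_terms, articles, threshold=0.2):
    """Return the unordered set of article titles whose match score
    (overlap with the expanded query) reaches the threshold."""
    terms = expand(query_terms)
    results = set()
    for title, text in articles:
        words = set(text.lower().split())
        score = len(terms & words) / len(terms)  # crude match score
        if score >= threshold:
            results.add(title)
    return results

articles = [("Cup final report", "a late goal settled the game"),
            ("Markets update", "the dollar fell amid the crisis")]
print(matching_articles({"football", "match", "score"}, articles))
# -> {'Cup final report'}
```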

Challenge: Journalists out of their comfort zone

Inject makes archives smart and brings these sources to journalists so that they can use them in their news articles.

However, their biggest challenge is to apply this creative thinking process within newsrooms where time is limited and there is skepticism about Artificial Intelligence.

What Inject is trying to do in an algorithmic and digital way, as Hesseling shares, is to break habits and introduce new tools that allow journalists to get out of their comfort zone and come up with new angles.

Inject’s context: City, University of London

Started by professionals at City, University of London under its Digital Creativity Master’s, the tool got initial funding from Google’s Digital News Innovation (DNI) fund to build a prototype.

Two years ago, in January, the project was presented to the European Commission under the Horizon 2020 programme, which aims to foster innovation within the European Union. After receiving funding from this programme, nine partners from different European countries (universities in London and the Netherlands, and small newspapers in Norway, amongst other organisations) joined the project.

Currently, they are in the process of starting a non-profit company. The goal is to keep Inject sustainable so it can be developed further, charging fees to cover costs such as servers and third-party APIs.


If you want to know more about…  

As a plus, for those interested in knowing more about Named-Entity Recognition: why do we use machine learning for it, and where does the difficulty lie?

The ability to identify and extract a list of terms from a huge dataset is not easy. From a sentence written by a journalist, Inject needed to recognise that some words aren’t just ordinary words or one-off names, but rather parts of a named entity, carrying information that probably hasn’t been surfaced before.

Doing that manually requires a lot of work and understanding of the context. Since it’s a very complex task with an endless number of cases to handle, it seems a perfect case for applying machine learning.

There are several methods that can be used, from older ones such as Conditional Random Fields (CRF) to the current state of the art, which is deep learning. Neural network classifiers capture not only the syntactic structure of a sentence but also its semantics.
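
For the curious, here is a minimal sketch of the CRF approach using the sklearn-crfsuite library and a tiny hand-labelled toy corpus. The features and training data are invented for illustration; real NER systems train on large annotated corpora such as CoNLL-2003.

```python
# Minimal CRF-based NER sketch with sklearn-crfsuite (pip install sklearn-crfsuite).
import sklearn_crfsuite

def word2features(sentence, i):
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "prev.lower": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }

def sent2features(sentence):
    return [word2features(sentence, i) for i in range(len(sentence))]

# Toy hand-labelled corpus (illustration only).
train_sentences = [["Angela", "Merkel", "visited", "Berlin", "."],
                   ["Reuters", "reported", "from", "London", "."]]
train_labels = [["B-PER", "I-PER", "O", "B-LOC", "O"],
                ["B-ORG", "O", "O", "B-LOC", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit([sent2features(s) for s in train_sentences], train_labels)

test = ["Emmanuel", "Macron", "spoke", "in", "Paris", "."]
print(crf.predict([sent2features(test)]))
```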

You can read more about this in the paper “A Survey on Recent Advances in Named Entity Recognition from Deep Learning models” or in the blog post “Named Entity Recognition”.

 

Any questions or feedback? Let me know in the comments or at @mcrosasb