Miles Wallio
kmw's thoughts and experiments

TL;DR: Automated Summaries

Miles Wallio
·Jun 27, 2022·

Table of contents

  • Building TL;DR
  • Examples
  • Conclusion

Like being in the know but not spending all that time reading to be in the know? We need to know, but we know we need more time. How can we save time to know when we need to read to know? Ya know?

TL;DR: Automated Summaries can help.

The past few years have shown us staying knowledgeable is critical. However, with so many data sources, there's too much to consume. Reading headlines is not enough.

Presenting TL;DR: Automated Summaries. TL;DR finds what's relevant in an article and surfaces it. It also has a simple API for requesting summaries.

Building TL;DR

I used Node.js with natural, sylvester, gramma, Express, EJS, Bootstrap, and LanguageTool. I built this for the Linode x Hashnode Hackathon, so it's running on Linode.

Natural is a natural language processing library for Node.js. Sylvester is a vector and matrix math library. Express and EJS provide the web framework and templating library. Gramma is an interface to LanguageTool, which provides word replacements and sentence checks.

Building Summaries

PageRank is a popular algorithm for surfacing relevant information. Using Natural's stemmers and TF-IDF, we convert a document into a transition matrix that we can run PageRank on.

PageRank visualized: the size of each face is proportional to the total size of the other faces pointing to it. Art drawn by FML.

Important sentences rise up in score. We can grab the top scoring sentences for our summary.
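Here is a minimal sketch of that idea, not the code TL;DR actually runs. To keep it self-contained, a plain regex tokenizer and hand-rolled helpers stand in for natural's stemmers and TF-IDF and for sylvester's matrices. Using cosine similarity between sentences, the 0.85 damping factor, and every function name here are my assumptions for illustration.

```javascript
// Sketch: rank sentences with TF-IDF similarity plus PageRank.
// Assumption: sentences are the graph nodes and cosine similarity is the edge weight.

function tokenize(sentence) {
  // Lowercased word tokens; a real pipeline would also stem them.
  return sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Build one TF-IDF vector (Map of term -> weight) per sentence.
function tfidfVectors(sentences) {
  const docs = sentences.map(tokenize);
  const df = new Map(); // number of sentences containing each term
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) || 0) + 1);
  }
  return docs.map((doc) => {
    const vec = new Map();
    for (const term of doc) vec.set(term, (vec.get(term) || 0) + 1);
    for (const [term, tf] of vec) {
      vec.set(term, tf * Math.log(1 + docs.length / df.get(term)));
    }
    return vec;
  });
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [term, w] of a) {
    na += w * w;
    if (b.has(term)) dot += w * b.get(term);
  }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Row-normalize similarities into a transition matrix, then power-iterate.
function pageRank(similarity, damping = 0.85, iterations = 50) {
  const n = similarity.length;
  const transition = similarity.map((row, i) => {
    const sum = row.reduce((s, v, j) => (j === i ? s : s + v), 0);
    // A sentence with no similar neighbors jumps anywhere with equal probability.
    return row.map((v, j) => (sum === 0 ? 1 / n : j === i ? 0 : v / sum));
  });
  let rank = new Array(n).fill(1 / n);
  for (let it = 0; it < iterations; it++) {
    rank = rank.map((_, j) =>
      (1 - damping) / n +
      damping * rank.reduce((s, r, i) => s + r * transition[i][j], 0)
    );
  }
  return rank;
}

// Return the `count` best-scoring sentences, in their original order.
function summarize(sentences, count = 3) {
  const vectors = tfidfVectors(sentences);
  const similarity = vectors.map((a) => vectors.map((b) => cosine(a, b)));
  const scores = pageRank(similarity);
  return scores
    .map((score, index) => ({ score, index }))
    .sort((x, y) => y.score - x.score)
    .slice(0, count)
    .sort((x, y) => x.index - y.index)
    .map(({ index }) => sentences[index]);
}
```

Calling `summarize(sentences, 3)` on an array of sentence strings returns the three highest-scoring sentences. Putting the winners back in document order is a design choice that tends to make the summary read more naturally.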

Before doing any of this, we need to sanitize our inputs. Articles are commonly written in HTML or Markdown, so TL;DR strips both. LanguageTool helps remove fluffy words. Sites like Hashnode mix blog entries with technical posts, and LanguageTool also helps remove technical sentences that don't belong in summaries.
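As a rough illustration of the stripping step, here is a regex-based sketch. It is an approximation under my own assumptions, not TL;DR's actual sanitizer: regexes are no substitute for a real HTML or Markdown parser, and `stripMarkup` is a hypothetical name. One known rough edge is that the emphasis rule also eats underscores inside identifiers like `snake_case_name`.

```javascript
// Sketch: reduce HTML or Markdown to plain text before ranking sentences.
function stripMarkup(input) {
  return input
    .replace(/```[\s\S]*?```/g, " ")                  // drop fenced code blocks
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ")  // drop script/style bodies
    .replace(/<[^>]+>/g, " ")                         // drop remaining HTML tags
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")         // images -> alt text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")          // links -> link text
    .replace(/`([^`]*)`/g, "$1")                      // inline code -> its text
    .replace(/^[ \t]{0,3}(#{1,6}|>|[-*+]|\d+\.)[ \t]+/gm, "") // heading, quote, list markers
    .replace(/(\*\*|__|\*|_)(.+?)\1/g, "$2")          // bold / italic -> inner text
    .replace(/\s+/g, " ")                             // collapse whitespace
    .replace(/\s+([.,;:!?])/g, "$1")                  // no space before punctuation
    .trim();
}
```

The order matters: code fences and tags go first so that their contents are not mistaken for Markdown emphasis or links.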

LanguageTool slows down processing, so the API lets you enable or disable it.

The Hardware

Linode was kind enough to provide $100 for Hackathon participants.

TL;DR runs on a Linode 8GB Shared CPU instance. This instance hosts both TL;DR and the LanguageTool server.

PM2 and Caddy run in front of TL;DR. PM2 handles monitoring, startup, and restarting TL;DR. Caddy acts as a proxy in front of PM2 and handles SSL certificates.

Going Forward

Ideally, we would have multiple LanguageTool instances running behind a NodeBalancer or in a Kubernetes cluster, and TL;DR could distribute the processing of a document across them. I'd recommend a similar setup for productizing TL;DR. If you change our Hashnode Example to enable LanguageTool, you'll see the rendering time increase. A distributed setup on Linode would allow for the best summaries in near real time.
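To make the distribution idea concrete, here is a sketch that spreads sentence checks across several LanguageTool servers round-robin from the client side. This is not something TL;DR does today. The instance URLs are hypothetical, the function names are mine, and it assumes Node 18+ for the built-in `fetch`. Behind a load balancer the list would shrink to a single URL and the balancer would do the spreading.

```javascript
// Sketch: fan sentence checks out to several LanguageTool servers.
const INSTANCES = [
  "http://lt-1.internal:8081", // hypothetical hosts
  "http://lt-2.internal:8081",
];

// Default checker: POST one sentence to LanguageTool's /v2/check endpoint.
async function checkWithLanguageTool(baseUrl, sentence) {
  const response = await fetch(`${baseUrl}/v2/check`, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ text: sentence, language: "en-US" }),
  });
  if (!response.ok) throw new Error(`LanguageTool returned ${response.status}`);
  return (await response.json()).matches;
}

// Send sentence i to instance i % instances.length and run all checks concurrently.
// checkSentence is injectable so the distribution logic can run without a server.
async function checkAll(
  sentences,
  instances = INSTANCES,
  checkSentence = checkWithLanguageTool
) {
  const jobs = sentences.map((sentence, i) =>
    checkSentence(instances[i % instances.length], sentence)
  );
  return Promise.all(jobs); // results keep the original sentence order
}
```

Because `Promise.all` preserves input order, the results line up with the sentences even though the checks finish at different times.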

Examples

Homepage Tool

TL;DR's Homepage provides an interactive form for playing with the tool.

Hashnode Summarized

Hashnode TL;DR'd

Browse Hashnode Summarized. These summaries use TL;DR without LanguageTool. Three sentences summarize each post.

With vs Without LanguageTool

Using LanguageTool for filters can improve our summaries. Let's look at "What is Test Driven Development" by Stephan E.G. Veenstra of fluttergamedev.com.

With LanguageTool

Test Driven Development, or TDD, is a software development method where one writes the tests before the implementation. This article is the first in the Introduction to TDD series in which we will build the logic for the game Tic TAC Toe using TDD. We will do this by writing the minimal implementation to make the test pass, even if it means to return a hard-coded value.

Without LanguageTool

Test Driven Development, or TDD, is a software development method where one writes the tests before the implementation. By not thinking ahead you will really be guided by TDD. Writing tests in Dart could not have been easier.

With LanguageTool, we have a little more context. Both summaries grab the definition of TDD, but without LanguageTool, the sentences are disconnected. With LanguageTool, we know the article covers building Tic Tac Toe and making tests pass.

The Hackathon Summarized

Summary of Hashnode x Linode Hackathon

The Build with Linode Hackathon summarized in 5 sentences becomes:

Build an interesting open-source project using Linode and win up to 1000 USD and cool swag. We're super excited to announce the Linode Hackathon — Build, deploy, and scale your application easily and cost-effectively in the cloud with Linode. Linode products, services, and people enable developers and businesses to build, deploy, and scale applications more easily and cost-effectively in the cloud. Build an exciting open-source app of your choice using Linode and its products during the whole of June. This time around, we've got 5 Grand Prizes and 10 Runner Up Prizes on the line.

Conclusion

And now you know, ya know?

Thanks to Linode and Hashnode for encouraging me to build something for the Linode x Hashnode Hackathon. The code is on GitHub.

Looking ahead, I'm hoping to use part-of-speech tagging and Link Grammar to shorten sentences.

 