How to create a Hacker News API GraphQL data source for GatsbyJS

June 16, 2018
💫 Originally posted here. Broken? Let me know ~

In the previous post, I introduced a tech stack for SHaNc.

In this post, I will go into more detail on how to create a Hacker News GraphQL data source for GatsbyJS.

❓ Why?

Because GatsbyJS queries data only through its GraphQL data layer.
Refer to Querying with GraphQL.

🤔 Assumption

I will assume that you are familiar with JavaScript promises and async/await.

💭 Terminology & Concepts

Let’s make sure we are on the same page.

  1. GraphQL Source – This is the data that GatsbyJS can query via GraphQL.
  2. Node – According to the documentation, a node is a “model” of your data, i.e. the shape of how the data looks (not Node.js).
  3. gatsby-node.js – This file, located in the project root, is where you define your GraphQL sources.

Now that we’ve cleared up some terms and concepts, let’s review the Hacker News API.

🔍 Hacker News API Overview

The Official Hacker News API (“HN API” hereafter) exposes top level endpoints for “Top”, “Best”, and “New” stories.

Top level endpoints return only story IDs, with no other data associated with them.

So you’d need to make a call for each story ID returned from the top level endpoint.
It’s not an optimal design, and the HN team admits it.
But I am thankful that the HN team has provided a public API for their stories.

So with that in mind, let’s move on to creating a source.

🙏 Implementation Steps

Let’s see how to turn the Hacker News API into a GraphQL source by wrapping its data in nodes, following the steps below.

  1. Get all top level story IDs from HN API
  2. Create source nodes
  3. Make it available to GatsbyJS

💡 Get all top level story IDs from HN API

Let’s get all top level story IDs from HN API.

Top, New, and Best stories overlap, so let’s cache only the distinct story IDs.
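Deduplicating the IDs can be done with a Set. This is a minimal illustration rather than the post’s code; the helper name getDistinctStoryIds and the sample IDs are hypothetical.

```javascript
// Hypothetical helper (not from the original post): merge the Top, New, and
// Best ID arrays and keep each story ID only once.
function getDistinctStoryIds(...idLists) {
  // Array#flat merges the lists; a Set drops duplicates while keeping
  // first-seen order.
  return [...new Set(idLists.flat())];
}

const topIds = [3, 1, 2];
const newIds = [4, 1];
const bestIds = [2, 5];
console.log(getDistinctStoryIds(topIds, newIds, bestIds)); // [ 3, 1, 2, 4, 5 ]
```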

Getting each story is as simple as calling an endpoint with the story ID as part of the URL.
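Building the item URL and fetching one story might look like the sketch below. The base URL is the official HN API’s Firebase endpoint; HN_BASE_URL, itemUrl, getStory, and the injected fetchJson function are hypothetical names chosen for illustration.

```javascript
// Base URL of the official Hacker News API (hosted on Firebase).
const HN_BASE_URL = "https://hacker-news.firebaseio.com/v0";

// Build the URL for a single item: the story ID is part of the path.
const itemUrl = (storyId) => `${HN_BASE_URL}/item/${storyId}.json`;

// Hypothetical helper: `fetchJson(url)` is any function that resolves to parsed
// JSON, e.g. `(url) => fetch(url).then((res) => res.json())` on Node 18+.
async function getStory(storyId, fetchJson) {
  return fetchJson(itemUrl(storyId));
}

console.log(itemUrl(8863)); // https://hacker-news.firebaseio.com/v0/item/8863.json
```

Injecting fetchJson keeps the sketch independent of any particular HTTP library and makes it easy to try without network access.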

You are creating sources for “Top”, “New”, and “Best” stories, where “data” contains the arrays of story IDs fetched previously.
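Here is a sketch of fetching the three ID lists concurrently and describing them as sources. It assumes each source is an object with a type and a data field (the post’s exact shape may differ); getSources and the injected fetchJson are hypothetical.

```javascript
const HN_BASE_URL = "https://hacker-news.firebaseio.com/v0";

// Hypothetical sketch: `fetchJson(url)` resolves to parsed JSON (injected so
// the function can be tried without network access).
async function getSources(fetchJson) {
  // The three top level endpoints each return an array of story IDs.
  const [topIds, newIds, bestIds] = await Promise.all(
    ["topstories", "newstories", "beststories"].map((name) =>
      fetchJson(`${HN_BASE_URL}/${name}.json`)
    )
  );

  // "type" later becomes the GraphQL type name; "data" holds the story IDs.
  return [
    { type: "TopStories", data: topIds },
    { type: "NewStories", data: newIds },
    { type: "BestStories", data: bestIds },
  ];
}
```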

Now that we’ve fetched all the data, let’s create story nodes to expose it to GatsbyJS.

💡 Create source nodes

We retrieved top/new/BestResults in the previous step, and we now use them to create nodes as shown above.

Let’s take a look at the implementation of the aptly named createStoryNodes method.

The shape is defined by storyNode on lines 4 to 11. Let’s go over each property.

  1. id
    • This is created by combining the type with the story ID, where the types are “TopStories”, “BestStories”, and “NewStories”.
    • This makes each record distinct so that you can get this record and only this record if you need to.
    • This is similar to a primary key if you are familiar with database terms.
    • You can’t just use a story ID as the node ID, because Top, Best, and New stories can contain the same story; adding the “type” makes each record globally distinct.
  2. parent & children
    1. I honestly don’t know the exact use cases for these yet 😅, as I couldn’t find any good documentation for them.
    2. The best I found was this documentation, but without a concrete example, I had to look at other source plugins like gatsby-source-firebase.
    3. Shameless begging: I’d appreciate it if you could help me understand the whys, wheres, and hows of these parameters.
  3. internal
    1. This is where you set the name of the GraphQL type.
    2. For the three createStoryNodes method calls, I passed “TopStories” for the first call, so it’s available as “topStories” in GraphQL.
  4. storyId – This is self-explanatory, skip!
  5. item – This contains the actual story data, but what’s that items.get(storyId)?
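Putting the properties above together, createStoryNodes could be sketched as follows. The signature, the `${type}-${storyId}` ID format, and the inline digest helper are assumptions rather than the post’s exact code; `parent: null` with an empty `children` array mirrors a common pattern in Gatsby source plugins.

```javascript
const crypto = require("crypto");

// MD5 hex digest of the serialized content, used as internal.contentDigest.
const buildContentDigest = (content) =>
  crypto.createHash("md5").update(JSON.stringify(content)).digest("hex");

// Hypothetical signature: `items` is a Map of story ID -> story data and
// `createNode` is the node-creation action Gatsby passes to sourceNodes.
function createStoryNodes(type, storyIds, items, createNode) {
  storyIds.forEach((storyId) => {
    const item = items.get(storyId);
    const storyNode = {
      // Assumed ID format: type + story ID, so the same story listed under
      // both Top and Best still yields two distinct nodes.
      id: `${type}-${storyId}`,
      parent: null,
      children: [],
      internal: {
        type, // GraphQL type name, e.g. "TopStories"
        contentDigest: buildContentDigest(item),
      },
      storyId,
      item,
    };
    createNode(storyNode);
  });
}
```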

Remember that we defined the getStories function but never called it?
items is a map of all stories fetched using getStories, as shown below.

The code above fetches stories and caches them in a map, from which we can construct the story nodes.
A Map object (not Array#map) is used for constant-time (O(1) on average) lookups by story ID, which makes data retrieval efficient.
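getStories might be sketched like this, assuming it receives the distinct story IDs from the first step and an injected fetchJson function. Both the signature and fetchJson are assumptions, not the post’s code.

```javascript
const HN_BASE_URL = "https://hacker-news.firebaseio.com/v0";

// Hypothetical signature: fetch each distinct story once and cache it in a Map
// keyed by story ID. `fetchJson(url)` resolves to parsed JSON (injected).
async function getStories(storyIds, fetchJson) {
  // Fire all item requests concurrently and wait for every one to finish.
  const stories = await Promise.all(
    storyIds.map((id) => fetchJson(`${HN_BASE_URL}/item/${id}.json`))
  );
  // Promise.all keeps input order, so stories[i] belongs to storyIds[i].
  return new Map(storyIds.map((id, i) => [id, stories[i]]));
}
```

Promise.all sends every request at once; with several hundred IDs you may prefer to fetch in smaller batches to be gentle on the API.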

The content digest (scroll down to “Parameters”) helps GatsbyJS track whether data has changed, so it can avoid doing extra work on data that hasn’t.
The implementation of buildContentDigest is shown below.

It serializes the story and hashes it into a hex representation using the MD5 hashing algorithm.
Honestly, again, I used the implementation from the documentation, as I don’t know much about GatsbyJS’s internal details.
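A minimal sketch of buildContentDigest using Node’s built-in crypto module, assuming the story is serialized with JSON.stringify before hashing.

```javascript
const crypto = require("crypto");

// Serialize the story with JSON.stringify, hash it with MD5, and return the
// digest as a 32-character hex string.
function buildContentDigest(story) {
  return crypto
    .createHash("md5")
    .update(JSON.stringify(story))
    .digest("hex");
}

const a = buildContentDigest({ id: 1, title: "Hello" });
const b = buildContentDigest({ id: 1, title: "Hello" });
console.log(a === b); // true: same content, same digest
```

Because JSON.stringify follows property order, the same data with keys in a different order produces a different digest; for HN items fetched the same way each time this is rarely an issue.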

💡 Make it available to GatsbyJS

Now export the stories source for GatsbyJS at the bottom of the gatsby-node.js file.

📞 How to call (use the source)

GatsbyJS automatically converts the graphql`...` tagged template behind the scenes, so all you have to do is query the data source you created (full source).

GatsbyJS passes the page component a prop named data, which contains the actual data fetched using GraphQL.
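On the page side, reading the data prop might look like this. The sketch assumes the query asks for the allTopStories list field, which Gatsby typically generates from the “TopStories” type alongside the single-node topStories field. renderTopStories is a hypothetical stand-in for a page component, so it runs without Gatsby or React.

```javascript
// Assumed page query (in a real page file it would be a graphql`...` tagged
// template exported from the component module):
//
//   {
//     allTopStories {
//       edges { node { storyId item { title url } } }
//     }
//   }

// Stand-in for a page component: Gatsby passes the query result as the
// `data` prop, so the component only has to read from props.data.
function renderTopStories({ data }) {
  return data.allTopStories.edges.map(
    ({ node }) => `${node.storyId}: ${node.item.title}`
  );
}

// Usage with a hand-written result shaped like Gatsby's response.
const props = {
  data: {
    allTopStories: {
      edges: [{ node: { storyId: 1, item: { title: "Hello HN", url: "https://example.com" } } }],
    },
  },
};
console.log(renderTopStories(props)); // [ '1: Hello HN' ]
```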

Here is the full source code of gatsby-node.js.

👋 Parting Words

The code might not be optimal at fetching data, but the static site generator fetches and caches the data before generating the site, so it shouldn’t affect the final site’s performance.

But I’d love to hear any suggestions you have on how to improve it 🙂

You can create an issue on GitHub or send me a tweet.
Full source for gatsby-node.js can be found here.