{ sung.codes }

by dance2die
How to create a Hacker News API GraphQL data source for GatsbyJS

In the previous post, I introduced a tech stack for SHaNc.

In this post, I will go into more detail on how to create a Hacker News GraphQL data source for GatsbyJS.

❓ Why?

Because GatsbyJS can query data only via GraphQL. Refer to Querying with GraphQL.

🤔 Assumption

I will assume that you are familiar with JavaScript promises and async/await.

💭 Terminologies & Concepts

Let's make sure we are on the same page.

  1. GraphQL Source - This is the data that GatsbyJS can query via GraphQL.
  2. Node - The documentation calls a node a "model": it describes the shape of the data (not to be confused with Node.js).
  3. gatsby-node.js - This is where you define your GraphQL sources. It is located in the project root.

Now that we've cleared up some terms and concepts, let's review the Hacker News API.

Hacker News API Overview

The Official Hacker News API ("HN API" hereafter) exposes top level endpoints for "Top", "Best", and "New" stories.

Top level endpoints return only IDs, with no other data associated with them.

Calling "https://hacker-news.firebaseio.com/v0/topstories.json" returns an array of story IDs:

[ 9127232, 9128437, 9130049, 9130144, 9130064, 9130028, 9129409, 9127243, 9128571, ..., 9120990 ]

So you'd need to make a call for each story ID returned from the top level endpoint. It's not an optimal design, and the HN team admits it. But I am thankful that the HN team provides a public API for their stories.
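
To make the cost concrete, here is a minimal sketch of that fan-out: one item URL per story ID. The IDs come from the sample response above; `itemUrlsFor` is a hypothetical helper name used only for illustration.

```javascript
// Builds the item endpoint URL for a single story ID.
const getItemURL = storyId =>
  `https://hacker-news.firebaseio.com/v0/item/${storyId}.json`

// Hypothetical helper: one URL (and so one request) per story ID.
const itemUrlsFor = storyIds => storyIds.map(getItemURL)

// Three IDs from the sample response mean three more requests.
const urls = itemUrlsFor([9127232, 9128437, 9130049])
console.log(urls.length) // 3
console.log(urls[0])
```

With hundreds of IDs per list, the number of follow-up requests grows accordingly.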

So with that in mind, let's move on to creating a source.

Implementation Steps

Now let's see how to turn the Hacker News API into a GraphQL source by wrapping it as a node, following the steps below.

  1. Get all top level story IDs from HN API
  2. Create source nodes
  3. Make it available to GatsbyJS

💡 Get all top level story IDs from HN API

Let's get all top level story IDs from HN API.

const topStoriesURL = `https://hacker-news.firebaseio.com/v0/topstories.json`
const newStoriesURL = `https://hacker-news.firebaseio.com/v0/newstories.json`
const bestStoriesURL = `https://hacker-news.firebaseio.com/v0/beststories.json`
const getItemURL = storyId =>
  `https://hacker-news.firebaseio.com/v0/item/${storyId}.json`
const topResults = await axios.get(topStoriesURL)
const newResults = await axios.get(newStoriesURL)
const bestResults = await axios.get(bestStoriesURL)
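
The three list requests do not depend on each other, so one possible variation is to issue them concurrently. The sketch below assumes a `get` function with the same response shape as `axios.get` (an object with a `data` property); `fetchStoryLists` and `fakeGet` are hypothetical names, and the IDs are made up so the snippet runs offline.

```javascript
const base = 'https://hacker-news.firebaseio.com/v0'
const listURLs = [
  `${base}/topstories.json`,
  `${base}/newstories.json`,
  `${base}/beststories.json`,
]

// `get` is any function that takes a URL and resolves to { data }.
const fetchStoryLists = async get => {
  // Promise.all preserves input order, so destructuring is safe.
  const [topResults, newResults, bestResults] = await Promise.all(
    listURLs.map(url => get(url))
  )
  return { topResults, newResults, bestResults }
}

// Offline stand-in for axios.get, returning made-up IDs.
const fakeGet = async url => ({
  data: url.includes('topstories') ? [1, 2] : [2, 3],
})

const listsPromise = fetchStoryLists(fakeGet)
listsPromise.then(({ topResults }) => console.log(topResults.data))
```

In gatsby-node.js you would pass `axios.get` instead of `fakeGet`.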

The Top, New, and Best lists contain duplicate stories, so let's keep only distinct story IDs.

// Combine all story IDs to get all items in one go for "items" map
// We need only distinct SET of IDs.
const storyIds = [
  ...new Set([...topResults.data, ...newResults.data, ...bestResults.data]),
]
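
Here is a small runnable illustration of the same Set trick, using made-up IDs in place of the real API responses.

```javascript
// Made-up IDs; story 2 appears in all three lists.
const topIds = [1, 2, 3]
const newIds = [2, 4]
const bestIds = [3, 2, 5]

// A Set keeps the first occurrence of each value, in insertion order.
const distinctIds = [...new Set([...topIds, ...newIds, ...bestIds])]

console.log(distinctIds) // [ 1, 2, 3, 4, 5 ]
```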

Getting all stories is as simple as calling an endpoint with story ID as part of the URL.

const getItemURL = storyId =>
  `https://hacker-news.firebaseio.com/v0/item/${storyId}.json`
const getStories = async storyIds => {
  const stories = storyIds.map(storyId => axios.get(getItemURL(storyId)))
  return Promise.all(stories)
}
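
getStories starts every request at once. If that turns out to be too aggressive, one alternative (not what this post uses) is to fetch in fixed-size batches. This is only a sketch: `getStoriesInBatches`, `fetchItem`, and the batch size are hypothetical, and the fake fetcher exists so the snippet runs without network access.

```javascript
// `fetchItem` is any function storyId => promise of a response,
// e.g. storyId => axios.get(getItemURL(storyId)).
const getStoriesInBatches = async (storyIds, fetchItem, batchSize = 50) => {
  const responses = []
  for (let i = 0; i < storyIds.length; i += batchSize) {
    const batch = storyIds.slice(i, i + batchSize)
    // Requests within a batch run concurrently; batches run one after another.
    responses.push(...(await Promise.all(batch.map(fetchItem))))
  }
  return responses
}

// Offline stand-in that records how many requests are in flight at once.
let inFlight = 0
let maxInFlight = 0
const fakeFetchItem = async storyId => {
  inFlight += 1
  maxInFlight = Math.max(maxInFlight, inFlight)
  await new Promise(resolve => setTimeout(resolve, 1))
  inFlight -= 1
  return { data: { id: storyId } }
}

const batchedPromise = getStoriesInBatches([1, 2, 3, 4, 5], fakeFetchItem, 2)
batchedPromise.then(responses => console.log(responses.length, maxInFlight))
```

The responses come back in the same order as the IDs, so the rest of the pipeline does not need to change.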

You are creating sources for "Top", "New", and "Best" stories, where each result's "data" contains the array of story IDs fetched previously.

We've now fetched all the data, so let's create story nodes to expose it to GatsbyJS.

💡 Create source nodes

We retrieved topResults, newResults, and bestResults in the previous step, and we now use them to create nodes as shown below.

createStoryNodes(topResults.data, 'TopStories')
createStoryNodes(newResults.data, 'NewStories')
createStoryNodes(bestResults.data, 'BestStories')

Let's take a look at the implementation of the aptly named createStoryNodes function.

const createStoryNodes = (data, type) =>
  data.map(storyId => {
    const id = `${type}-${storyId}`
    const storyNode = {
      id,
      parent: null,
      children: [],
      internal: { type },
      storyId: storyId,
      item: items.get(storyId),
    }
    storyNode.internal.contentDigest = buildContentDigest(storyNode)
    createNode(storyNode)
  })

The shape is defined by storyNode on lines 4 to 11 of the snippet above. Let's go over each property.

  1. id
    • This combines the type with the story ID, where the types are "TopStories", "BestStories", and "NewStories".
    • It makes each record distinct, so you can fetch this record and only this record when you need to.
    • It is similar to a primary key, if you are familiar with database terms.
    • You can't use a story ID alone as the ID, because Top, Best, and New stories can contain the same story. Prefixing the "type" makes each record globally distinct.
  2. parent & children
    1. I honestly do not know 😅 the exact use cases for these yet, as I could not find good documentation for them.
    2. The best I found was this documentation, but without a concrete example I had to look at other source plugins like gatsby-source-firebase.
    3. A shameless plea - I'd appreciate it if you could help me understand the why, where, and how of these parameters.
  3. internal
    1. Its type field sets the name of the GraphQL type.
    2. [Image: graphql - topstories.jpg]
    3. Of the three createStoryNodes calls, the first passes "TopStories", so it's available as "topStories" (and as "allTopStories", which the page query uses) in GraphQL.
  4. storyId - This is self-explanatory, skip!
  5. item - This contains the actual story data, but what's that items.get(storyId)?

Remember that we defined the getStories function but never called it? items is a map of all stories fetched using getStories, as shown below.

// Build item details map
// for an O(1) look up for fetched item details
const items = (await getStories(storyIds))
  .map(res => res.data)
  .filter(item => item !== null)
  .reduce((acc, item) => acc.set(item.id, item), new Map())

The code above fetches the stories and caches them in a map, from which we can construct the story nodes. A Map object (not Array#map) is used for constant-time (O(1)) lookups.
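
To see the map in action without hitting the network, here is a minimal sketch with made-up responses. The null entry is an assumption based on the filter in the code above, which suggests some item requests come back empty.

```javascript
// Made-up responses in the shape axios returns ({ data }).
const responses = [
  { data: { id: 1, title: 'First story' } },
  { data: null }, // assumed: an item request that returned nothing
  { data: { id: 3, title: 'Third story' } },
]

// Same pipeline as above: unwrap, drop nulls, index by item ID.
const items = responses
  .map(res => res.data)
  .filter(item => item !== null)
  .reduce((acc, item) => acc.set(item.id, item), new Map())

// The same story in two lists yields two distinct node IDs
// that share one cached item.
const nodeIds = ['TopStories', 'BestStories'].map(type => `${type}-1`)

console.log(items.size) // 2
console.log(items.get(3).title) // Third story
console.log(nodeIds) // [ 'TopStories-1', 'BestStories-1' ]
```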

Content Digest (scroll down to "Parameters") helps GatsbyJS track whether data has changed, which lets it be more efficient. The implementation of buildContentDigest is shown below.

const buildContentDigest = content =>
  crypto
    .createHash(`md5`)
    .update(JSON.stringify(content))
    .digest(`hex`)

It serializes the node to JSON and hashes it with the MD5 algorithm, producing a hex string. Honestly again, I used the implementation from the documentation, as I don't know much about GatsbyJS's internal details.
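
A quick way to see why a digest is useful for change tracking: identical content gives an identical digest, and any change in the content gives a different one. The story data below is made up for illustration.

```javascript
const crypto = require('crypto')

const buildContentDigest = content =>
  crypto
    .createHash('md5')
    .update(JSON.stringify(content))
    .digest('hex')

// Made-up story data.
const before = buildContentDigest({ storyId: 1, item: { score: 10 } })
const same = buildContentDigest({ storyId: 1, item: { score: 10 } })
const after = buildContentDigest({ storyId: 1, item: { score: 11 } })

console.log(before === same) // true
console.log(before === after) // false
console.log(before.length) // 32
```

Note that JSON.stringify is sensitive to property order, so two objects with the same properties in a different order would produce different digests.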

💡 Make it available to GatsbyJS

Now export the stories source for GatsbyJS at the bottom of the gatsby-node.js file.

exports.sourceNodes = async ({ boundActionCreators }) => {
  await createStoriesSource(boundActionCreators)
}
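
Gatsby v2 and later pass the same functions under the name `actions` instead of `boundActionCreators`. The sketch below accepts either name. createStoriesSource is stubbed with a made-up node so the snippet runs on its own; in gatsby-node.js you would use the real function and assign the result to exports.sourceNodes.

```javascript
// Stub standing in for the real createStoriesSource from gatsby-node.js.
const createStoriesSource = async ({ createNode }) => {
  createNode({ id: 'TopStories-1' }) // made-up node for illustration
}

// Accept `actions` (Gatsby v2+) or `boundActionCreators` (Gatsby v1).
// In gatsby-node.js: exports.sourceNodes = sourceNodes
const sourceNodes = async ({ actions, boundActionCreators }) => {
  await createStoriesSource(actions || boundActionCreators)
}

// Offline check with a fake createNode that records what it receives.
const created = []
const sourcePromise = sourceNodes({
  actions: { createNode: node => created.push(node) },
})
sourcePromise.then(() => console.log(created.length))
```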

📞 How to call (use the source)

GatsbyJS automatically extracts and runs the graphql`...` query behind the scenes, so all you have to do is query the data source you created (full source).

// imports and stuff ...
...
// "data" is passed by Gatsby with result of "StoriesQuery"
const IndexPage = ({ data }) => (
  <Stories stories={data.allTopStories.edges} title="Top Stories" />
)
export default IndexPage
export const query = graphql`
  query StoriesQuery {
    allTopStories {
      edges {
        node {
          id
          storyId
          item {
            id
            title
            score
            by
            time
            type
            url
          }
        }
      }
    }
  }
`

GatsbyJS passes the page component a data prop, which contains the actual data fetched using GraphQL.
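
For orientation, here is a sketch of what that data object might look like for StoriesQuery and how to unwrap it. The values are made up, and only a few of the queried fields are shown.

```javascript
// Made-up result in the shape of the StoriesQuery above.
const data = {
  allTopStories: {
    edges: [
      {
        node: {
          id: 'TopStories-1',
          storyId: 1,
          item: { id: 1, title: 'First story', score: 10 },
        },
      },
      {
        node: {
          id: 'TopStories-3',
          storyId: 3,
          item: { id: 3, title: 'Third story', score: 7 },
        },
      },
    ],
  },
}

// Each edge wraps a node; the story itself lives under node.item.
const titles = data.allTopStories.edges.map(({ node }) => node.item.title)

console.log(titles) // [ 'First story', 'Third story' ]
```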

Here is the full source code of gatsby-node.js.

/**
 * Implement Gatsby's Node APIs in this file.
 *
 * See: https://www.gatsbyjs.org/docs/node-apis/
 */
const axios = require('axios')
const crypto = require('crypto')

const buildContentDigest = content =>
  crypto
    .createHash(`md5`)
    .update(JSON.stringify(content))
    .digest(`hex`)

const createStoriesSource = async ({ createNode }) => {
  const topStoriesURL = `https://hacker-news.firebaseio.com/v0/topstories.json`
  const newStoriesURL = `https://hacker-news.firebaseio.com/v0/newstories.json`
  const bestStoriesURL = `https://hacker-news.firebaseio.com/v0/beststories.json`
  const getItemURL = storyId =>
    `https://hacker-news.firebaseio.com/v0/item/${storyId}.json`

  const topResults = await axios.get(topStoriesURL)
  const newResults = await axios.get(newStoriesURL)
  const bestResults = await axios.get(bestStoriesURL)

  // Combine all story IDs to get all items in one go for "items" map
  // We need only distinct SET of IDs.
  const storyIds = [
    ...new Set([...topResults.data, ...newResults.data, ...bestResults.data]),
  ]

  const getStories = async storyIds => {
    const stories = storyIds.map(storyId => axios.get(getItemURL(storyId)))
    return Promise.all(stories)
  }

  // Build item details map
  // for an O(1) look up for fetched item details
  const items = (await getStories(storyIds))
    .map(res => res.data)
    .filter(item => item !== null)
    .reduce((acc, item) => acc.set(item.id, item), new Map())

  // Expose a hacker new story available for GraphQL query
  const createStoryNodes = (data, type) =>
    data.map(storyId => {
      const id = `${type}-${storyId}`
      const storyNode = {
        id,
        parent: null,
        internal: { type },
        children: [],
        storyId: storyId,
        item: items.get(storyId),
      }
      storyNode.internal.contentDigest = buildContentDigest(storyNode)
      createNode(storyNode)
    })

  createStoryNodes(topResults.data, 'TopStories')
  createStoryNodes(newResults.data, 'NewStories')
  createStoryNodes(bestResults.data, 'BestStories')
}

const createBuildMetadataSource = ({ createNode }) => {
  const buildMetadataNode = {
    // There is only one record
    id: `I am the build metadata source id`,
    parent: null,
    internal: { type: `BuildMetadata` },
    children: [],
    // Unix time format to be consistent with HackerNews API date format
    buildDate: new Date().getTime() / 1000,
  }
  buildMetadataNode.internal.contentDigest = buildContentDigest(
    buildMetadataNode
  )
  createNode(buildMetadataNode)
}

exports.sourceNodes = async ({ boundActionCreators }) => {
  await createBuildMetadataSource(boundActionCreators)
  await createStoriesSource(boundActionCreators)
}

👋 Parting Words

The code might not be optimal at fetching data, but the static site generator caches the data before generating the site, so it won't affect the site's performance in the end.

But I'd love to hear any suggestions on how to improve it :)

You can create an issue on GitHub or send me a tweet. Full source for gatsby-node.js can be found here.