In modern application development, efficiently querying and retrieving real-time data is crucial to building robust and performant systems. Using materialized views we can improve query performance. When it is combined with GraphQL and a steaming database, developers can define their queries to leverage these materialized views for the data that constantly changes, allowing them to build highly responsive and interactive applications.
For example, social media platforms like Twitter produce a massive volume of data every second. This data is valuable for analyzing trends and user behavior. In this article, we will explore how integrating GraphQL, materialized views, and streaming databases such RisingWave can enable us to efficiently query tweets and discover the hottest hashtags in real-time.
Before diving into implementation, it is important to understand these 3 concepts (GraphQL, materialized view, and streaming database). I believe you are smart enough to use Google or ChatGPT to find out this information. However, I am going to explain shortly why this integration can be beneficial and the role of each in the next section.
Materialized views as precomputed caches
With a materialized view, we can precompute and store the results of frequently executed SQL queries. It is a denormalized representation of the data, which means that complex joins and aggregations are already performed and stored in a database like PostgreSQL. This simplifies the query logic required to retrieve data and eliminates the need for us to manually handle join operations. In Twitter’s case, materialized views can be used to compute summaries of user activity, such as the number of followers, a hashtag used, likes, or comments.
GraphQL to retrieve exactly what data we need
GraphQL allows us to specify exactly what data we need and receive it in a single request without using an additional programming-language-specific data-processing framework or defining a bunch of entity objects and endpoints in the case of REST(Representational State Transfer). Unlike REST, GraphQL significantly reduces the number of round trips, resulting in faster data fetching. With GraphQL, we can directly access the materialized views in the database through the defined schema, abstracting away the complexities of the underlying database structure. A social media analytics platform can leverage GraphQL to offer a flexible API for querying and analyzing user-generated content.
There are several popular GraphQL builders and frameworks available that can assist in building GraphQL APIs. You can connect to different data sources and integrate with popular databases like PostgreSQL, and MySQL. Here are some of the widely used ones:
StepZen is a platform to build and deploy GraphQL APIs that integrates and aggregate data from various sources. In the demo section, I will show how to build a GraphQL API in declarative code using StepZen.
Streaming database to process real-time data
To leverage the full potential of real-time data querying with GraphQL, a streaming database can be utilized. Materialized views may not always contain the most up-to-date data since they depend on when and how often the view is refreshed. Traditional databases such as PostgreSQL support materialized views, but to see the query over time, you need to rerun the same query – and again. To get updated results, you need to execute queries repeatedly. While a streaming database like RisingWave does almost all of its work at write time. This post explains how a streaming database differs from a traditional database. When data flows into the streaming database, it’s processed and immediately used to update the existing materialized views. It can ingest data from various data sources like Kafka or Pulsar. By combining GraphQL with a streaming database, we can continuously ingest incoming Tweet posts and update the materialized views in real-time and query changes in data instantly.
RisingWave is wire-compatible with PostgreSQL, you can use the existing StepZen support for a Postgres database as a data source for your GraphQL API. You can query Source, Table, and Materialized View from RisingWave.
Query Twitter events demo
Once we understand the importance of GraphQL, materialized views, and a streaming database, we can use this combination to create a new way to access tweet data. By pre-calculating summaries of the data and exposing them through a GraphQL endpoint, we can quickly get valuable insights from the tweets.
Before you begin
In the tutorial, we will leverage the existing use-case demo of RisingWave on the website called fast Twitter events processing. Make sure you completed the tutorial by cloning, launching the demo project there using Docker, connecting RisingWave to Kafka data streams, and defining a materialized view as the tutorial guides you.
Other prerequisites to install are:
When you are ready
Now I assume that you configured RisingWave and we have a materialized view named hot_hashtags
processed by the RisingWave that tracks how often each hashtag is used daily on Twitter. In the next steps, we install and set up StepZen, design the GraphQL schema, map GraphQL queries to the actual data in the materialized view, and finally, expose the GraphQL Endpoint.
Step 1: Install StepZen CLI
Note that you can also follow the instructions on StepZen website to install and run it. The StepZen command-line interface (CLI) provides commands to set up and manage StepZen. Run the following command to install the StepZen CLI:
npm install -g stepzen
Step 2: Run StepZen in Docker
Next we run StepZen service on your local machine using the StepZen CLI we installed in the previous step:
stepzen service start
To use the StepZen CLI for local development, you must log in by pointing the CLI to the local configuration. Simply run the following command after StepZen service started:
stepzen login --config ~/.stepzen/stepzen-config.local.yaml
Step 3: Clone the graphql-stepzen-risingwave demo project
Find my repository called graphql-stepzen-risingwave-demo on GitHub. Git clone this repository onto your machine. This project already has everything you need. There is written schema code in a postgresql.graphql GraphQL Schema Definition Language (SDL) file with types and queries defined for the materialized view hot_hashtags
in the RisingWave database. It has also a stepzen.config.json file with our GraphQL endpoint.
git clone https://github.com/Boburmirzo/graphql-stepzen-risingwave-demo.git
cd graphql-stepzen-risingwave-demo
Step 4: Create a ngrok account and install ngrok (Optional)
Note that it is just an additional step if you are running StepZen and RisingWave in your local environment. In the case of cloud or running instances in your server that are accessible via the internet, you can set a direct address of RisingWave in the StepZen database configuration. If you are running both StepZen and RisingWave within Docker containers, you can specify the address of
host.docker.internal
.
Ngrok is a tool that creates a secure tunnel between a public internet address and a local server running on your machine. Create an account for ngrok and make sure that you can access it by setting authtoken
. To make the Risingwave database available to other services outside your private network, you need to create a TCP tunnel. For this ngrok will be used on the port where RisingWave is running 4566
:
ngrok tcp 4566
After you run the command, Ngrok will return the forwarding address for the local RisingWave database, which will look something like this: 0.tcp.eu.ngrok.io:14304
You need to add this to the file ./config.yaml
in the demo project where you need to replace {{ngrok_tunnel}}
with your address 0.tcp.ngrok.io:15650
:
configurationset:
- configuration:
name: postgresql_config
uri: postgresql://root:@0.tcp.eu.ngrok.io:15650/dev
Step 5: Run GraphQL Endpoint
By running the command stepzen start
, you can deploy the GraphQL schema we have in postgresql.graphql file to StepZen. This instantly creates a GraphQL API in the localhost, accessible through the configured endpoint (api/twitter
) in stepzen.config.json file. If you navigate to http://localhost:5001/api/twitter, you will the StepZen dashboard explorer on your browser.
Step 6: Query the materialized view
StepZen explorer shows available queries with their attributes. When you run the following query getHotHashtags
, it will pull data from the RisingWave materialized view and shows returned data in the explorer.
query MyQuery {
getHotHashtags {
hashtag
hashtag_occurrences
window_start
}
}
See the output:
If you open the postgresql.graphql file, you will see how I used the GraphQL directive @dbquery to connect the database and write an SQL query to select the TOP 10 popular hashtags.
type Query {
getHotHashtags: [hot_hashtags]
@dbquery(
type: "postgresql"
query: """
SELECT
*
FROM
hot_hashtags
ORDER BY
hashtag_occurrences DESC
LIMIT
10
"""
configuration: "postgresql_config"
)
}
Next steps
So far, we have built and deployed a GraphQL API api/twitter
with the database backend. Next, you can observe how to consume real-time data from other sources than Kafka with RisingWave, combine multiple streams of data, create materialized views on joined streams, and create a sequence of queries with StepZen.
Conclusion
With StepZen GraphQL API we simplified data access without introducing a backend service to do this work and we combined it with the real-time updates provided by RisingWave. In summary, querying real-time data with GraphQL and the streaming database opens up new possibilities for creating highly responsive and interactive applications.
Related resources
Recommended content
Community
🙋 Join the Risingwave Community