5 Powerful Insights into RisingWave – An Open-Source Distributed SQL Database for Stream Processing

Introduction

RisingWave is an SQL-interfacing, cloud-native streaming database that offers numerous benefits for the development of real-time applications.

By seamlessly handling streaming data, executing continuous queries, and updating results in real time, RisingWave effectively reduces both the complexity and costs associated with building such applications.

RisingWave then maintains the results in its storage where users can access them using SQL.

RisingWave accepts data from sources such as Apache Kafka, Apache Pulsar, Amazon Kinesis, Redpanda, and materialized CDC sources. It outputs to external targets such as message brokers, data warehouses, and data lakes for storage or additional processing.

[Figure: RisingWave internal architecture]

And yes, RisingWave is open source; you can check out the source code on GitHub.

RisingWave can ingest data, store data, and respond to concurrent access requests from end users. All of these requests can be expressed using PostgreSQL-style SQL.


1- How Does RisingWave Work?

RisingWave works by ingesting data from streaming sources, processing it using continuous queries, and storing the results in its storage.
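The flow described above can be sketched in a few statements. This is a minimal sketch under stated assumptions, not something taken from the RisingWave documentation: the `page_views` table, the `views_per_page` view, and the `runPipeline` helper are hypothetical names, and `client` is assumed to be any connected Postgres-protocol client that exposes `query(sql)` and resolves to an object with `rows` (for example, a `pg` Client).

```javascript
// Hypothetical statements illustrating ingest -> continuous query -> stored result.
const statements = [
  // 1. A table that receives incoming events (a stand-in for a streaming source).
  'CREATE TABLE page_views (page_id INT, user_id INT, viewed_at TIMESTAMP)',
  // 2. A continuous query: its result is kept up to date as new rows arrive.
  'CREATE MATERIALIZED VIEW views_per_page AS ' +
    'SELECT page_id, COUNT(*) AS view_count FROM page_views GROUP BY page_id',
];

// Runs the setup statements in order, then reads the maintained result.
// `client` is assumed to expose query(sql) returning a promise of { rows }.
async function runPipeline(client) {
  for (const sql of statements) {
    await client.query(sql);
  }
  const result = await client.query('SELECT page_id, view_count FROM views_per_page');
  return result.rows;
}
```

Because the helper only depends on a `query` method, you can pass it the `pg` client from the connection example or a stub when experimenting.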

RisingWave consists of the following components:

  • A serving layer that provides an endpoint to accept SQL queries through the Postgres protocol. It parses and validates queries, provides an optimized query execution plan to the processing layer, and finally returns the results to the user.
  • A processing layer that executes the optimized query plans. It handles data ingestion from the serving layer and produces results for both the serving layer and the storage layer.
  • A metadata management layer that coordinates operations between different nodes and contains various management modules to manage the metadata, which is stored in local metadata storage. It is also responsible for performing health checks on the nodes and handling node failures.
  • A storage layer that handles interactions with object storage services like Amazon S3. It prepares data for ingestion by the processing layer and for storage in the object storage service.

[Figure: RisingWave architecture and how RisingWave works]

RisingWave combines the benefits of a distributed, scalable database system with stream processing capabilities.

RisingWave provides efficient data storage, real-time stream processing, materialized views for faster access, transactional consistency, and compatibility with existing tools and libraries, enabling developers to build robust and scalable applications that handle both traditional and streaming data effectively.

A very simple example of connecting to RisingWave using JavaScript (Node.js) and executing SQL queries:

const { Client } = require('pg');

// Placeholder connection settings; replace them with your own.
const client = new Client({
  user: 'your_username',
  password: 'your_password',
  host: 'your_host',
  port: 4566, // RisingWave's default SQL port (PostgreSQL itself defaults to 5432)
  database: 'your_database',
});

// Connect first, then create a table, insert a row, and read it back.
client.connect()
  .then(() => client.query('CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100))'))
  .then(() => client.query("INSERT INTO users VALUES (1, 'John Doe', 'john@example.com')"))
  .then(() => client.query('SELECT * FROM users'))
  .then((result) => {
    // pg exposes the returned rows as result.rows
    result.rows.forEach((row) => {
      console.log(row);
    });
  })
  .catch((err) => {
    console.error('Error connecting or executing query:', err);
  })
  .finally(() => {
    client.end();
  });

2- Why it’s an Excellent Choice for Developers

RisingWave transcends the capabilities of ordinary SQL database systems by introducing the remarkable power of stream processing.


With RisingWave, effortlessly managing streaming data, performing continuous queries, and dynamically updating results through materialized views becomes a reality.

RisingWave is like having a magic crystal ball that constantly updates with the latest information.

Built on cloud infrastructure, RisingWave takes advantage of what that technology offers. One notable benefit is the ability to scale computing power and storage capacity independently, tailoring each to the specific requirements of users.

This supports consistent performance and efficient resource utilization as data volumes and workloads grow. Whether serving a few users or facing a substantial surge in demand, RisingWave can scale up to meet it.

[Figure: Streaming data and nodes]

3- The Benefits of RisingWave

Incremental Updates to Materialized Views

A standout feature of RisingWave lies in its ability to perform incremental updates to materialized views, setting it apart from conventional database systems. This unique functionality ensures that materialized views remain constantly up to date with the latest data in real-time.

  • RisingWave employs an intelligent approach, identifying and selectively updating only the affected portions of the view. This removes the need to recompute the entire view whenever a change occurs.
  • By adopting this selective update strategy, RisingWave optimizes query performance and significantly reduces computational overhead.

This translates to faster and more efficient data processing, allowing users to obtain insights with impressive responsiveness. To illustrate the practical implications of this capability, let’s consider a retail business scenario.

With RisingWave, as new sales transactions occur, the materialized view for sales is seamlessly updated with the latest figures. This means that managers and analysts have instant access to accurate insights without the system having to perform redundant computations.

RisingWave maintains the accuracy and consistency of materialized views, enabling organizations to stay on top of their data analysis requirements and make data-driven decisions promptly and effectively.
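To make the retail scenario concrete, here is a small plain-JavaScript model of the idea. It is only a conceptual sketch: the `SalesTotalsView` class and `recomputeTotal` function are hypothetical names, and nothing here reflects how RisingWave implements materialized views internally. It simply contrasts applying one change at a time with rescanning every sale.

```javascript
// Conceptual model of incremental view maintenance (not RisingWave's internals).
// The "view" is total sales per product; each new sale touches one entry only.
class SalesTotalsView {
  constructor() {
    this.totals = new Map(); // productId -> running total
  }

  // Apply a single change (delta) instead of recomputing from all sales.
  applySale({ productId, amount }) {
    const current = this.totals.get(productId) || 0;
    this.totals.set(productId, current + amount);
  }

  // Reading the view is a lookup; no aggregation happens at query time.
  totalFor(productId) {
    return this.totals.get(productId) || 0;
  }
}

// Baseline for comparison: recompute the total by scanning every sale.
function recomputeTotal(allSales, productId) {
  return allSales
    .filter((sale) => sale.productId === productId)
    .reduce((sum, sale) => sum + sale.amount, 0);
}

const sales = [
  { productId: 'shoes', amount: 40 },
  { productId: 'hats', amount: 15 },
  { productId: 'shoes', amount: 60 },
];

const view = new SalesTotalsView();
sales.forEach((sale) => view.applySale(sale));

// Both approaches agree, but the view did constant work per sale.
console.log(view.totalFor('shoes'), recomputeTotal(sales, 'shoes'));
```

The incremental version does a fixed amount of work per new sale, while the recompute version grows with the total number of sales, which is the overhead the selective update strategy avoids.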

Transactional Consistency

RisingWave stands out for its unwavering commitment to transactional consistency.

This means you can trust the system to uphold data integrity and process transactions reliably.

Whether you’re updating records, running complex queries, or dealing with multiple operations happening at the same time, RisingWave ensures that all transactions are handled consistently.

Even in busy environments with many users or systems accessing and changing data simultaneously, RisingWave sets clear boundaries to prevent conflicts or data discrepancies. Thanks to its built-in mechanisms, RisingWave maintains the essential ACID properties (Atomicity, Consistency, Isolation, and Durability).

For example:

Imagine an online ride-hailing platform called “SwiftRide.”

SwiftRide relies on RisingWave to manage and process its ride-booking transactions. When a customer books a ride, multiple actions must occur simultaneously, such as updating the driver’s availability, deducting the fare from the customer’s account, and recording the transaction details.

RisingWave’s transactional consistency ensures that all these actions are executed as a single atomic transaction. If any part of the transaction fails or encounters an error, RisingWave guarantees that all changes made so far are rolled back, maintaining the system’s integrity.

For instance, if there is a connectivity issue while deducting the fare, RisingWave’s transactional consistency will ensure that the driver’s availability remains unchanged and the customer’s account is not charged.

The entire booking process is handled consistently, preventing any inconsistencies or data discrepancies.
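The all-or-nothing behaviour in the SwiftRide example can be illustrated with a short plain-JavaScript sketch. Everything in it is an assumption made for illustration: `bookRide`, `charge`, `failingCharge`, and the in-memory state are hypothetical, and the code models the concept of atomicity only, not RisingWave's API or internals.

```javascript
// Conceptual illustration of atomicity, not RisingWave's API or internals.
// State is copied, changed, and only handed back if every step succeeds.
function bookRide(state, { driverId, customerId, fare }, chargeFn) {
  // Work on a draft so a failure leaves the original state untouched.
  const draft = {
    drivers: { ...state.drivers },
    balances: { ...state.balances },
    rides: [...state.rides],
  };
  try {
    draft.drivers[driverId] = 'busy';
    chargeFn(draft.balances, customerId, fare); // may throw
    draft.rides.push({ driverId, customerId, fare });
    return { committed: true, state: draft };
  } catch (err) {
    // "Rollback": discard the draft and keep the previous state.
    return { committed: false, state, error: err.message };
  }
}

// Deducts the fare, or throws if the balance is too low.
function charge(balances, customerId, fare) {
  if (balances[customerId] < fare) throw new Error('insufficient funds');
  balances[customerId] -= fare;
}

// Simulates the connectivity issue from the example.
function failingCharge() {
  throw new Error('connection lost');
}

const initial = { drivers: { d1: 'available' }, balances: { c1: 50 }, rides: [] };
const booking = { driverId: 'd1', customerId: 'c1', fare: 20 };

const failed = bookRide(initial, booking, failingCharge);
const succeeded = bookRide(initial, booking, charge);

console.log(failed.committed, failed.state.drivers.d1);
console.log(succeeded.committed, succeeded.state.balances.c1);
```

When the charge fails, the driver stays available and no fare is deducted; when it succeeds, all three changes appear together.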

SQL-Based with the Postgres Protocol

RisingWave offers a SQL-based approach with the support of the Postgres protocol.

Developers can use familiar SQL syntax to interact with the database.

  • RisingWave works with existing Postgres-compatible tools and libraries, which makes integration straightforward.
  • The SQL-based nature allows for expressive queries and easy migration from PostgreSQL.

With RisingWave, developers can leverage their SQL skills and work efficiently with the database.

Here’s an example to illustrate the SQL-based nature of RisingWave with the Postgres protocol:

Let’s say you have a table called “users” in your RisingWave database that stores information about registered users.

A straightforward SQL query can be used to retrieve the names of all active users.

SELECT name FROM users WHERE active = true;

The SQL-based nature of RisingWave allows developers to write expressive and efficient queries to retrieve and manipulate data.

You can leverage familiar SQL syntax, such as SELECT, UPDATE, INSERT, and DELETE, to interact with the database and perform a wide array of operations.

Moreover, the SQL-based approach facilitates easy migration from traditional relational database systems, such as PostgreSQL, to RisingWave.
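Since RisingWave speaks the Postgres protocol, the active-users query above could be issued from Node.js with a parameterized statement. This is a hedged sketch: `activeUsersQuery` and `fetchActiveUserNames` are hypothetical helpers, `client` is assumed to be a connected client with the `pg` library's `query(text, values)` signature, and it assumes your RisingWave deployment accepts parameterized statements.

```javascript
// Builds a parameterized query so values are sent separately from the SQL text.
// The users.active column comes from the example above; helper names are hypothetical.
function activeUsersQuery(active) {
  return { text: 'SELECT name FROM users WHERE active = $1', values: [active] };
}

// `client` is assumed to be a connected client with pg's query(text, values) signature.
async function fetchActiveUserNames(client) {
  const { text, values } = activeUsersQuery(true);
  const result = await client.query(text, values);
  // Each row is an object keyed by column name.
  return result.rows.map((row) => row.name);
}
```

Keeping the value out of the SQL string avoids manual quoting and makes the same statement reusable for `active = false`.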

Push and Pull Capabilities for Queries

RisingWave offers push and pull capabilities for queries.

In push mode, real-time query results are actively pushed to subscribed clients or applications.

// Subscribe to real-time updates for a specific query.
// Note: `risingwave` stands for an illustrative client object, not a published API.
risingwave.subscribe('SELECT * FROM sensor_data', (result) => {
  // Process the updated query results in real time
  console.log('Received real-time update:', result);
});

In this example, we subscribe to a query that selects all sensor data from the database. If there is a change in the sensor data that matches the query, RisingWave actively pushes the updated results to the subscribed client or application. The callback function (result) => {...} is called with the updated query results, allowing you to process the data in real-time.

In pull mode, clients retrieve query results on demand when needed.

// Fetch query results on demand
risingwave.query('SELECT * FROM user_data')
  .then((result) => {
    // Process the retrieved query results
    console.log('Fetched query results:', result);
  })
  .catch((error) => {
    // Handle any errors that occur during query execution
    console.error('Error executing query:', error);
  });

In this example, we execute a query to select all user data from the database. RisingWave retrieves the query results on demand when the code is executed. The resulting data is then processed within the .then() block. This approach allows you to fetch the query results as needed, providing more control over when and how frequently the query is executed.

This flexibility caters to different use cases, providing real-time updates or fetching data as required.
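The difference between the two modes can be shown with a tiny in-memory model. The `LiveResult` class below is hypothetical and is not a RisingWave client API; it is only a sketch of the delivery pattern, where `subscribe` plays the role of push and `fetch` the role of pull.

```javascript
// Conceptual model of push vs. pull delivery (hypothetical class, not a RisingWave API).
class LiveResult {
  constructor(initialRows = []) {
    this.rows = initialRows;
    this.subscribers = new Set();
  }

  // Push: register a callback that runs on every update.
  subscribe(callback) {
    this.subscribers.add(callback);
    return () => this.subscribers.delete(callback); // unsubscribe handle
  }

  // Pull: the caller asks for the current rows when it needs them.
  fetch() {
    return [...this.rows];
  }

  // Called when the underlying data changes; notifies every subscriber.
  update(newRows) {
    this.rows = newRows;
    for (const callback of this.subscribers) callback(this.fetch());
  }
}

const sensorData = new LiveResult();
const received = [];
const unsubscribe = sensorData.subscribe((rows) => received.push(rows));

sensorData.update([{ sensor: 's1', value: 21.5 }]); // pushed to the subscriber
unsubscribe();
sensorData.update([{ sensor: 's1', value: 22.0 }]); // no longer pushed

// The subscriber saw one update; a pull still returns the latest rows.
console.log(received.length, sensorData.fetch());
```

Push suits dashboards and alerts that must react immediately, while pull suits callers that only need the current state at specific moments.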

Joining Multiple Streams

RisingWave excels in joining multiple streams, allowing seamless combination and analysis of data from diverse sources.

By performing real-time joins on streaming data, RisingWave enables insightful correlations, advanced analytics, and comprehensive data understanding.

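// Assumes `client` is a connected pg Client, as in the earlier connection example.
const joinQuery =
  'SELECT s.product_id, s.sale_quantity, i.stock_quantity ' +
  'FROM sales_stream s ' +
  'JOIN inventory_stream i ON s.product_id = i.product_id';

// await is only valid inside an async function in CommonJS modules.
async function printJoinedRows() {
  try {
    const result = await client.query(joinQuery);
    for (const row of result.rows) { // pg exposes the rows as result.rows
      console.log(row);
    }
  } finally {
    await client.end();
  }
}
printJoinedRows().catch((err) => console.error('Error executing query:', err));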

In this example, we have two streams: sales_stream and inventory_stream. These streams represent data sources containing sales and inventory information, respectively. We want to join these streams based on the common key product_id.

The join query selects the product_id, sale_quantity, and stock_quantity columns from the two streams and performs the join operation. By executing this query using RisingWave, we can retrieve the joined data, which includes information about the product sales and inventory quantities.

Finally, we process the joined data by iterating over the result and printing each row.
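To give a feel for what joining two streams involves, here is a sketch of a symmetric hash join in plain JavaScript. This is one common technique for joining unbounded streams; the article does not describe RisingWave's actual join implementation, so treat the `StreamJoin` class as a hypothetical illustration that reuses the column names from the query above.

```javascript
// Symmetric hash join sketch: each side keeps a hash table keyed by the join key,
// and every arriving row is matched against the rows already seen on the other side.
class StreamJoin {
  constructor(key) {
    this.key = key;
    this.left = new Map();  // key -> rows seen on the left stream (sales)
    this.right = new Map(); // key -> rows seen on the right stream (inventory)
  }

  // Store the row on its own side and return joined rows against the other side.
  _insert(own, other, row, combine) {
    const k = row[this.key];
    if (!own.has(k)) own.set(k, []);
    own.get(k).push(row);
    return (other.get(k) || []).map((match) => combine(row, match));
  }

  // A row arrives on the sales stream.
  onLeft(row) {
    return this._insert(this.left, this.right, row, (l, r) => ({ ...l, ...r }));
  }

  // A row arrives on the inventory stream.
  onRight(row) {
    return this._insert(this.right, this.left, row, (r, l) => ({ ...l, ...r }));
  }
}

const join = new StreamJoin('product_id');
const joined = [];
joined.push(...join.onLeft({ product_id: 1, sale_quantity: 3 }));    // no match yet
joined.push(...join.onRight({ product_id: 1, stock_quantity: 20 })); // matches the sale
joined.push(...join.onRight({ product_id: 2, stock_quantity: 5 }));  // no sale for product 2

console.log(joined);
```

Output rows appear as soon as a matching row shows up on either side, so the join result keeps growing as the streams do. A production system also needs a policy for expiring old state, which this sketch leaves out.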


4- Top 3 Use Cases of RisingWave

RisingWave fits a wide range of use cases where real-time data processing, stream analytics, and materialized views are of utmost importance. Let’s explore some of these applications in more detail:

Real-time Analytics and Dashboards

RisingWave is particularly well-suited for constructing real-time analytics systems and interactive dashboards. By efficiently processing and analyzing streaming data in real time, it enables users to gain immediate insights and visualize data as it continuously flows.

Leveraging RisingWave’s materialized views, you can easily aggregate, filter, and transform streaming data, empowering the creation of dynamic and up-to-date dashboards.
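A typical dashboard aggregation is "events per minute". The sketch below shows the bucketing logic in plain JavaScript, assuming events carry a millisecond `timestamp`; the `countPerWindow` function and the sample data are hypothetical. In RisingWave you would express this kind of aggregation in SQL inside a materialized view rather than in application code.

```javascript
// Counts events per fixed-size (tumbling) window; windowMs is in milliseconds.
function countPerWindow(events, windowMs) {
  const counts = new Map(); // window start -> number of events
  for (const event of events) {
    // Round the timestamp down to the start of its window.
    const windowStart = Math.floor(event.timestamp / windowMs) * windowMs;
    counts.set(windowStart, (counts.get(windowStart) || 0) + 1);
  }
  // Sorted rows are convenient for plotting on a dashboard.
  return [...counts.entries()]
    .sort(([a], [b]) => a - b)
    .map(([windowStart, count]) => ({ windowStart, count }));
}

const clicks = [
  { timestamp: 1000 },
  { timestamp: 59000 },
  { timestamp: 61000 },
  { timestamp: 125000 },
];

// One row per one-minute window that contains at least one click.
const perMinute = countPerWindow(clicks, 60000);
console.log(perMinute);
```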

IoT Data Processing

The proliferation of the Internet of Things (IoT) has led to an exponential growth in data generated by sensors, devices, and applications. RisingWave’s stream processing capabilities make it an excellent choice for handling and analyzing IoT data streams.

Whether you need to monitor sensor data, detect anomalies, or perform real-time analytics on IoT data, RisingWave can effortlessly handle the continuous influx of data, providing valuable insights for informed decision-making.

Fraud Detection and Monitoring

RisingWave is also a valuable asset for real-time fraud detection and monitoring. By ingesting and analyzing streaming data from diverse sources such as transactions, user activities, or system logs, it can identify suspicious patterns, detect anomalies, and trigger timely alerts.

The ability to join streams and execute continuous queries enables comprehensive fraud monitoring, empowering proactive actions to mitigate risks promptly.
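As a rough illustration of a continuous fraud check, the sketch below flags an account that makes too many transactions within a time window. The `VelocityRule` class, the limit of three transactions, and the one-minute window are all hypothetical choices for the example, and the code assumes transactions arrive in timestamp order.

```javascript
// Flags an account when it makes more than `maxCount` transactions within `windowMs`.
// The rule and thresholds are hypothetical, chosen only for illustration.
class VelocityRule {
  constructor(maxCount, windowMs) {
    this.maxCount = maxCount;
    this.windowMs = windowMs;
    this.recent = new Map(); // accountId -> timestamps inside the current window
  }

  // Returns true when this transaction pushes the account over the limit.
  check({ accountId, timestamp }) {
    const cutoff = timestamp - this.windowMs;
    // Drop timestamps that have fallen out of the sliding window.
    const kept = (this.recent.get(accountId) || []).filter((t) => t > cutoff);
    kept.push(timestamp);
    this.recent.set(accountId, kept);
    return kept.length > this.maxCount;
  }
}

const rule = new VelocityRule(3, 60000);
const alerts = [0, 10000, 20000, 30000, 200000].map((timestamp) =>
  rule.check({ accountId: 'acct-1', timestamp })
);

// Only the fourth transaction inside the same minute raises an alert.
console.log(alerts);
```

Each check only looks at one account's recent timestamps, so the rule can run on every incoming transaction.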

These examples merely scratch the surface of the numerous use cases for RisingWave. The versatility of its stream processing, materialized views, and real-time capabilities opens up a world of possibilities across various industries.

Whether it’s finance, e-commerce, telecommunications, or countless others, RisingWave provides a powerful platform for leveraging streaming data and driving transformative applications.


5- How is RisingWave Different from Flink?

Apache Flink is a popular open-source stream processing framework.

RisingWave and Flink share some similarities like low-latency query processing over continuously ingested streaming data, but they also have some key differences.

Let’s take a look at how RisingWave is different from Flink and how it offers a better alternative for stream processing.

Comparison Table between Apache Flink and RisingWave

Database Integration

Flink is a stream processing framework that requires a separate database for storing and managing data. This adds infrastructure overhead and the extra operational cost of keeping Flink and the external storage in sync.

RisingWave can provide the same capability, but additionally, it can be used as a database itself.

It stores results in its own storage layer and serves them over the PostgreSQL protocol, which eases integration with external systems thanks to the existing PostgreSQL ecosystem.

It also integrates with cloud services such as Confluent Cloud, DataStax, and Grafana Cloud, which supports a wide range of application scenarios and provides the flexibility to choose the right solution for the task at hand.

User-Friendly Approach

RisingWave aims to provide an easy and familiar experience for developers.

  • It offers a SQL-based interface, allowing developers with SQL knowledge to easily leverage stream processing capabilities.
  • It introduces very few stream processing concepts, making it easy to use without an in-depth understanding of stream processing.

Flink, on the other hand, provides a lower-level programming model that demands more specialized expertise. Developers take on more responsibilities and more fine-grained manual configuration, which can be challenging for those who are not familiar with stream processing.

Scalability and Fault Tolerance

Both RisingWave and Flink excel in providing scalability and fault tolerance, albeit through different mechanisms.

Flink achieves scalability and fault tolerance through its distributed processing engine. By leveraging the power of multiple machines, Flink enables computations to be distributed across a cluster, allowing for efficient parallel processing. In the event of a machine failure, Flink’s fault tolerance mechanisms kick in, allowing the system to recover and continue processing without losing data or compromising the overall computation.

On the other hand, RisingWave takes advantage of cloud infrastructure to achieve scalability and fault tolerance.

It provides the flexibility to independently scale compute and storage resources, allowing organizations to adapt and allocate resources based on their specific needs.

This approach ensures efficient resource utilization, as compute and storage capacities can be scaled independently and seamlessly. Whether there is a surge in computational demands or a need for increased data storage, RisingWave can dynamically adjust its resources to maintain optimal performance.

Use Cases and Ecosystem

Flink is frequently employed in real-time analytics, stream processing, and event-driven applications. Its extensive ecosystem offers diverse integrations, connectors, and libraries for seamless interaction with various data sources.

In contrast, RisingWave is specifically tailored to scenarios that necessitate real-time insights and continuous query processing alongside traditional database operations.

Check out the detailed comparison documentation for more.


Conclusion

RisingWave empowers developers and businesses to process streaming data in real-time, derive meaningful insights, and build scalable applications across a range of use cases.

  • It is well-suited for applications such as real-time analytics, IoT data processing, and fraud detection, offering flexibility across industries.
  • It ensures transactional consistency, maintaining data integrity and enforcing strict boundaries for concurrent operations, even in highly distributed environments.

As the demand for real-time data processing continues to grow, RisingWave stands out as a valuable tool for developers and businesses looking to unlock the full potential of their streaming data.

Stay in Touch

That was it for this blog.

I hope you learned something new today.

If you did, please like/share so that it reaches others as well.

Connect with me on Twitter
