Next.js + LLMs: The Guide to Streaming Structured Responses

Written by

Published on

In the rapidly evolving landscape of AI-powered applications, delivering a seamless user experience isn’t just about what you build—it’s about how you deliver it.

Today, I’ll share how we implemented LLM response streaming in Next.js, a challenge that reshaped the way our users engage with AI-driven responses.

Picture this: A user sends a message to your AI chatbot.

Instead of staring at a loading spinner for a long time, they can see the response display word by word, making the conversation feel more engaging and dynamic.

The Challenge

While our backend services using LLMs generate accurate responses, users often grew impatient waiting for complete answers.

The real technical challenge wasn’t just about streaming text, it was about maintaining structured JSON data to support charts and tables generation, all while ensuring the interface stayed responsive and engaging.

Partial JSON Parsing

We found partial-json, a library that solved our JSON streaming puzzle.

This allowed us to handle incomplete JSON structures gracefully, turning a potential technical nightmare into a smooth user experience.

The Solution

Let me share the core implementation of our streaming functionality:

1. Seamlessly Decode Incoming Streams

The decoding process relies on several key mechanisms:

  • TextDecoder converts raw binary data into readable text.
  • Each chunk is processed as it arrives, ensuring smooth data flow.
  • The buffer accumulates decoded text to handle partial messages.

If we don’t use UTF-8 encoding, streaming chunk by chunk can lead to issues.

Without proper encoding, the response might be displayed all at once instead of gradually, disrupting the intended streaming experience.

2. Managing Partial JSON Data

The partial JSON handling is crucial because:

  • Traditional JSON.parse() would fail on incomplete structures.
  • The parse() function (from partial-json) can handle incomplete JSON.

3. Memory Efficiency

Memory efficiency is achieved through:

  • Processing one chunk at a time.
  • Not storing the entire response in memory.
  • Clearing the buffer after successful parsing.
  • Using streaming instead of buffering the whole response.

4. Real-time Feedback

The real-time feedback system:

  • Immediately processes each chunk as it arrives.
  • Notifies the UI layer through the onChunk callback.
  • Includes a completion flag (isDone).
  • Allows for progressive UI updates.

Putting It All Together

Here’s how this works together in a typical flow:

  1. Stream starts.
  2. Reader is initialized.
  3. Decoder is prepared.
  4. Buffer is emptied.
  5. For each chunk:
  6. Data is read from the stream.
  7. Chunk is decoded to text.
  8. Added to buffer.
  9. Attempted parsing.
  10. UI is updated if parsing succeeds.
  11. Stream ends.
  12. Final chunk is processed.
  13. Buffer is cleared.
  14. Completion is signaled.
  15. Resources are cleaned up.

Ready to Implement?

Here are three key tips from our experience:

  1. Always implement timeout mechanisms—streams shouldn’t hang indefinitely.
  2. Build robust error handling—network issues are inevitable.
  3. Monitor your buffer management—memory usage can creep up with large responses.

Remember, the goal isn’t just to stream data—it’s to create an experience that makes AI feel more natural and accessible to your users.

Would you like to share your experiences with LLM streaming? Have you faced similar challenges?

Let’s continue this conversation in the comments below.

SHARE

Read more

Success Stories

Driving Business Agility and Operational Excellence in Lease Management

Calfus successfully delivered the Oracle Lease Accounting module implementation, providing a streamlined, accurate, and fully..

Success Stories

Complete Redwood UI transformation in Oracle Fusion HCM

The HCM Redwood Migration project has been an outstanding success. Thanks to Calfus’s exceptional contribution..

Success Stories

Improved Operational Efficiency for Yamaichi Electronics USA

“Oracle Fusion Cloud applications provide our leadership team with visibility across our entire value chain,..

Success Stories

Driving Business Agility and Operational Excellence in Lease Management

Calfus successfully delivered the Oracle Lease Accounting module implementation, providing a streamlined, accurate, and fully..

Success Stories

Complete Redwood UI transformation in Oracle Fusion HCM

The HCM Redwood Migration project has been an outstanding success. Thanks to Calfus’s exceptional contribution..

Success Stories

Improved Operational Efficiency for Yamaichi Electronics USA

“Oracle Fusion Cloud applications provide our leadership team with visibility across our entire value chain,..

Privacy Overview

This website uses cookies so that we can provide you with the best user experience possible. Cookie information is stored in your browser and performs functions such as recognising you when you return to our website and helping our team to understand which sections of the website you find most interesting and useful. See more details on our privacy policy page.

Strictly Necessary Cookies

Strictly Necessary Cookie should be enabled at all times so that we can save your preferences for cookie settings.