At the time of writing, OpenAI’s SDK doesn’t support streaming responses for the GPT-3.5-Turbo and GPT-4 chat models.

Yes, very sad. Anyway, I decided to DIY this shit.

Backend

On Node you can use the fetch API and get back a ReadableStream of bytes as the response body.

// Calls an OpenAI endpoint with stream: true and returns the
// response body decoded into a stream of text chunks.
const openAIReadableTextStream = async (path: string, body: any) => {
    const response = await fetch(`https://api.openai.com/v1${path}`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
            ...body,
            // Ask the API to stream the completion as server-sent events.
            stream: true,
        }),
    });
    if (!response.body) throw new Error('No response body.');

    // Decode the raw bytes (ReadableStream<Uint8Array>) into text.
    return response.body.pipeThrough(new TextDecoderStream());
};

Here we use the fetch API to call the OpenAI server and get a ReadableStream<Uint8Array> in response. It needs to be decoded into plain text, so we do that with pipeThrough. The OpenAI streaming endpoints return the response as an event stream (server-sent events).
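Each event in that stream carries a JSON chunk with a partial delta of the completion, and the stream is terminated with a [DONE] sentinel. Trimmed down, the raw decoded text looks roughly like this:

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}

data: [DONE]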

The next step is to parse the event stream and connect it to a Response stream. For parsing the event stream we can use eventsource-parser.

Installation: npm i eventsource-parser --save

import { createParser, type ParseEvent } from 'eventsource-parser';
import type { Response } from 'express';

// Minimal message shape; swap in the OpenAI SDK's type if you prefer.
type ChatCompletionMessage = {
    role: 'system' | 'user' | 'assistant';
    content: string;
};

export const getStreamingChatCompletion = async ({
    messages,
    writeStream,
}: {
    messages: ChatCompletionMessage[];
    writeStream: Response<any>;
}) => {
    function onParse(event: ParseEvent) {
        if (event.type === 'event' && event.data !== '[DONE]') {
            // Each event carries a JSON chunk; the new text lives in the delta.
            const content = JSON.parse(event.data).choices[0]?.delta?.content;
            if (content) writeStream.write(content);
        }
    }
    try {
        const response = await openAIReadableTextStream('/chat/completions', {
            model: 'gpt-4',
            messages,
        });
        const parser = createParser(onParse);
        // @ts-expect-error Node 16+ supports async iteration over web streams
        for await (const value of response) {
            parser.feed(value);
        }
        writeStream.end();
    } catch (error) {
        console.error(error);
        writeStream.end();
        return 'Failed to get streaming completion.';
    }
};

We take the individual events, parse the data, and write the new token to the response stream, skipping chunks without content (the first chunk only carries the assistant role, no text). Once the end of the event stream is reached, we end the response stream too.

Now we can hook up an Express endpoint to the chat completion stream.

app.get('/chatCompletion', async (req, res) => {
    // Keep the connection open and tell the client not to cache or buffer.
    const headers = {
        'Content-Type': 'text/event-stream',
        Connection: 'keep-alive',
        'Cache-Control': 'no-cache',
    };
    res.writeHead(200, headers);

    await getStreamingChatCompletion({
        // this is where the messages list goes
        messages,
        writeStream: res,
    });
});

Here we can see that the write stream is just the response object we get access to inside an Express endpoint callback.
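To sanity-check the endpoint before wiring up any UI, you can consume it from a small Node script (the port here is hypothetical):

// Read the streamed response chunk by chunk and print it as it arrives.
const res = await fetch('http://localhost:3000/chatCompletion');
if (!res.body) throw new Error('No response body.');
// @ts-expect-error Node 16+ supports async iteration over web streams
for await (const chunk of res.body.pipeThrough(new TextDecoderStream())) {
    process.stdout.write(chunk);
}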

Frontend

This is the developer experience I was looking for:

const Component: FC = () => {
    const [streamingData, triggerQuery] = useStreamingQuery('/chatCompletion');
    return (
        <div>
            {streamingData}
            <button onClick={triggerQuery} />
        </div>
    );
};

I wrote a few hooks that abstract away all of the ReadableStream synchronization logic, and some nice-to-have data fetching wrappers.

useStreamingQuery Hook

This is one of the wrappers exposed by readable-hook.

Internally it uses useReadable, which takes a stream producer (the fetch API in the case of the useStreamingQuery hook) and returns the streamed data along with a query trigger.
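Conceptually, the contract looks something like this (illustrative only, not the library's exact signatures):

// A producer resolves to a ReadableStream; useReadable drains it into
// React state and hands back a trigger that starts the read.
const [data, trigger] = useReadable(() =>
    fetch(`${BASE_URL}/chatCompletion`).then((res) => res.body!),
);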

Installation: npm i readable-hook --save

This is a simplified version of the hook. Check out readable-hook for more details.

import { useCallback, useState } from 'react';

const useStreamingQuery = (path: string): [string, () => void] => {
    const [data, setData] = useState('');
    const queryStream = useCallback(async () => {
        const response = await fetch(`${BASE_URL}${path}`);
        if (!response.body) throw new Error('No response body found.');

        // Decode the byte stream into text before reading from it.
        const reader = response.body
            .pipeThrough(new TextDecoderStream())
            .getReader();

        async function syncWithTextStream() {
            const { value, done } = await reader.read();
            if (!done) {
                // Append each new chunk to the accumulated text.
                setData((previous) => previous + value);
                requestAnimationFrame(() => {
                    syncWithTextStream();
                });
            }
        }

        syncWithTextStream();
    }, [path]);

    return [data, queryStream];
};

We set up intermediate state with useState, and a query function that periodically pulls data from the stream; both are returned from the hook as a tuple. Even though the internals of the hook are fairly straightforward, it makes it much easier to reuse streaming data in other parts of the app.

Once the hook is initialized, we can read the values from streamingData and update the UI. The hook takes care of all the heavy lifting.

After all this hard work, we finally have streaming responses from the OpenAI chat completion APIs.