Ethereum: Problem with WebSocket output into a pandas DataFrame

This article looks at a common problem when writing Binance WebSocket output into a pandas DataFrame.

Problem: An endless stream of data into a pandas DataFrame

Once your script has a working WebSocket connection to Binance, another common challenge appears: how the incoming data is collected and stored in a pandas DataFrame.

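When you consume a WebSocket API such as Binance's, every message the client receives is a separate JSON event (with combined streams, the event payload sits under a `data` key). If your script appends each message to a pandas DataFrame as a new row, the DataFrame grows without bound for as long as the connection stays open. The growth is linear in the number of messages rather than exponential, but it never stops, and because `pd.concat` copies the existing data on every call, appending one row at a time also gets slower as the table grows.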
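As a minimal sketch of what one message looks like once parsed, the helper below flattens a combined-stream message into a dictionary that can become one DataFrame row. The sample message is invented for illustration and trimmed to a few fields; it assumes Binance's combined-stream wrapper (`stream` and `data` keys) and the trade-event fields `s` (symbol), `p` (price), `q` (quantity) and `T` (trade time). `message_to_row` is a hypothetical name, not part of any Binance library.

```python
import json

# Hypothetical combined-stream trade message, trimmed to a few fields.
SAMPLE = (
    '{"stream": "btcusdt@trade", "data": {"e": "trade", "E": 1700000000123, '
    '"s": "BTCUSDT", "p": "37000.50", "q": "0.015", "T": 1700000000120}}'
)


def message_to_row(raw: str) -> dict:
    """Flatten one combined-stream message into a dict for a DataFrame row."""
    msg = json.loads(raw)
    event = msg["data"]  # the event payload sits under "data" in combined streams
    return {
        "stream": msg["stream"],
        "symbol": event["s"],
        "trade_time": event["T"],
        # Prices and quantities arrive as strings, so convert them explicitly.
        "price": float(event["p"]),
        "qty": float(event["q"]),
    }


row = message_to_row(SAMPLE)
print(row)
```

Converting the string fields to `float` at parse time avoids ending up with an object-dtype column in the DataFrame later.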

Why is this happening?

Binance's WebSocket API pushes data as individual events, each carrying an event timestamp and a payload. When you subscribe to several streams (for example, a Bitcoin price stream and a trade-volume stream), each stream delivers its own sequence of messages. Because the connection stays open indefinitely, new messages keep arriving from every stream, so code that stores every message keeps growing its data for as long as the script runs.
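The effect can be reproduced without a network connection. In this sketch, `fake_stream` is a hypothetical generator standing in for an open WebSocket that alternates between two streams forever. A `collections.deque` with `maxlen` keeps only the most recent events, so memory stays fixed however long the loop runs; the window size of 500 is an arbitrary assumption.

```python
import itertools
from collections import deque

import pandas as pd


def fake_stream():
    """Hypothetical stand-in for a WebSocket feed: yields events forever,
    alternating between two streams like an open connection would."""
    for i in itertools.count():
        stream = "btcusdt@trade" if i % 2 == 0 else "ethusdt@trade"
        yield {"stream": stream, "seq": i, "price": 100.0 + i}


# Keep only the 500 most recent events; older ones are dropped automatically.
buffer = deque(maxlen=500)

# Consume a finite slice of the endless stream for this demo.
for event in itertools.islice(fake_stream(), 10_000):
    buffer.append(event)

# Build a DataFrame from the bounded window only when you need one.
df = pd.DataFrame(list(buffer))
print(len(df), df["seq"].min(), df["seq"].max())  # → 500 9500 9999
```

A plain list in place of the deque would hold all 10,000 events here, and with a real connection it would keep growing.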

Solution: Handling an endless data stream with pandas

To keep this stream from exhausting your script's memory, you can use several strategies:

1. Use Dask

Dask is a parallel computing library that lets you scale computations on large datasets without needing a full cluster. It splits a large DataFrame into smaller partitions and processes them in parallel, which can reduce peak memory use.

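```python
import numpy as np
import pandas as pd
import dask.dataframe as dd

# Create a DataFrame with 1,000 rows of sample prices
pdf = pd.DataFrame({"price": np.random.rand(1000)})

# Split it into 10 partitions of 100 rows each
ddf = dd.from_pandas(pdf, npartitions=10)

# Dask is lazy: .compute() runs the calculation partition by partition
mean_price = ddf["price"].mean().compute()
print(mean_price)
```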

2. Use NumPy buffers

If you work with large binary datasets, consider storing and manipulating them as NumPy arrays built directly from raw buffers, which are far more compact than lists of Python objects or one DataFrame row per message.

```python
import numpy as np
import pandas as pd

# Collect each incoming chunk as a NumPy array
chunks = []

# Process each data chunk in a loop
for i in range(1000):
    # Stand-in for raw bytes read from the WebSocket connection
    # (hypothetical payload: ten little-endian int32 values, 40 bytes)
    raw = np.arange(i * 10, i * 10 + 10, dtype="<i4").tobytes()

    # Interpret the bytes as int32 values without copying them
    chunks.append(np.frombuffer(raw, dtype="<i4"))

# Combine the chunks once, then wrap the result in a DataFrame
values = np.concatenate(chunks)
df = pd.DataFrame({"value": values})

# Now you can run calculations over the whole dataset with NumPy or pandas
print(df["value"].sum())
```
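When the data has to stay in memory, preallocating the NumPy arrays once and overwriting the oldest entries keeps memory constant no matter how many messages arrive. This is a minimal ring-buffer sketch; the class `PriceRingBuffer`, its fields, and the capacity used below are hypothetical, and it stores only a timestamp and a price per event.

```python
import numpy as np
import pandas as pd


class PriceRingBuffer:
    """Hypothetical fixed-capacity buffer: arrays are allocated once and the
    oldest entries are overwritten, so memory never grows with the stream."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.times = np.zeros(capacity, dtype=np.int64)
        self.prices = np.zeros(capacity, dtype=np.float64)
        self.count = 0  # total number of items ever written

    def append(self, t: int, price: float) -> None:
        idx = self.count % self.capacity  # slot to overwrite
        self.times[idx] = t
        self.prices[idx] = price
        self.count += 1

    def to_frame(self) -> pd.DataFrame:
        """Return the buffered items as a DataFrame, oldest first."""
        n = min(self.count, self.capacity)
        # Once the buffer has wrapped, the oldest item sits at count % capacity.
        start = self.count % self.capacity if self.count > self.capacity else 0
        order = (np.arange(n) + start) % self.capacity
        return pd.DataFrame({"time": self.times[order], "price": self.prices[order]})


buf = PriceRingBuffer(capacity=4)
for t in range(10):
    buf.append(t, 100.0 + t)

df = buf.to_frame()
print(df["time"].tolist())  # → [6, 7, 8, 9]
```

The trade-off is that older data is discarded; if you need the full history, flush the buffer to disk or to an aggregate before it wraps.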

3. Use a library for streaming data

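Libraries built for asynchronous networking, such as aiohttp (a WebSocket client and server) or Starlette (a server framework), let you process messages as they arrive instead of holding everything in memory. The sketch below uses aiohttp's client to subscribe to Binance's BTCUSDT trade stream and builds one DataFrame per batch of 100 messages rather than one ever-growing table; the stream name, batch size, and five-batch limit are illustrative choices.

```python
import asyncio
import json

import aiohttp
import pandas as pd

# Binance raw trade stream for BTCUSDT (one JSON event per message)
URL = "wss://stream.binance.com:9443/ws/btcusdt@trade"
BATCH_SIZE = 100


async def consume(max_batches: int = 5) -> pd.DataFrame:
    rows, frames = [], []
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(URL) as ws:
            async for msg in ws:
                if msg.type == aiohttp.WSMsgType.ERROR:
                    break
                if msg.type != aiohttp.WSMsgType.TEXT:
                    continue
                # Receive a message and keep only the fields we need
                event = json.loads(msg.data)
                rows.append({"time": event["T"],
                             "price": float(event["p"]),
                             "qty": float(event["q"])})
                # Turn each full batch into a DataFrame, then start a new one
                if len(rows) >= BATCH_SIZE:
                    frames.append(pd.DataFrame(rows))
                    rows = []
                    if len(frames) >= max_batches:
                        break
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


if __name__ == "__main__":
    df = asyncio.run(consume())
    print(df.describe())
```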
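To test batching logic offline, the consumer can be written against any async iterator of raw messages instead of a live socket. In this sketch, `fake_messages` is a hypothetical stand-in for a WebSocket that yields JSON strings with Binance-style `T`, `p` and `q` fields, and `collect_batches` (also hypothetical) builds one DataFrame per batch plus a final partial batch.

```python
import asyncio
import json

import pandas as pd


async def fake_messages(n: int):
    """Hypothetical stand-in for a WebSocket: yields n raw JSON trade messages."""
    for i in range(n):
        yield json.dumps({"T": i, "p": f"{100 + i % 5}.0", "q": "0.5"})
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would


async def collect_batches(source, batch_size: int) -> list:
    """Turn an async stream of raw messages into one DataFrame per batch."""
    rows, frames = [], []
    async for raw in source:
        event = json.loads(raw)
        rows.append({"time": event["T"],
                     "price": float(event["p"]),
                     "qty": float(event["q"])})
        if len(rows) >= batch_size:
            frames.append(pd.DataFrame(rows))
            rows = []
    if rows:  # flush the final partial batch
        frames.append(pd.DataFrame(rows))
    return frames


frames = asyncio.run(collect_batches(fake_messages(250), batch_size=100))
print([len(f) for f in frames])  # → [100, 100, 50]
```

Because `collect_batches` only depends on an async iterator, the same function can be fed from a real connection or from recorded messages.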

Conclusion

In short, the unbounded growth of a pandas DataFrame fed by Binance's WebSocket API can be brought under control with strategies such as Dask, NumPy buffers for compact storage and processing, or an asynchronous streaming library that handles messages in batches as they arrive.
