docs/Tutorial_Matrix_Profiles_For_Streaming_Data.ipynb
75 additions & 66 deletions
@@ -254,8 +254,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "stumpy.stump: 1022.7s\n",
-      "stumpy.stumpi: 172.0s\n"
+      "stumpy.stump: 1036.3s\n",
+      "stumpy.stumpi: 21.6s\n"
      ]
     }
    ],
@@ -264,7 +264,7 @@
     "T_stream = T_full[:200].copy()\n",
     "m = 100\n",
     "\n",
-    "# # `stumpy.stump` timing\n",
+    "# `stumpy.stump` timing\n",
     "start = time.time()\n",
     "mp = stumpy.stump(T_stream, m)\n",
     "for i in range(200, len(T_full)):\n",
@@ -273,8 +273,8 @@
     "stump_time = time.time() - start\n",
     "\n",
     "# `stumpy.stumpi` timing\n",
-    "start = time.time()\n",
     "stream = stumpy.stumpi(T_stream, m, egress=False) # Don't egress/remove the oldest data point when streaming\n",
+    "start = time.time()\n",
     "for i in range(200, len(T_full)):\n",
     "    t = T_full[i]\n",
     "    stream.update(t)\n",
@@ -288,28 +288,44 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Setting aside the fact that having more CPUs will speed up both approaches, we clearly see that incremental `stumpy.stumpi` is almost an order of magnitude faster than batch `stumpy.stump` for processing streaming data. In fact for the current hardware, on average, it is taking roughly 0.1 seconds for `stumpy.stump` to analyze each new matrix profile. So, if you have more than 10 new data point arriving every second, then you wouldn't be able to keep up. In contrast, `stumpy.stumpi` should be able to comfortably handle and process ~50+ new data points per second using fairly modest hardware. Additionally, batch `stumpy.stump`, which has a computational complexity of `O(n^2)`, will get even slower as more and more data points get appended to the existing time series while `stumpy.stumpi`, which is essentially `O(1)`, will continue to be highly performant. \n",
+    "Setting aside the fact that having more CPUs will speed up both approaches, we clearly see that incremental `stumpy.stumpi` is one to two orders of magnitude faster than batch `stumpy.stump` for processing streaming data. In fact, for the current hardware, on average, it is taking roughly 0.1 seconds for `stumpy.stump` to analyze each new matrix profile. So, if you have more than 10 new data points arriving every second, then you wouldn't be able to keep up. In contrast, `stumpy.stumpi` should be able to comfortably handle and process ~450+ new data points per second using fairly modest hardware. Additionally, batch `stumpy.stump`, which has a computational complexity of `O(n^2)`, will get even slower as more and more data points get appended to the existing time series, while `stumpy.stumpi`, which is essentially `O(1)`, will continue to be highly performant.\n",
     "\n",
-    "In fact, if you <u><b>don't</b></u> care about maintaining the oldest data point and its relationships with the newest data point (i.e., you only care about maintaining a fixed sized sliding window), then you can get even better performance by telling `stumpy.stumpi` to remove/egress the oldest data point (along with its corresponding matrix profile information) by setting the parameter `egress=True` (note that this is actually the default behavior):"
+    "In fact, if you <u><b>don't</b></u> care about maintaining the oldest data point and its relationships with the newest data point (i.e., you only care about maintaining a fixed-sized sliding window), then you can slightly improve the performance by telling `stumpy.stumpi` to remove/egress the oldest data point (along with its corresponding matrix profile information) by setting the parameter `egress=True` when we instantiate our streaming object (note that this is actually the default behavior):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "stream = stumpy.stumpi(T_stream, m, egress=True) # Egressing/removing the oldest data point is the default behavior!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "And now, when we process the same data above:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 17,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "stumpy.stumpi: 125.6s\n"
+      "stumpy.stumpi: 13.3s\n"
      ]
     }
    ],
    "source": [
-    "# `stumpy.stumpi` timing\n",
+    "# `stumpy.stumpi` timing with egress\n",
+    "stream = stumpy.stumpi(T_stream, m, egress=True)\n",
     "start = time.time()\n",
-    "stream = stumpy.stumpi(T_stream, m, egress=True) # This is actually the default behavior in `stumpy.stumpi`\n",