Commit cf726e4

Fixed #240 Refactored, improved aampi, stumpi performance
1 parent 7ecd8cb commit cf726e4

3 files changed

Lines changed: 111 additions & 125 deletions

docs/Tutorial_Matrix_Profiles_For_Streaming_Data.ipynb

Lines changed: 75 additions & 66 deletions
@@ -254,8 +254,8 @@
254254
"name": "stdout",
255255
"output_type": "stream",
256256
"text": [
257-
"stumpy.stump: 1022.7s\n",
258-
"stumpy.stumpi: 172.0s\n"
257+
"stumpy.stump: 1036.3s\n",
258+
"stumpy.stumpi: 21.6s\n"
259259
]
260260
}
261261
],
@@ -264,7 +264,7 @@
264264
"T_stream = T_full[:200].copy()\n",
265265
"m = 100\n",
266266
"\n",
267-
"# # `stumpy.stump` timing\n",
267+
"# `stumpy.stump` timing\n",
268268
"start = time.time()\n",
269269
"mp = stumpy.stump(T_stream, m)\n",
270270
"for i in range(200, len(T_full)):\n",
@@ -273,8 +273,8 @@
273273
"stump_time = time.time() - start\n",
274274
"\n",
275275
"# `stumpy.stumpi` timing\n",
276-
"start = time.time()\n",
277276
"stream = stumpy.stumpi(T_stream, m, egress=False) # Don't egress/remove the oldest data point when streaming\n",
277+
"start = time.time()\n",
278278
"for i in range(200, len(T_full)):\n",
279279
" t = T_full[i]\n",
280280
" stream.update(t)\n",
@@ -288,28 +288,44 @@
288288
"cell_type": "markdown",
289289
"metadata": {},
290290
"source": [
291-
"Setting aside the fact that having more CPUs will speed up both approaches, we clearly see that incremental `stumpy.stumpi` is almost an order of magnitude faster than batch `stumpy.stump` for processing streaming data. In fact for the current hardware, on average, it is taking roughly 0.1 seconds for `stumpy.stump` to analyze each new matrix profile. So, if you have more than 10 new data point arriving every second, then you wouldn't be able to keep up. In contrast, `stumpy.stumpi` should be able to comfortably handle and process ~50+ new data points per second using fairly modest hardware. Additionally, batch `stumpy.stump`, which has a computational complexity of `O(n^2)`, will get even slower as more and more data points get appended to the existing time series while `stumpy.stumpi`, which is essentially `O(1)`, will continue to be highly performant. \n",
291+
"Setting aside the fact that having more CPUs will speed up both approaches, we clearly see that incremental `stumpy.stumpi` is one to two orders of magnitude faster than batch `stumpy.stump` for processing streaming data. In fact, on the current hardware, it takes `stumpy.stump` roughly 0.1 seconds on average to analyze each new matrix profile. So, if you have more than 10 new data points arriving every second, then you wouldn't be able to keep up. In contrast, `stumpy.stumpi` should be able to comfortably handle and process ~450+ new data points per second using fairly modest hardware. Additionally, batch `stumpy.stump`, which has a computational complexity of `O(n^2)`, will get even slower as more and more data points get appended to the existing time series while `stumpy.stumpi`, which is essentially `O(1)`, will continue to be highly performant. \n",
292292
"\n",
293-
"In fact, if you <u><b>don't</b></u> care about maintaining the oldest data point and its relationships with the newest data point (i.e., you only care about maintaining a fixed sized sliding window), then you can get even better performance by telling `stumpy.stumpi` to remove/egress the oldest data point (along with its corresponding matrix profile information) by setting the parameter `egress=True` (note that this is actually the default behavior):"
293+
"In fact, if you <u><b>don't</b></u> care about maintaining the oldest data point and its relationships with the newest data point (i.e., you only care about maintaining a fixed-size sliding window), then you can slightly improve performance by telling `stumpy.stumpi` to remove/egress the oldest data point (along with its corresponding matrix profile information) by setting the parameter `egress=True` when we instantiate our streaming object (note that this is actually the default behavior):"
294+
]
295+
},
296+
{
297+
"cell_type": "code",
298+
"execution_count": 16,
299+
"metadata": {},
300+
"outputs": [],
301+
"source": [
302+
"stream = stumpy.stumpi(T_stream, m, egress=True) # Egressing/removing the oldest data point is the default behavior!"
303+
]
304+
},
305+
{
306+
"cell_type": "markdown",
307+
"metadata": {},
308+
"source": [
309+
"And now, when we process the same data above:"
294310
]
295311
},
296312
{
297313
"cell_type": "code",
298-
"execution_count": 13,
314+
"execution_count": 17,
299315
"metadata": {},
300316
"outputs": [
301317
{
302318
"name": "stdout",
303319
"output_type": "stream",
304320
"text": [
305-
"stumpy.stumpi: 125.6s\n"
321+
"stumpy.stumpi: 13.3s\n"
306322
]
307323
}
308324
],
309325
"source": [
310-
"# `stumpy.stumpi` timing\n",
326+
"# `stumpy.stumpi` timing with egress\n",
327+
"stream = stumpy.stumpi(T_stream, m, egress=True)\n",
311328
"start = time.time()\n",
312-
"stream = stumpy.stumpi(T_stream, m, egress=True) # This is actually the default behavior in `stumpy.stumpi`\n",
313329
"for i in range(200, len(T_full)):\n",
314330
" t = T_full[i]\n",
315331
" stream.update(t)\n",
@@ -318,13 +334,6 @@
318334
"print(f\"stumpy.stumpi: {np.round(stumpi_time, 1)}s\")"
319335
]
320336
},
321-
{
322-
"cell_type": "markdown",
323-
"metadata": {},
324-
"source": [
325-
"Now, by egressing the oldest data point in our time series, we can handle a data stream that sees ~80 data points arriving every second!"
326-
]
327-
},
328337
{
329338
"cell_type": "markdown",
330339
"metadata": {},
@@ -345,7 +354,7 @@
345354
},
346355
{
347356
"cell_type": "code",
348-
"execution_count": 14,
357+
"execution_count": 18,
349358
"metadata": {},
350359
"outputs": [
351360
{
@@ -424,7 +433,7 @@
424433
"4 326.65276 0.285649 0.753776 13.681666"
425434
]
426435
},
427-
"execution_count": 14,
436+
"execution_count": 18,
428437
"metadata": {},
429438
"output_type": "execute_result"
430439
}
@@ -470,16 +479,16 @@
470479
},
471480
{
472481
"cell_type": "code",
473-
"execution_count": 15,
482+
"execution_count": 19,
474483
"metadata": {},
475484
"outputs": [
476485
{
477486
"data": {
478487
"text/plain": [
479-
"[<matplotlib.lines.Line2D at 0x7fb1dfa7d390>]"
488+
"[<matplotlib.lines.Line2D at 0x7fbbed162ed0>]"
480489
]
481490
},
482-
"execution_count": 15,
491+
"execution_count": 19,
483492
"metadata": {},
484493
"output_type": "execute_result"
485494
},
@@ -534,7 +543,7 @@
534543
},
535544
{
536545
"cell_type": "code",
537-
"execution_count": 16,
546+
"execution_count": 20,
538547
"metadata": {},
539548
"outputs": [],
540549
"source": [
@@ -552,7 +561,7 @@
552561
},
553562
{
554563
"cell_type": "code",
555-
"execution_count": 17,
564+
"execution_count": 21,
556565
"metadata": {},
557566
"outputs": [],
558567
"source": [
@@ -577,7 +586,7 @@
577586
},
578587
{
579588
"cell_type": "code",
580-
"execution_count": 18,
589+
"execution_count": 22,
581590
"metadata": {},
582591
"outputs": [
583592
{
@@ -768,42 +777,42 @@
768777
"</style>\n",
769778
"\n",
770779
"<div class=\"animation\">\n",
771-
" <img id=\"_anim_imgedabb6cec1704a6d87bd3e03e786884e\">\n",
780+
" <img id=\"_anim_imgb37864aad89e4299bb82476aa20efc66\">\n",
772781
" <div class=\"anim-controls\">\n",
773-
" <input id=\"_anim_slideredabb6cec1704a6d87bd3e03e786884e\" type=\"range\" class=\"anim-slider\"\n",
782+
" <input id=\"_anim_sliderb37864aad89e4299bb82476aa20efc66\" type=\"range\" class=\"anim-slider\"\n",
774783
" name=\"points\" min=\"0\" max=\"1\" step=\"1\" value=\"0\"\n",
775-
" oninput=\"animedabb6cec1704a6d87bd3e03e786884e.set_frame(parseInt(this.value));\"></input>\n",
784+
" oninput=\"animb37864aad89e4299bb82476aa20efc66.set_frame(parseInt(this.value));\"></input>\n",
776785
" <div class=\"anim-buttons\">\n",
777-
" <button title=\"Decrease speed\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.slower()\">\n",
786+
" <button title=\"Decrease speed\" onclick=\"animb37864aad89e4299bb82476aa20efc66.slower()\">\n",
778787
" <i class=\"fa fa-minus\"></i></button>\n",
779-
" <button title=\"First frame\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.first_frame()\">\n",
788+
" <button title=\"First frame\" onclick=\"animb37864aad89e4299bb82476aa20efc66.first_frame()\">\n",
780789
" <i class=\"fa fa-fast-backward\"></i></button>\n",
781-
" <button title=\"Previous frame\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.previous_frame()\">\n",
790+
" <button title=\"Previous frame\" onclick=\"animb37864aad89e4299bb82476aa20efc66.previous_frame()\">\n",
782791
" <i class=\"fa fa-step-backward\"></i></button>\n",
783-
" <button title=\"Play backwards\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.reverse_animation()\">\n",
792+
" <button title=\"Play backwards\" onclick=\"animb37864aad89e4299bb82476aa20efc66.reverse_animation()\">\n",
784793
" <i class=\"fa fa-play fa-flip-horizontal\"></i></button>\n",
785-
" <button title=\"Pause\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.pause_animation()\">\n",
794+
" <button title=\"Pause\" onclick=\"animb37864aad89e4299bb82476aa20efc66.pause_animation()\">\n",
786795
" <i class=\"fa fa-pause\"></i></button>\n",
787-
" <button title=\"Play\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.play_animation()\">\n",
796+
" <button title=\"Play\" onclick=\"animb37864aad89e4299bb82476aa20efc66.play_animation()\">\n",
788797
" <i class=\"fa fa-play\"></i></button>\n",
789-
" <button title=\"Next frame\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.next_frame()\">\n",
798+
" <button title=\"Next frame\" onclick=\"animb37864aad89e4299bb82476aa20efc66.next_frame()\">\n",
790799
" <i class=\"fa fa-step-forward\"></i></button>\n",
791-
" <button title=\"Last frame\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.last_frame()\">\n",
800+
" <button title=\"Last frame\" onclick=\"animb37864aad89e4299bb82476aa20efc66.last_frame()\">\n",
792801
" <i class=\"fa fa-fast-forward\"></i></button>\n",
793-
" <button title=\"Increase speed\" onclick=\"animedabb6cec1704a6d87bd3e03e786884e.faster()\">\n",
802+
" <button title=\"Increase speed\" onclick=\"animb37864aad89e4299bb82476aa20efc66.faster()\">\n",
794803
" <i class=\"fa fa-plus\"></i></button>\n",
795804
" </div>\n",
796-
" <form title=\"Repetition mode\" action=\"#n\" name=\"_anim_loop_selectedabb6cec1704a6d87bd3e03e786884e\"\n",
805+
" <form title=\"Repetition mode\" action=\"#n\" name=\"_anim_loop_selectb37864aad89e4299bb82476aa20efc66\"\n",
797806
" class=\"anim-state\">\n",
798-
" <input type=\"radio\" name=\"state\" value=\"once\" id=\"_anim_radio1_edabb6cec1704a6d87bd3e03e786884e\"\n",
807+
" <input type=\"radio\" name=\"state\" value=\"once\" id=\"_anim_radio1_b37864aad89e4299bb82476aa20efc66\"\n",
799808
" checked>\n",
800-
" <label for=\"_anim_radio1_edabb6cec1704a6d87bd3e03e786884e\">Once</label>\n",
801-
" <input type=\"radio\" name=\"state\" value=\"loop\" id=\"_anim_radio2_edabb6cec1704a6d87bd3e03e786884e\"\n",
809+
" <label for=\"_anim_radio1_b37864aad89e4299bb82476aa20efc66\">Once</label>\n",
810+
" <input type=\"radio\" name=\"state\" value=\"loop\" id=\"_anim_radio2_b37864aad89e4299bb82476aa20efc66\"\n",
802811
" >\n",
803-
" <label for=\"_anim_radio2_edabb6cec1704a6d87bd3e03e786884e\">Loop</label>\n",
804-
" <input type=\"radio\" name=\"state\" value=\"reflect\" id=\"_anim_radio3_edabb6cec1704a6d87bd3e03e786884e\"\n",
812+
" <label for=\"_anim_radio2_b37864aad89e4299bb82476aa20efc66\">Loop</label>\n",
813+
" <input type=\"radio\" name=\"state\" value=\"reflect\" id=\"_anim_radio3_b37864aad89e4299bb82476aa20efc66\"\n",
805814
" >\n",
806-
" <label for=\"_anim_radio3_edabb6cec1704a6d87bd3e03e786884e\">Reflect</label>\n",
815+
" <label for=\"_anim_radio3_b37864aad89e4299bb82476aa20efc66\">Reflect</label>\n",
807816
" </form>\n",
808817
" </div>\n",
809818
"</div>\n",
@@ -813,9 +822,9 @@
813822
" /* Instantiate the Animation class. */\n",
814823
" /* The IDs given should match those used in the template above. */\n",
815824
" (function() {\n",
816-
" var img_id = \"_anim_imgedabb6cec1704a6d87bd3e03e786884e\";\n",
817-
" var slider_id = \"_anim_slideredabb6cec1704a6d87bd3e03e786884e\";\n",
818-
" var loop_select_id = \"_anim_loop_selectedabb6cec1704a6d87bd3e03e786884e\";\n",
825+
" var img_id = \"_anim_imgb37864aad89e4299bb82476aa20efc66\";\n",
826+
" var slider_id = \"_anim_sliderb37864aad89e4299bb82476aa20efc66\";\n",
827+
" var loop_select_id = \"_anim_loop_selectb37864aad89e4299bb82476aa20efc66\";\n",
819828
" var frames = new Array(153);\n",
820829
" \n",
821830
" frames[0] = \"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAAGwCAYAAABfOcJAAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90\\\n",
@@ -166302,7 +166311,7 @@
166302166311
" /* set a timeout to make sure all the above elements are created before\n",
166303166312
" the object is initialized. */\n",
166304166313
" setTimeout(function() {\n",
166305-
" animedabb6cec1704a6d87bd3e03e786884e = new Animation(frames, img_id, slider_id, 100.0,\n",
166314+
" animb37864aad89e4299bb82476aa20efc66 = new Animation(frames, img_id, slider_id, 100.0,\n",
166306166315
" loop_select_id);\n",
166307166316
" }, 0);\n",
166308166317
" })()\n",
@@ -166312,7 +166321,7 @@
166312166321
"<IPython.core.display.HTML object>"
166313166322
]
166314166323
},
166315-
"execution_count": 18,
166324+
"execution_count": 22,
166316166325
"metadata": {},
166317166326
"output_type": "execute_result"
166318166327
}
@@ -166389,29 +166398,29 @@
166389166398
},
166390166399
{
166391166400
"cell_type": "code",
166392-
"execution_count": 19,
166401+
"execution_count": 23,
166393166402
"metadata": {},
166394166403
"outputs": [
166395166404
{
166396166405
"name": "stdout",
166397166406
"output_type": "stream",
166398166407
"text": [
166399-
"Full Matrix Profile: [1.22 1.51 1.49 1.78 1.93 1.57 1.94 1.77 1.76 2.26 2.21 2.28 1.79 1.87\n",
166400-
" 1.6 1.86 1.97 1.43 1.11 1.49 1.94 2.67 2.41 1.79 1.87 1.87 1.57 1.94\n",
166401-
" 1.77 1.76 1.87 2.2 2.5 1.65 2.32 2.43 1.91 2.18 1.22 1.11 2.19 1.6\n",
166402-
" 2.03 2.65 1.65 2.05 1.91 2.1 2.22 1.62 0.88 1.78 1.93 2.05 1.62 0.88\n",
166403-
" 2.24]\n",
166404-
"Left Matrix Profile: [ inf inf inf 4.5 3.89 3.65 3.54 3.25 3.03 2.85 3.18 2.82 2.78 2.85\n",
166405-
" 2.73 2.21 2.28 1.43 1.51 1.49 1.94 2.74 2.88 1.79 1.87 1.95 1.57 1.94\n",
166406-
" 1.77 1.76 1.87 2.2 2.5 2.43 2.32 2.47 2.17 2.18 1.22 1.11 2.19 1.6\n",
166407-
" 2.03 2.65 1.65 2.34 1.91 2.1 2.22 2.16 2.16 1.78 1.93 2.05 1.62 0.88\n",
166408-
" 2.24]\n",
166409-
"Full Matrix Profile Indices: [38 18 19 51 52 26 27 28 29 30 15 16 23 24 41 26 27 0 39 2 3 48 44 12\n",
166410-
" 13 30 5 6 7 8 25 24 11 44 28 49 46 16 0 18 13 14 25 33 33 53 36 27\n",
166411-
" 0 54 55 3 4 45 49 50 3]\n",
166412-
"Left Matrix Profile Indices: [-1 -1 -1 0 0 0 3 4 5 6 3 4 7 8 9 10 11 0 1 2 3 0 1 12\n",
166413-
" 13 4 5 6 7 8 25 24 11 12 28 29 15 16 0 18 13 14 25 33 33 33 36 27\n",
166414-
" 0 1 2 3 4 45 49 50 3]\n"
166408+
"Full Matrix Profile: [2. 1.73 1.8 2.14 2.21 2.12 2.07 2.12 2.14 1.9 2.17 1.43 2.07 2.04\n",
166409+
" 1.88 1.91 1.8 2.14 2.21 2.12 1.43 2.09 2.15 2.17 1.84 2.22 1.25 1.39\n",
166410+
" 1.73 1.97 2.12 2.07 2.2 2.12 2.16 1.78 2.04 1.88 1.91 2.29 2.15 1.25\n",
166411+
" 1.39 1.9 1.65 2.08 2.31 2.41 2.17 2.09 2.15 2.12 1.65 1.78 2.16 1.99\n",
166412+
" 1.85]\n",
166413+
"Left Matrix Profile: [ inf inf inf 4.32 4.07 4.19 4.7 2.85 2.44 3.64 3.1 3.11 2.07 2.12\n",
166414+
" 2.04 2.22 1.8 2.14 2.21 2.12 1.43 2.7 3.05 2.58 2.34 2.27 2.23 2.\n",
166415+
" 1.73 2.41 2.42 2.07 2.2 2.29 2.23 2.42 2.04 1.88 1.91 2.29 2.15 1.25\n",
166416+
" 1.39 1.9 1.84 2.2 2.31 2.41 2.17 2.09 2.15 2.12 1.65 1.78 2.16 1.99\n",
166417+
" 1.85]\n",
166418+
"Full Matrix Profile Indices: [27 28 16 17 18 19 12 13 14 43 48 20 6 36 37 38 2 3 4 5 11 49 50 51\n",
166419+
" 44 40 41 42 1 37 38 16 5 51 52 53 13 14 15 5 6 26 27 9 52 56 36 19\n",
166420+
" 10 21 22 33 44 35 51 44 53]\n",
166421+
"Left Matrix Profile Indices: [-1 -1 -1 0 1 2 1 0 0 1 6 7 6 7 0 1 2 3 4 5 11 0 1 10\n",
166422+
" 11 12 6 0 1 9 3 16 5 10 11 12 13 14 15 5 6 26 27 9 24 21 36 19\n",
166423+
" 10 21 22 33 44 35 51 44 53]\n"
166415166424
]
166416166425
}
166417166426
],
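
The speedups implied by the notebook outputs in this diff can be checked with a few lines of arithmetic. The sketch below only uses the timings printed in the old and new cells; the dictionary keys and the `speedup` helper are illustrative names, not part of STUMPY. Note that this commit also moved `start = time.time()` to after the `stumpy.stumpi(...)` constructor call, so the old and new `stumpi` timings do not measure exactly the same region and the ratios should be read as rough indications only.

```python
# Timings in seconds, copied from the notebook cell outputs shown in this diff.
# "before" = removed (-) lines, "after" = added (+) lines.
before = {"stump": 1022.7, "stumpi": 172.0, "stumpi_egress": 125.6}
after = {"stump": 1036.3, "stumpi": 21.6, "stumpi_egress": 13.3}


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


for label, timings in (("before", before), ("after", after)):
    batch_vs_incremental = speedup(timings["stump"], timings["stumpi"])
    egress_gain = speedup(timings["stumpi"], timings["stumpi_egress"])
    print(f"{label}: stump/stumpi = {batch_vs_incremental:.1f}x, "
          f"egress gain = {egress_gain:.2f}x")

# Change in the incremental timings across the commit (not strictly
# like-for-like, because the timed region changed as described above).
print(f"stumpi: {speedup(before['stumpi'], after['stumpi']):.1f}x faster")
print(f"stumpi (egress): "
      f"{speedup(before['stumpi_egress'], after['stumpi_egress']):.1f}x faster")
```

These ratios are consistent with the updated tutorial text, which describes `stumpy.stumpi` as one to two orders of magnitude faster than batch `stumpy.stump` and the egress gain as slight.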
