This repository was archived by the owner on Apr 28, 2025. It is now read-only.

Commit d688050

Merge pull request #33 from rubin-dp0/tickets/PREOPS-620
Simple histogram to bin categorical data
2 parents 06c9a6b + 1d702bd commit d688050

1 file changed: 45 additions & 2 deletions

02_Intermediate_TAP_Query.ipynb
@@ -428,6 +428,50 @@
  "print(f'There are {results[results[\"truth_type\"] == 3].shape[0]} SNe')"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2.4. Simple histogram to bin categorical data. \n",
+ "\n",
+ "Now we will create a simple categorical histogram of the number of objects of each truth_type in the dataset. We will use the ADQL 'GROUP BY' clause to group the objects in the truth_match catalog by type (1: galaxies, 2: stars, 3: SNe), and the 'COUNT' function to count the number of objects in each category. Finally, we will use the 'ORDER BY' clause to sort the results in ascending order of truth_type."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query_histogram = \"SELECT truth_type, count(truth_type) \" \\\n",
+ " \" FROM dp01_dc2_catalogs.truth_match \" \\\n",
+ " \" GROUP BY truth_type \" \\\n",
+ " \" ORDER BY truth_type\"\n",
+ "print(query_histogram)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "object_type_histogram = service.search(query_histogram).to_table().to_pandas()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Map the numerical values for each type to a more descriptive name\n",
+ "object_map = {1: 'galaxy', 2: 'star', 3: 'SNe'}\n",
+ "object_type_histogram['truth_type'] = \\\n",
+ " object_type_histogram['truth_type'].map(object_map)\n",
+ "object_type_histogram"
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {},
@@ -474,7 +518,6 @@
  "\n",
  "# We will want to filter on the truth type later\n",
  "# We will convert the truth_type integer to a more descriptive string\n",
- "object_map = {1: 'galaxy', 2: 'star', 3: 'SNe'}\n",
  "source.data['truth_type'] = results['truth_type'].map(object_map)"
  ]
  },
@@ -784,7 +827,7 @@
  "\n",
  "# Assert that the results are the same as obtained from\n",
  "# executing synchronous queries\n",
- "assert len(async_results) == 14424\n",
+ "assert len(async_results) == 14424 \n",
  "assert_frame_equal(results, async_results.to_table().to_pandas())"
  ]
  },
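
Note on the new section 2.4 cells: the ADQL query groups rows by truth_type, counts each group, and sorts by truth_type. The same logic can be sketched locally with pandas on a toy table, which may help when checking the expected shape of object_type_histogram without a TAP service. The toy DataFrame, its values, and the "count" column name are assumptions made up for illustration and are not DP0 data; only object_map is taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the truth_match table (illustrative values only)
toy_truth = pd.DataFrame({"truth_type": [1, 1, 2, 3, 1, 2]})

# Rough pandas analogue of:
#   SELECT truth_type, count(truth_type) ... GROUP BY truth_type ORDER BY truth_type
# groupby sorts the group keys in ascending order by default.
toy_histogram = (
    toy_truth.groupby("truth_type")
    .size()
    .reset_index(name="count")  # "count" is an assumed column name
)

# Same mapping as in the notebook: integer type -> descriptive name
object_map = {1: 'galaxy', 2: 'star', 3: 'SNe'}
toy_histogram["truth_type"] = toy_histogram["truth_type"].map(object_map)

print(toy_histogram)
```

The name of the count column returned by the real service depends on the TAP backend, so the notebook result may label it differently from this sketch.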
