Merge pull request #9 from rubin-dp0/tickets/PREOPS-551

MelissaGraham · web-flow · commit 359f5fdefa31 · 2021-06-10T16:19:28.000-07:00
Clarify some info in notebook 04.
diff --git a/04_Intro_to_Butler.ipynb b/04_Intro_to_Butler.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<img align=\"left\" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250> \n",
+    "<img align=\"left\" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style=\"padding: 10px\"> \n",
     "<b>Introduction to the LSST data Butler</b> <br>\n",
     "Last verified to run on <b>TBD</b> with LSST Science Pipelines release <b>TBD</b> <br>\n",
     "Contact author: Alex Drlica-Wagner <br>\n",
@@ -14,11 +14,10 @@
     "Questions welcome at <a href=\"https://community.lsst.org/c/support/dp0\">community.lsst.org/c/support/dp0</a> <br>\n",
     "Find DP0 documentation and resources at <a href=\"https://dp0-1.lsst.io\">dp0-1.lsst.io</a> <br>\n",
     "\n",
-    "<br>\n",
+    "<br><br>\n",
     "This notebook provides an introduction to the use of the data Butler. The Butler is the LSST Science Pipelines interface for managing, reading, and writing datasets. The Butler can be used to explore the contents of the DP0.1 data repository and access the DP0.1 data. The current version of the Butler (referred to as \"Gen-3\") is still under development, and this notebook may be modified in the future.\n",
     "\n",
     "<br>\n",
-    "<br>\n",
     "The goals of this notebook are to:<br>\n",
     "1. create an instance of the Butler<br>\n",
     "2. explore the DP0.1 data repository<br>\n",
@@ -292,7 +291,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "It looks like `band` is what we want, so we put it in the `where` argument of `queryDatasets`. Let's try the $g$ band."
+    "It looks like `band` is what we want, so we include that in the `dataId` argument of `queryDatasets`. We can also select only visits with visit numbers larger than 700000 by adding a constraint in the `where` argument of `queryDatasets`. Let's try the $g$ band."
    ]
   },
   {
@@ -344,7 +343,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The `calexp` is a calibrated CCD image from a single exposure. We'll use the afwDisplay interface to show the pixel values and mask plane (more on afwDisplay can be found in other notebooks)."
+    "The `calexp` (also known as a \"processed visit image,\" or PVI) is a calibrated CCD image from a single exposure. We'll use the afwDisplay interface to show the pixel values and mask plane (more on afwDisplay can be found in other notebooks)."
    ]
   },
   {
@@ -364,6 +363,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "Note the blue coloring of most sources in the above image (if you have sensitive eyes, you may also see some red streaks). The colors are set by the \"mask\" plane, which flags pixel properties such as bad or saturated pixels (in this case, the blue pixels are those that are part of detected sources). See the Image Display and Manipulation tutorial for more about the mask plane.\n",
+    "\n",
     "Now that we have a calibrated image, we may want to get the catalog of sources that were extracted from it. To get the `src` catalog associated with this `calexp` we pass the `dataId` to the butler with the `src` datasetType. Note that this performs another query to the registry database to find the `src` catalog that matches our dataId requirements."
    ]
   },
@@ -401,6 +402,7 @@
     "afw_display.mtv(calexp)\n",
     "plt.gca().axis('off')\n",
     "\n",
+    "# We use display buffering to avoid re-drawing the image after each source is plotted\n",
     "with afw_display.Buffering():\n",
     "    for s in src:\n",
     "        afw_display.dot('+', s.getX(), s.getY(), ctype=afwDisplay.RED)\n",
@@ -413,7 +415,7 @@
    "source": [
     "### 4. How to query for multiple data sets\n",
     "\n",
-    "In the case above, both the `calexp` and `src` can be found by the registry, but this will not always necessarily be the case. The `queryDimensions` method provides a more flexible way to query for multiple datasets (requiring an instance of all datasets to be available for that dataId) or ask for different dataId keys than what is used to identify the dataset (which invokes various built-in relationships). An example of this is provided below:"
+    "In the case above, both the `calexp` and `src` can be found by the registry, but this will not always be the case. The `queryDataIds` method provides a more flexible way to query for multiple datasets (requiring an instance of all datasets to be available for that dataId) or to ask for different dataId keys than those used to identify the dataset (which invokes various built-in relationships). An example of this is provided below:"
    ]
   },
   {
@@ -422,7 +424,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Use queryDimensions to provide more flexible access\n",
+    "# Use queryDataIds to grab the dataIds for a subset taken from a single visit\n",
     "dataIds = registry.queryDataIds([\"visit\", \"detector\", \"band\"], datasets=[\"calexp\",\"src\"], where='visit = 703697',\n",
     "                                collections=collection)\n",
     "for i,dataId in enumerate(dataIds):\n",
@@ -436,7 +438,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Use queryDataIds to grab the dataIds for a subset\n",
+    "# Use queryDataIds to grab the dataIds for a subset using the \"where\" functionality\n",
     "dataIds = registry.queryDataIds([\"visit\", \"detector\"], datasets=[\"calexp\",\"src\"], \n",
     "                                where=\"band='g' and detector=0 and visit > 700000\",\n",
     "                                collections=collection)\n",
@@ -458,6 +460,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Use queryDimensionRecords to print the exposure, visit, and detector records for one visit and detector\n",
     "for dim in ['exposure','visit','detector']:\n",
     "    print(list(registry.queryDimensionRecords(dim, where='visit = 971990 and detector=0'))[0])\n",
     "    print()"
@@ -764,6 +767,13 @@
     "plt.title('Butler coadd_forced_src objects in tract 4638 patch 43')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a final note -- did you notice that the RA and Dec columns (`coord_ra` and `coord_dec`, specifically) have units of _radians_? As an exercise, you could use what you've learned above to confirm this by accessing the table schema. (Also note that you can scroll up and find the answer in the output of a cell you already executed.)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,