|
8 | 8 | "<b>Introduction to the LSST data Butler</b> <br>\n", |
9 | 9 | "Last verified to run on <b>TBD</b> with LSST Science Pipelines release <b>TBD</b> <br>\n", |
10 | 10 | "Contact author: Alex Drlica-Wagner <br>\n", |
11 | | - "Credit: Originally developed in the context of the LSST Stack Club <br>\n", |
| 11 | + "Credit: Originally developed by Alex Drlica-Wagner in the context of the LSST Stack Club <br>\n", |
12 | 12 | "Target audience: All DP0 delegates. <br>\n", |
13 | 13 | "Container Size: medium <br>\n", |
14 | | - "<br>\n", |
15 | 14 | "Questions welcome at <a href=\"https://community.lsst.org/c/support/dp0\">community.lsst.org/c/support/dp0</a> <br>\n", |
16 | 15 | "Find DP0 documentation and resources at <a href=\"https://dp0-1.lsst.io\">dp0-1.lsst.io</a> <br>\n", |
17 | 16 | "\n", |
| 17 | + "<br>\n", |
18 | 18 | "This notebook provides an introduction to the use of the data Butler. The Butler is the LSST Science Pipelines interface for managing, reading, and writing datasets. The Butler can be used to explore the contents of the DP0.1 data repository and access the DP0.1 data. The current version of the Butler (referred to as \"Gen-3\") is still under development, and this notebook may be modified in the future.\n", |
19 | 19 | "\n", |
20 | | - "The goals of this notebook are:\n", |
| 20 | + "<br>\n", |
| 21 | + "<br>\n", |
| 22 | + "The goals of this notebook are to:<br>\n", |
| 23 | + "1. create an instance of the Butler<br>\n", |
| 24 | + "2. explore the DP0.1 data repository<br>\n", |
| 25 | + "3. retrieve and display some image and catalog data<br>\n", |
| 26 | + "4. create an image cutout at a user-specified coordinate<br>\n", |
| 27 | + "5. retrieve and plot catalog data\n", |
21 | 28 | "\n", |
22 | | - "1. Create a Butler.\n", |
23 | | - "2. Programmatically explore the content of the DP0.1 data repository\n", |
24 | | - "3. Grab some DP0.1 data! \n", |
25 | | - "4. Grab a cutout of the DP0.1 coadds at a specific (RA,Dec).\n", |
26 | 29 | "\n", |
27 | | - "**Credit:** This notebook was originally developed by Alex Drlica-Wagner in the context of the LSST Stack Club.\n", |
28 | 30 | "\n", |
29 | 31 | "### Setup" |
30 | 32 | ] |
|
64 | 66 | "cell_type": "markdown", |
65 | 67 | "metadata": {}, |
66 | 68 | "source": [ |
67 | | - "We import several packages from the LSST Science Pipelines. The first import gives us access to the Butler, while the second provides tools for displaying data. More details on data display can be found in [03_Intro_to_AFW_Display.ipynb](03_Intro_to_AFW_Display.ipynb)." |
| 69 | + "We import several packages from the LSST Science Pipelines. \n", |
| 70 | + "The first import gives us access to the Butler, while the second provides tools for displaying data.\n", |
| 71 | + "\n", |
| 72 | + "More details and techniques regarding image display can be found in the `rubin-dp0` GitHub Organization's [tutorial-notebooks](https://github.com/rubin-dp0/tutorial-notebooks) repository." |
68 | 73 | ] |
69 | 74 | }, |
70 | 75 | { |
|
140 | 145 | "source": [ |
141 | 146 | "This is our first glimpse of the data contained in the repository, but it doesn't tell us *which* collection we are actually interested in. The names do give us some hints, though:\n", |
142 | 147 | "\n", |
| 148 | + "* `2.2i` - refers to the processing run of the LSST DESC DC2 data (the `i` stands for `imSim`)\n", |
143 | 149 | "* `calib` - refers to calibration products that are used for instrument signature removal\n", |
| 150 | + "* `runs` - refers to processed data products\n", |
144 | 151 | "* `refcats` - refers to the reference catalogs used for astrometric and photometric calibration\n", |
145 | 152 | "* `skymaps` - are the geometric representations of the sky coverage\n", |
146 | | - "* `2.2i` - refers to the processing run of the LSST DESC DC2 data (the `i` stands for `imSim`)\n", |
147 | 153 | "\n", |
148 | 154 | "Collections can be nested, so we can access everything for DC2 Run 2.2i (the primary DP0.1 data set) by selecting the collection `2.2i/runs/DP0.1`. This is a pointer to other collections that expand out recursively. More on collections can be found here: https://pipelines.lsst.io/v/weekly/modules/lsst.daf.butler/organizing.html#collections" |
149 | 155 | ] |
|
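The recursive expansion of nested collections can be sketched in plain Python. This is a toy illustration only, not the Butler registry API, and the collection names and nesting below are hypothetical (the real expansion happens inside the registry when a chained collection like `2.2i/runs/DP0.1` is queried):

```python
# Toy model of chained collections: a chained collection maps to child
# collections, which expand recursively down to leaf (run) collections.
collections = {
    "2.2i/runs/DP0.1": ["2.2i/runs/a", "2.2i/runs/b"],  # chained -> children
    "2.2i/runs/a": [],                                  # leaf collection
    "2.2i/runs/b": ["2.2i/runs/c"],                     # chained -> child
    "2.2i/runs/c": [],                                  # leaf collection
}

def flatten(name, table):
    """Recursively expand a chained collection into its leaf collections."""
    children = table[name]
    if not children:
        return [name]
    out = []
    for child in children:
        out.extend(flatten(child, table))
    return out

print(flatten("2.2i/runs/DP0.1", collections))
```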
205 | 211 | "- `calexp` - Refers to an individual calibrated exposure\n", |
206 | 212 | "- `deepCoadd` - Refers to products produced from the coadded images (both images and source catalogs)\n", |
207 | 213 | "- `src` - Refers to the catalog of sources\n", |
208 | | - "- `skyMap` - refers to geometric representations of the sky coverage" |
| 214 | + "- `skyMap` - refers to geometric representations of the sky coverage\n", |
| 215 | + "\n", |
| 216 | + "<b> Which data sets are most appropriate for DP0.1? </b><br>\n", |
| 217 | + "Most DP0.1 delegates will only be interested in data sets with types `ExposureF` or `SourceCatalog`. \n", |
| 218 | + "For images, stick to the `calexp` (processed visit images, or PVIs) and `deepCoadd` (stacked PVIs).\n", |
| 219 | + "For catalogs, the `src` catalogs should be used with the `calexp` images, and the `deepCoadd_forced_src` catalogs are most appropriate for use with the `deepCoadd` images.\n", |
| 220 | + "More information can be found in the DP0.1 Data Products Definitions Document (DPDD) at [dp0-1.lsst.io](http://dp0-1.lsst.io).\n", |
| 221 | + "\n", |
| 222 | + "<br>\n" |
209 | 223 | ] |
210 | 224 | }, |
211 | 225 | { |
|
233 | 247 | "cell_type": "markdown", |
234 | 248 | "metadata": {}, |
235 | 249 | "source": [ |
236 | | - "We can get the path from the URI that is returned by `butler.getURI`. Note that this URI does not have to refer to a local path on the filesystem." |
| 250 | + "We can get the path from the URI that is returned by `butler.getURI`. Note that this URI does not refer to a local path on the filesystem. We do not need to know exactly where the data live in order to access them. That's the power of the Butler." |
237 | 251 | ] |
238 | 252 | }, |
239 | 253 | { |
|
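The distinction between a URI and a local path is easy to see with the standard library. A minimal sketch, assuming a hypothetical `s3://` URI of the kind a remote datastore might return (the bucket and file names below are made up, not real DP0.1 paths):

```python
# Inspect the pieces of a (hypothetical) dataset URI: the scheme tells us
# it is an object-store location, not a local filesystem path.
from urllib.parse import urlparse

uri = "s3://butler-bucket/2.2i/runs/DP0.1/calexp/example-file.fits"
parts = urlparse(uri)

print(parts.scheme)  # scheme of the URI: 's3', i.e. not a local path
print(parts.netloc)  # the bucket holding the data
print(parts.path)    # the key within the bucket
```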
300 | 314 | "cell_type": "markdown", |
301 | 315 | "metadata": {}, |
302 | 316 | "source": [ |
303 | | - "### 3. Get some data\n", |
| 317 | + "### 3. Retrieve and plot a calexp with sources\n", |
304 | 318 | "\n", |
305 | 319 | "Ok, now we have all the information we need to ask the Butler to get a specific data product. We have identified a collection (`2.2i/runs/DP0.1`), a `datasetType` (`calexp`), and the `dataId` (from the `datasetRef`) to uniquely specify an instance of this data set." |
306 | 320 | ] |
|
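The lookup that `butler.get` performs can be illustrated with a toy dictionary keyed by dataset type plus dataId. This is only a sketch of the idea, not the real Butler (which also resolves the collection search path); the dataId values are hypothetical:

```python
# Toy model: a dataset is uniquely keyed by its dataset type and its dataId.
# frozenset(...) of the dataId items makes the key order-independent.
repo = {
    ("calexp", frozenset({"visit": 192350, "detector": 175}.items())): "calexp-pixels",
}

def get(dataset_type, data_id):
    """Look up a dataset by type + dataId, ignoring key order."""
    return repo[(dataset_type, frozenset(data_id.items()))]

# The order of dataId keys does not matter, only their values.
print(get("calexp", {"detector": 175, "visit": 192350}))
```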
397 | 411 | "cell_type": "markdown", |
398 | 412 | "metadata": {}, |
399 | 413 | "source": [ |
400 | | - "### 4. Querying for multiple data sets\n", |
| 414 | + "### 4. How to query for multiple data sets\n", |
401 | 415 | "\n", |
402 | 416 | "In the case above, both the `calexp` and `src` can be found by the registry, but this will not always be the case. The `queryDimensions` method provides a more flexible way to query for multiple datasets (requiring an instance of all datasets to be available for that dataId) or to ask for different dataId keys than those used to identify the dataset (which invokes various built-in relationships). An example is provided below:" |
403 | 417 | ] |
|
453 | 467 | "cell_type": "markdown", |
454 | 468 | "metadata": {}, |
455 | 469 | "source": [ |
456 | | - "### 5. Grabbing an Image Cutout\n", |
| 470 | + "### 5. Generate an Image Cutout\n", |
457 | 471 | "\n", |
458 | 472 | "Say we want to extract a cutout from the DP0.1 coadded images at a specific location. To do this, we need a few more packages from the LSST Science Pipelines; in particular, the geometry and coordinate packages." |
459 | 473 | ] |
|
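At its core, a cutout is just a pixel bounding box centered on the target position and clipped to the image edges. A plain-Python sketch of that arithmetic (the real code uses `lsst.geom`; the half-size and image dimensions below are hypothetical):

```python
# Sketch of cutout bounding-box arithmetic: a square box of half-size
# `half` around center pixel (xc, yc), clipped to the image bounds.
def cutout_box(xc, yc, half, width, height):
    """Return (xmin, ymin, xmax, ymax), clipped to [0, width/height - 1]."""
    xmin = max(0, xc - half)
    ymin = max(0, yc - half)
    xmax = min(width - 1, xc + half)
    ymax = min(height - 1, yc + half)
    return xmin, ymin, xmax, ymax

print(cutout_box(100, 50, 25, 4000, 4000))  # box fully interior to the image
print(cutout_box(10, 10, 25, 4000, 4000))   # box clipped at the image edge
```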
546 | 560 | "afw_display.mtv(cutout_image.image)\n", |
547 | 561 | "plt.gca().axis('off')" |
548 | 562 | ] |
| 563 | + }, |
| 564 | + { |
| 565 | + "cell_type": "markdown", |
| 566 | + "metadata": {}, |
| 567 | + "source": [ |
| 568 | + "### 6. Retrieve and plot catalog data from the Butler\n", |
| 569 | + "\n", |
| 570 | + "The TAP service is the recommended way to retrieve DP0.1 catalog data in a notebook, and there are several other tutorials that demonstrate how to use the TAP service.\n", |
| 571 | + "\n", |
| 572 | + "But if Butler access to catalog data is needed, start by retrieving only the schema of a Butler catalog.\n", |
| 573 | + "Unlike the TAP schema (see the first tutorial notebook), the Butler schemas do not come with embedded column descriptions. Refer to the DP0.1 Data Products Definitions Document (DPDD) at [dp0-1.lsst.io](http://dp0-1.lsst.io) to learn more about the columns." |
| 574 | + ] |
| 575 | + }, |
| 576 | + { |
| 577 | + "cell_type": "code", |
| 578 | + "execution_count": null, |
| 579 | + "metadata": {}, |
| 580 | + "outputs": [], |
| 581 | + "source": [ |
| 582 | + "schema_coadd_src = butler.get('deepCoadd_forced_src_schema')\n", |
| 583 | + "schema_coadd_src.asAstropy()" |
| 584 | + ] |
| 585 | + }, |
| 586 | + { |
| 587 | + "cell_type": "markdown", |
| 588 | + "metadata": {}, |
| 589 | + "source": [ |
| 590 | + "<br>\n", |
| 591 | + "The catalogs are very large, and it is not feasible to retrieve them in their entirety.\n", |
| 592 | + "Instead, in this example we identify the tract and patch of interest and retrieve only catalog data for a small region of sky.\n", |
| 593 | + "Use the same RA and Dec coordinates as above to find the tract and patch." |
| 594 | + ] |
| 595 | + }, |
| 596 | + { |
| 597 | + "cell_type": "code", |
| 598 | + "execution_count": null, |
| 599 | + "metadata": {}, |
| 600 | + "outputs": [], |
| 601 | + "source": [ |
| 602 | + "radec = geom.SpherePoint(ra, dec, geom.degrees)\n", |
| 603 | + "\n", |
| 604 | + "skymap = butler.get(\"skyMap\")\n", |
| 605 | + "\n", |
| 606 | + "tractInfo = skymap.findTract(radec)\n", |
| 607 | + "tract = tractInfo.getId()\n", |
| 608 | + "\n", |
| 609 | + "patchInfo = tractInfo.findPatch(radec)\n", |
| 610 | + "patch = tractInfo.getSequentialPatchIndex(patchInfo)\n", |
| 611 | + "\n", |
| 612 | + "print(tract, patch)\n", |
| 613 | + "\n", |
| 614 | + "coaddId = {'tract': tract, 'patch': patch, 'band': 'i'}\n", |
| 615 | + "\n", |
| 616 | + "coadd_src = butler.get('deepCoadd_forced_src', coaddId)\n", |
| 617 | + "coadd_src = coadd_src.copy(True)" |
| 618 | + ] |
| 619 | + }, |
| 620 | + { |
| 621 | + "cell_type": "code", |
| 622 | + "execution_count": null, |
| 623 | + "metadata": {}, |
| 624 | + "outputs": [], |
| 625 | + "source": [ |
| 626 | + "# Show the table contents if desired\n", |
| 627 | + "# coadd_src.asAstropy()" |
| 628 | + ] |
| 629 | + }, |
| 630 | + { |
| 631 | + "cell_type": "markdown", |
| 632 | + "metadata": {}, |
| 633 | + "source": [ |
| 634 | + "<br>\n", |
| 635 | + "\n", |
| 636 | + "Convert to a Pandas dataframe (see the first tutorial) for easy interaction.\n", |
| 637 | + "The following cells offer options for printing the column names or the data values." |
| 638 | + ] |
| 639 | + }, |
| 640 | + { |
| 641 | + "cell_type": "code", |
| 642 | + "execution_count": null, |
| 643 | + "metadata": {}, |
| 644 | + "outputs": [], |
| 645 | + "source": [ |
| 646 | + "data = coadd_src.asAstropy().to_pandas()" |
| 647 | + ] |
| 648 | + }, |
| 649 | + { |
| 650 | + "cell_type": "code", |
| 651 | + "execution_count": null, |
| 652 | + "metadata": {}, |
| 653 | + "outputs": [], |
| 654 | + "source": [ |
| 655 | + "# print(data.columns)" |
| 656 | + ] |
| 657 | + }, |
| 658 | + { |
| 659 | + "cell_type": "code", |
| 660 | + "execution_count": null, |
| 661 | + "metadata": {}, |
| 662 | + "outputs": [], |
| 663 | + "source": [ |
| 664 | + "# for col in data.columns:\n", |
| 665 | + "# print(col)" |
| 666 | + ] |
| 667 | + }, |
| 668 | + { |
| 669 | + "cell_type": "code", |
| 670 | + "execution_count": null, |
| 671 | + "metadata": {}, |
| 672 | + "outputs": [], |
| 673 | + "source": [ |
| 674 | + "# data['coord_ra'].values" |
| 675 | + ] |
| 676 | + }, |
| 677 | + { |
| 678 | + "cell_type": "markdown", |
| 679 | + "metadata": {}, |
| 680 | + "source": [ |
| 681 | + "Plot the locations of sources in the patch." |
| 682 | + ] |
| 683 | + }, |
| 684 | + { |
| 685 | + "cell_type": "code", |
| 686 | + "execution_count": null, |
| 687 | + "metadata": {}, |
| 688 | + "outputs": [], |
| 689 | + "source": [ |
| 690 | + "fig = plt.figure()\n", |
| 691 | + "plt.plot( data['coord_ra'].values, data['coord_dec'].values, 'o', ms=2, alpha=0.5 )\n", |
| 692 | + "plt.xlabel('RA')\n", |
| 693 | + "plt.ylabel('Dec')\n", |
| 694 | + "plt.title('Butler deepCoadd_forced_src objects in tract '+str(tract)+' patch '+str(patch))" |
| 695 | + ] |
549 | 696 | } |
550 | 697 | ], |
551 | 698 | "metadata": { |
|