From ff15b9406c9abd0ffde9db3b63524bdadbfece8f Mon Sep 17 00:00:00 2001
From: star1327p
Date: Sat, 14 Mar 2026 00:30:29 -0700
Subject: [PATCH] a HTML -> an HTML

---
 book/_freeze/chapters/quiz_01/execute-results/html.json | 2 +-
 book/chapters/quiz_01.qmd | 2 +-
 book/content/chapters/quiz_01/execute-results/html.json | 2 +-
 slides/_freeze/chapters/quiz_01/execute-results/html.json | 2 +-
 slides/chapters/01_exploring_data.qmd | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/book/_freeze/chapters/quiz_01/execute-results/html.json b/book/_freeze/chapters/quiz_01/execute-results/html.json
index 51ebe6e..5c56fdc 100644
--- a/book/_freeze/chapters/quiz_01/execute-results/html.json
+++ b/book/_freeze/chapters/quiz_01/execute-results/html.json
@@ -2,7 +2,7 @@
 "hash": "47c1e113fd8c600096d296c35408d88d",
 "result": {
 "engine": "jupyter",
- "markdown": "---\ntitle: \"Quiz: Exploring and sanitizing dataframes with skrub\"\nformat:\n html:\n code-tools: true\n---\n\n## Question 1\n::: {.callout}\n\nWhat do I need to open a `TableReport` saved with `.write_html(\"report.html\")`?\n\n- [ ] A) A python console\n- [ ] B) An internet browser\n- [ ] C) A Jupyter notebook\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nAfter its generation, the `TableReport` can be persisted on disk as a HTML file.\nThe file can be opened using a regular internet browswer.\n\nThe `TableReport` is not updated dynamically, and is not connected to python consoles\nor running kernels.\n:::\n\n\n## Question 2\n::: {.callout}\n\nConsider this dataframe and TableReport, then answer the question. 
\n\n::: {#70ba1a94 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom skrub import TableReport\n\ndf = pd.DataFrame({\n 'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],\n 'Age': [25, 30, 35, 40, 45],\n 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],\n 'Salary': [70000, 80000, 90000, 100000, 110000],\n 'Department': ['HR', 'Finance', 'IT', 'Marketing', 'Sales']\n})\n\nTableReport(df, max_plot_columns=5, max_association_columns=3)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n``````````````````````````{=html}\n\n\n\n
\n

Please enable javascript

\n

\n The skrub table reports need javascript to display correctly. If you are\n displaying a report in a Jupyter notebook and you see this message, you may need to\n re-execute the cell or to trust the notebook (button on the top right or\n \"File > Trust notebook\").\n

\n
\n\n\n``````````````````````````\n:::\n:::\n\n\nWhat does the \"Distributions\" tab show? What about the \"Associations\" tab?\n\n- [ ] A) Both tabs work as normal. \n- [ ] B) The \"Distribution\" tab shows the plots, \"Associations\" are not shown.\n- [ ] C) Both tabs contain a message explaining their operation was skipped. \n\n\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B)\n\nThe \"Distribution\" contains the usual distribution plots, while the computation\nof the associations was skipped because the number of columns in the dataframe (5)\nwas larger than `max_association_columns` (3). \n:::\n\n\n## Question 3 \n\n::: {.callout}\nDoes the `TableReport` parse datetimes or other data types? \n\n- [ ] Yes, the `TableReport` automatically converts datetime strings to datetime \nobjects and strings that contain numbers into floats. \n- [ ] No, the `TableReport` does not perform any conversion. \n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\n\nAnswer: No, the `TableReport` is generated on the basis of the datatypes found \nin the supplied dataframe. Any datatype parsing must be done before generating the\nreport, e.g., by using the `Cleaner`. \n:::\n\n## Question 4\n\n::: {.callout}\nWhich of these transformations is executed **by default** when the `Cleaner` is \nfitted on a dataframe? \n\n- [ ] A) Dropping constant columns\n- [ ] B) Dropping columns that contain only missing values\n- [ ] C) Dropping columns that contain more than 90% of missing values\n- [ ] D) Dropping columns where all values are distinct\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nColumns that contain only missing values, i.e., where the fraction of missing \nvalues is 1.0, are dropped. This is controlled by the `drop_null_fraction` parameter. \n\n:::\n\n\n## Question 5 \n\n::: {.callout}\nConsider the following dataframe. 
\n\n::: {#2755ff47 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nmedical_df = pd.DataFrame({\n 'Patient_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],\n 'Visit_Date': ['10 Jan 2023', '15 Feb 2023', '20 Mar 2023', '25 Apr 2023', None],\n 'Blood_Pressure': [120.5, 130.2, 125.8, 140.0, 135.6],\n 'Diagnosis': ['Hypertension', '?', '?', 'Hypertension', 'Diabetes'],\n})\n\nmedical_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023130.2?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\nWhat is the output of this cleaner? \n\n::: {#8dc6a71d .cell execution_count=3}\n``` {.python .cell-code}\nfrom skrub import Cleaner\ncleaner = Cleaner()\ndf_clean = cleaner.fit_transform(medical_df)\n```\n:::\n\n\n- [ ] A)\n\n::: {#20c00bff .cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P0012023-01-10120.5Hypertension
1P0022023-02-15130.2None
2P0032023-03-20125.8None
3P0042023-04-25140.0Hypertension
4P005NaT135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] B)\n\n::: {#f506b6f6 .cell execution_count=5}\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaN?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] C)\n\n::: {#a3fd3416 .cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaNNone
2P00320 Mar 2023125.8None
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: A)\n\nThe `Cleaner` replaces strings that are commonly used to denote missing values \n(such as \"?\"), and guesses most common datetime formats from their strings. \n\nNo empty columns are present, so no further transformations are made. \n:::\n\n",
+ "markdown": "---\ntitle: \"Quiz: Exploring and sanitizing dataframes with skrub\"\nformat:\n html:\n code-tools: true\n---\n\n## Question 1\n::: {.callout}\n\nWhat do I need to open a `TableReport` saved with `.write_html(\"report.html\")`?\n\n- [ ] A) A python console\n- [ ] B) An internet browser\n- [ ] C) A Jupyter notebook\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nAfter its generation, the `TableReport` can be persisted on disk as an HTML file.\nThe file can be opened using a regular internet browswer.\n\nThe `TableReport` is not updated dynamically, and is not connected to python consoles\nor running kernels.\n:::\n\n\n## Question 2\n::: {.callout}\n\nConsider this dataframe and TableReport, then answer the question. 
\n

Please enable javascript

\n

\n The skrub table reports need javascript to display correctly. If you are\n displaying a report in a Jupyter notebook and you see this message, you may need to\n re-execute the cell or to trust the notebook (button on the top right or\n \"File > Trust notebook\").\n

\n
\n\n\n``````````````````````````\n:::\n:::\n\n\nWhat does the \"Distributions\" tab show? What about the \"Associations\" tab?\n\n- [ ] A) Both tabs work as normal. \n- [ ] B) The \"Distribution\" tab shows the plots, \"Associations\" are not shown.\n- [ ] C) Both tabs contain a message explaining their operation was skipped. \n\n\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B)\n\nThe \"Distribution\" contains the usual distribution plots, while the computation\nof the associations was skipped because the number of columns in the dataframe (5)\nwas larger than `max_association_columns` (3). \n:::\n\n\n## Question 3 \n\n::: {.callout}\nDoes the `TableReport` parse datetimes or other data types? \n\n- [ ] Yes, the `TableReport` automatically converts datetime strings to datetime \nobjects and strings that contain numbers into floats. \n- [ ] No, the `TableReport` does not perform any conversion. \n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\n\nAnswer: No, the `TableReport` is generated on the basis of the datatypes found \nin the supplied dataframe. Any datatype parsing must be done before generating the\nreport, e.g., by using the `Cleaner`. \n:::\n\n## Question 4\n\n::: {.callout}\nWhich of these transformations is executed **by default** when the `Cleaner` is \nfitted on a dataframe? \n\n- [ ] A) Dropping constant columns\n- [ ] B) Dropping columns that contain only missing values\n- [ ] C) Dropping columns that contain more than 90% of missing values\n- [ ] D) Dropping columns where all values are distinct\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nColumns that contain only missing values, i.e., where the fraction of missing \nvalues is 1.0, are dropped. This is controlled by the `drop_null_fraction` parameter. \n\n:::\n\n\n## Question 5 \n\n::: {.callout}\nConsider the following dataframe. 
\n\n::: {#2755ff47 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nmedical_df = pd.DataFrame({\n 'Patient_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],\n 'Visit_Date': ['10 Jan 2023', '15 Feb 2023', '20 Mar 2023', '25 Apr 2023', None],\n 'Blood_Pressure': [120.5, 130.2, 125.8, 140.0, 135.6],\n 'Diagnosis': ['Hypertension', '?', '?', 'Hypertension', 'Diabetes'],\n})\n\nmedical_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023130.2?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\nWhat is the output of this cleaner? \n\n::: {#8dc6a71d .cell execution_count=3}\n``` {.python .cell-code}\nfrom skrub import Cleaner\ncleaner = Cleaner()\ndf_clean = cleaner.fit_transform(medical_df)\n```\n:::\n\n\n- [ ] A)\n\n::: {#20c00bff .cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P0012023-01-10120.5Hypertension
1P0022023-02-15130.2None
2P0032023-03-20125.8None
3P0042023-04-25140.0Hypertension
4P005NaT135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] B)\n\n::: {#f506b6f6 .cell execution_count=5}\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaN?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] C)\n\n::: {#a3fd3416 .cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaNNone
2P00320 Mar 2023125.8None
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: A)\n\nThe `Cleaner` replaces strings that are commonly used to denote missing values \n(such as \"?\"), and guesses most common datetime formats from their strings. \n\nNo empty columns are present, so no further transformations are made. \n:::\n\n",
 "supporting": [
 "quiz_01_files"
 ],
diff --git a/book/chapters/quiz_01.qmd b/book/chapters/quiz_01.qmd
index 09da7ab..1b73b78 100644
--- a/book/chapters/quiz_01.qmd
+++ b/book/chapters/quiz_01.qmd
@@ -20,7 +20,7 @@ What do I need to open a `TableReport` saved with `.write_html("report.html")`?
 ### Solution
 Answer: B) 
 
-After its generation, the `TableReport` can be persisted on disk as a HTML file.
+After its generation, the `TableReport` can be persisted on disk as an HTML file.
 The file can be opened using a regular internet browswer.
 
 The `TableReport` is not updated dynamically, and is not connected to python consoles
diff --git a/book/content/chapters/quiz_01/execute-results/html.json b/book/content/chapters/quiz_01/execute-results/html.json
index 54c73e3..b7284d3 100644
--- a/book/content/chapters/quiz_01/execute-results/html.json
+++ b/book/content/chapters/quiz_01/execute-results/html.json
@@ -2,7 +2,7 @@
 "hash": "47c1e113fd8c600096d296c35408d88d",
 "result": {
 "engine": "jupyter",
- "markdown": "---\ntitle: \"Quiz: Exploring and sanitizing dataframes with skrub\"\nformat:\n html:\n code-tools: true\n---\n\n## Question 1\n::: {.callout}\n\nWhat do I need to open a `TableReport` saved with `.write_html(\"report.html\")`?\n\n- [ ] A) A python console\n- [ ] B) An internet browser\n- [ ] C) A Jupyter notebook\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nAfter its generation, the `TableReport` can be persisted on disk as a HTML file.\nThe file can be opened using a regular internet browswer.\n\nThe `TableReport` is not updated dynamically, and is not connected to python consoles\nor running
kernels.\n:::\n\n\n## Question 2\n::: {.callout}\n\nConsider this dataframe and TableReport, then answer the question. \n\n::: {#944b3e5f .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom skrub import TableReport\n\ndf = pd.DataFrame({\n 'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],\n 'Age': [25, 30, 35, 40, 45],\n 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],\n 'Salary': [70000, 80000, 90000, 100000, 110000],\n 'Department': ['HR', 'Finance', 'IT', 'Marketing', 'Sales']\n})\n\nTableReport(df, max_plot_columns=5, max_association_columns=3)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nProcessing column 1 / 5\rProcessing column 2 / 5\rProcessing column 3 / 5\rProcessing column 4 / 5\rProcessing column 5 / 5\r\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=1}\n``````````````````````````{=html}\n\n\n\n
\n

Please enable javascript

\n

\n The skrub table reports need javascript to display correctly. If you are\n displaying a report in a Jupyter notebook and you see this message, you may need to\n re-execute the cell or to trust the notebook (button on the top right or\n \"File > Trust notebook\").\n

\n
\n\n\n``````````````````````````\n:::\n:::\n\n\nWhat does the \"Distributions\" tab show? What about the \"Associations\" tab?\n\n- [ ] A) Both tabs work as normal. \n- [ ] B) The \"Distribution\" tab shows the plots, \"Associations\" are not shown.\n- [ ] C) Both tabs contain a message explaining their operation was skipped. \n\n\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B)\n\nThe \"Distribution\" contains the usual distribution plots, while the computation\nof the associations was skipped because the number of columns in the dataframe (5)\nwas larger than `max_association_columns` (3). \n:::\n\n\n## Question 3 \n\n::: {.callout}\nDoes the `TableReport` parse datetimes or other data types? \n\n- [ ] Yes, the `TableReport` automatically converts datetime strings to datetime \nobjects and strings that contain numbers into floats. \n- [ ] No, the `TableReport` does not perform any conversion. \n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\n\nAnswer: No, the `TableReport` is generated on the basis of the datatypes found \nin the supplied dataframe. Any datatype parsing must be done before generating the\nreport, e.g., by using the `Cleaner`. \n:::\n\n## Question 4\n\n::: {.callout}\nWhich of these transformations is executed **by default** when the `Cleaner` is \nfitted on a dataframe? \n\n- [ ] A) Dropping constant columns\n- [ ] B) Dropping columns that contain only missing values\n- [ ] C) Dropping columns that contain more than 90% of missing values\n- [ ] D) Dropping columns where all values are distinct\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nColumns that contain only missing values, i.e., where the fraction of missing \nvalues is 1.0, are dropped. This is controlled by the `drop_null_fraction` parameter. \n\n:::\n\n\n## Question 5 \n\n::: {.callout}\nConsider the following dataframe. 
\n\n::: {#1ed93c96 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nmedical_df = pd.DataFrame({\n 'Patient_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],\n 'Visit_Date': ['10 Jan 2023', '15 Feb 2023', '20 Mar 2023', '25 Apr 2023', None],\n 'Blood_Pressure': [120.5, 130.2, 125.8, 140.0, 135.6],\n 'Diagnosis': ['Hypertension', '?', '?', 'Hypertension', 'Diabetes'],\n})\n\nmedical_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023130.2?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\nWhat is the output of this cleaner? \n\n::: {#2fe86867 .cell execution_count=3}\n``` {.python .cell-code}\nfrom skrub import Cleaner\ncleaner = Cleaner()\ndf_clean = cleaner.fit_transform(medical_df)\n```\n:::\n\n\n- [ ] A)\n\n::: {#b8eb271c .cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P0012023-01-10120.5Hypertension
1P0022023-02-15130.2None
2P0032023-03-20125.8None
3P0042023-04-25140.0Hypertension
4P005NaT135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] B)\n\n::: {#516010e4 .cell execution_count=5}\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaN?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] C)\n\n::: {#7e9b9434 .cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaNNone
2P00320 Mar 2023125.8None
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: A)\n\nThe `Cleaner` replaces strings that are commonly used to denote missing values \n(such as \"?\"), and guesses most common datetime formats from their strings. \n\nNo empty columns are present, so no further transformations are made. \n:::\n\n",
+ "markdown": "---\ntitle: \"Quiz: Exploring and sanitizing dataframes with skrub\"\nformat:\n html:\n code-tools: true\n---\n\n## Question 1\n::: {.callout}\n\nWhat do I need to open a `TableReport` saved with `.write_html(\"report.html\")`?\n\n- [ ] A) A python console\n- [ ] B) An internet browser\n- [ ] C) A Jupyter notebook\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nAfter its generation, the `TableReport` can be persisted on disk as an HTML file.\nThe file can be opened using a regular internet browswer.\n\nThe `TableReport` is not updated dynamically, and is not connected to python consoles\nor running kernels.\n:::\n\n\n## Question 2\n::: {.callout}\n\nConsider this dataframe and TableReport, then answer the question. 
\n

Please enable javascript

\n

\n The skrub table reports need javascript to display correctly. If you are\n displaying a report in a Jupyter notebook and you see this message, you may need to\n re-execute the cell or to trust the notebook (button on the top right or\n \"File > Trust notebook\").\n

\n
\n\n\n``````````````````````````\n:::\n:::\n\n\nWhat does the \"Distributions\" tab show? What about the \"Associations\" tab?\n\n- [ ] A) Both tabs work as normal. \n- [ ] B) The \"Distribution\" tab shows the plots, \"Associations\" are not shown.\n- [ ] C) Both tabs contain a message explaining their operation was skipped. \n\n\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B)\n\nThe \"Distribution\" contains the usual distribution plots, while the computation\nof the associations was skipped because the number of columns in the dataframe (5)\nwas larger than `max_association_columns` (3). \n:::\n\n\n## Question 3 \n\n::: {.callout}\nDoes the `TableReport` parse datetimes or other data types? \n\n- [ ] Yes, the `TableReport` automatically converts datetime strings to datetime \nobjects and strings that contain numbers into floats. \n- [ ] No, the `TableReport` does not perform any conversion. \n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\n\nAnswer: No, the `TableReport` is generated on the basis of the datatypes found \nin the supplied dataframe. Any datatype parsing must be done before generating the\nreport, e.g., by using the `Cleaner`. \n:::\n\n## Question 4\n\n::: {.callout}\nWhich of these transformations is executed **by default** when the `Cleaner` is \nfitted on a dataframe? \n\n- [ ] A) Dropping constant columns\n- [ ] B) Dropping columns that contain only missing values\n- [ ] C) Dropping columns that contain more than 90% of missing values\n- [ ] D) Dropping columns where all values are distinct\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nColumns that contain only missing values, i.e., where the fraction of missing \nvalues is 1.0, are dropped. This is controlled by the `drop_null_fraction` parameter. \n\n:::\n\n\n## Question 5 \n\n::: {.callout}\nConsider the following dataframe. 
\n\n::: {#1ed93c96 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nmedical_df = pd.DataFrame({\n 'Patient_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],\n 'Visit_Date': ['10 Jan 2023', '15 Feb 2023', '20 Mar 2023', '25 Apr 2023', None],\n 'Blood_Pressure': [120.5, 130.2, 125.8, 140.0, 135.6],\n 'Diagnosis': ['Hypertension', '?', '?', 'Hypertension', 'Diabetes'],\n})\n\nmedical_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023130.2?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\nWhat is the output of this cleaner? \n\n::: {#2fe86867 .cell execution_count=3}\n``` {.python .cell-code}\nfrom skrub import Cleaner\ncleaner = Cleaner()\ndf_clean = cleaner.fit_transform(medical_df)\n```\n:::\n\n\n- [ ] A)\n\n::: {#b8eb271c .cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P0012023-01-10120.5Hypertension
1P0022023-02-15130.2None
2P0032023-03-20125.8None
3P0042023-04-25140.0Hypertension
4P005NaT135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] B)\n\n::: {#516010e4 .cell execution_count=5}\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaN?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] C)\n\n::: {#7e9b9434 .cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaNNone
2P00320 Mar 2023125.8None
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: A)\n\nThe `Cleaner` replaces strings that are commonly used to denote missing values \n(such as \"?\"), and guesses most common datetime formats from their strings. \n\nNo empty columns are present, so no further transformations are made. \n:::\n\n",
 "supporting": [
 "quiz_01_files"
 ],
diff --git a/slides/_freeze/chapters/quiz_01/execute-results/html.json b/slides/_freeze/chapters/quiz_01/execute-results/html.json
index d88a88e..82c22a8 100644
--- a/slides/_freeze/chapters/quiz_01/execute-results/html.json
+++ b/slides/_freeze/chapters/quiz_01/execute-results/html.json
@@ -2,7 +2,7 @@
 "hash": "47c1e113fd8c600096d296c35408d88d",
 "result": {
 "engine": "jupyter",
- "markdown": "---\ntitle: \"Quiz: Exploring and sanitizing dataframes with skrub\"\nformat:\n html:\n code-tools: true\n---\n\n## Question 1\n::: {.callout}\n\nWhat do I need to open a `TableReport` saved with `.write_html(\"report.html\")`?\n\n- [ ] A) A python console\n- [ ] B) An internet browser\n- [ ] C) A Jupyter notebook\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nAfter its generation, the `TableReport` can be persisted on disk as a HTML file.\nThe file can be opened using a regular internet browswer.\n\nThe `TableReport` is not updated dynamically, and is not connected to python consoles\nor running kernels.\n:::\n\n\n## Question 2\n::: {.callout}\n\nConsider this dataframe and TableReport, then answer the question. 
\n\n::: {#93bdbd34 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom skrub import TableReport\n\ndf = pd.DataFrame({\n 'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],\n 'Age': [25, 30, 35, 40, 45],\n 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],\n 'Salary': [70000, 80000, 90000, 100000, 110000],\n 'Department': ['HR', 'Finance', 'IT', 'Marketing', 'Sales']\n})\n\nTableReport(df, max_plot_columns=5, max_association_columns=3)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nProcessing column 1 / 5\rProcessing column 2 / 5\rProcessing column 3 / 5\rProcessing column 4 / 5\rProcessing column 5 / 5\r\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=1}\n``````````````````````````{=html}\n\n\n\n
\n

Please enable javascript

\n

\n The skrub table reports need javascript to display correctly. If you are\n displaying a report in a Jupyter notebook and you see this message, you may need to\n re-execute the cell or to trust the notebook (button on the top right or\n \"File > Trust notebook\").\n

\n
\n\n\n``````````````````````````\n:::\n:::\n\n\nWhat does the \"Distributions\" tab show? What about the \"Associations\" tab?\n\n- [ ] A) Both tabs work as normal. \n- [ ] B) The \"Distribution\" tab shows the plots, \"Associations\" are not shown.\n- [ ] C) Both tabs contain a message explaining their operation was skipped. \n\n\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B)\n\nThe \"Distribution\" contains the usual distribution plots, while the computation\nof the associations was skipped because the number of columns in the dataframe (5)\nwas larger than `max_association_columns` (3). \n:::\n\n\n## Question 3 \n\n::: {.callout}\nDoes the `TableReport` parse datetimes or other data types? \n\n- [ ] Yes, the `TableReport` automatically converts datetime strings to datetime \nobjects and strings that contain numbers into floats. \n- [ ] No, the `TableReport` does not perform any conversion. \n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\n\nAnswer: No, the `TableReport` is generated on the basis of the datatypes found \nin the supplied dataframe. Any datatype parsing must be done before generating the\nreport, e.g., by using the `Cleaner`. \n:::\n\n## Question 4\n\n::: {.callout}\nWhich of these transformations is executed **by default** when the `Cleaner` is \nfitted on a dataframe? \n\n- [ ] A) Dropping constant columns\n- [ ] B) Dropping columns that contain only missing values\n- [ ] C) Dropping columns that contain more than 90% of missing values\n- [ ] D) Dropping columns where all values are distinct\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nColumns that contain only missing values, i.e., where the fraction of missing \nvalues is 1.0, are dropped. This is controlled by the `drop_null_fraction` parameter. \n\n:::\n\n\n## Question 5 \n\n::: {.callout}\nConsider the following dataframe. 
\n\n::: {#9cb4d153 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nmedical_df = pd.DataFrame({\n 'Patient_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],\n 'Visit_Date': ['10 Jan 2023', '15 Feb 2023', '20 Mar 2023', '25 Apr 2023', None],\n 'Blood_Pressure': [120.5, 130.2, 125.8, 140.0, 135.6],\n 'Diagnosis': ['Hypertension', '?', '?', 'Hypertension', 'Diabetes'],\n})\n\nmedical_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023130.2?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\nWhat is the output of this cleaner? \n\n::: {#394b9e95 .cell execution_count=3}\n``` {.python .cell-code}\nfrom skrub import Cleaner\ncleaner = Cleaner()\ndf_clean = cleaner.fit_transform(medical_df)\n```\n:::\n\n\n- [ ] A)\n\n::: {#b5e70441 .cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P0012023-01-10120.5Hypertension
1P0022023-02-15130.2None
2P0032023-03-20125.8None
3P0042023-04-25140.0Hypertension
4P005NaT135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] B)\n\n::: {#a7d973b0 .cell execution_count=5}\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaN?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] C)\n\n::: {#a9e96208 .cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaNNone
2P00320 Mar 2023125.8None
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: A)\n\nThe `Cleaner` replaces strings that are commonly used to denote missing values \n(such as \"?\"), and guesses most common datetime formats from their strings. \n\nNo empty columns are present, so no further transformations are made. \n:::\n\n", + "markdown": "---\ntitle: \"Quiz: Exploring and sanitizing dataframes with skrub\"\nformat:\n html:\n code-tools: true\n---\n\n## Question 1\n::: {.callout}\n\nWhat do I need to open a `TableReport` saved with `.write_html(\"report.html\")`?\n\n- [ ] A) A python console\n- [ ] B) An internet browser\n- [ ] C) A Jupyter notebook\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nAfter its generation, the `TableReport` can be persisted on disk as an HTML file.\nThe file can be opened using a regular internet browswer.\n\nThe `TableReport` is not updated dynamically, and is not connected to python consoles\nor running kernels.\n:::\n\n\n## Question 2\n::: {.callout}\n\nConsider this dataframe and TableReport, then answer the question. \n\n::: {#93bdbd34 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom skrub import TableReport\n\ndf = pd.DataFrame({\n 'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],\n 'Age': [25, 30, 35, 40, 45],\n 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],\n 'Salary': [70000, 80000, 90000, 100000, 110000],\n 'Department': ['HR', 'Finance', 'IT', 'Marketing', 'Sales']\n})\n\nTableReport(df, max_plot_columns=5, max_association_columns=3)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nProcessing column 1 / 5\rProcessing column 2 / 5\rProcessing column 3 / 5\rProcessing column 4 / 5\rProcessing column 5 / 5\r\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=1}\n``````````````````````````{=html}\n\n\n\n
\n

Please enable javascript

\n

\n The skrub table reports need javascript to display correctly. If you are\n displaying a report in a Jupyter notebook and you see this message, you may need to\n re-execute the cell or to trust the notebook (button on the top right or\n \"File > Trust notebook\").\n

\n
\n\n\n``````````````````````````\n:::\n:::\n\n\nWhat does the \"Distributions\" tab show? What about the \"Associations\" tab?\n\n- [ ] A) Both tabs work as normal. \n- [ ] B) The \"Distribution\" tab shows the plots, \"Associations\" are not shown.\n- [ ] C) Both tabs contain a message explaining their operation was skipped. \n\n\n:::\n\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B)\n\nThe \"Distribution\" contains the usual distribution plots, while the computation\nof the associations was skipped because the number of columns in the dataframe (5)\nwas larger than `max_association_columns` (3). \n:::\n\n\n## Question 3 \n\n::: {.callout}\nDoes the `TableReport` parse datetimes or other data types? \n\n- [ ] Yes, the `TableReport` automatically converts datetime strings to datetime \nobjects and strings that contain numbers into floats. \n- [ ] No, the `TableReport` does not perform any conversion. \n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\n\nAnswer: No, the `TableReport` is generated on the basis of the datatypes found \nin the supplied dataframe. Any datatype parsing must be done before generating the\nreport, e.g., by using the `Cleaner`. \n:::\n\n## Question 4\n\n::: {.callout}\nWhich of these transformations is executed **by default** when the `Cleaner` is \nfitted on a dataframe? \n\n- [ ] A) Dropping constant columns\n- [ ] B) Dropping columns that contain only missing values\n- [ ] C) Dropping columns that contain more than 90% of missing values\n- [ ] D) Dropping columns where all values are distinct\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: B) \n\nColumns that contain only missing values, i.e., where the fraction of missing \nvalues is 1.0, are dropped. This is controlled by the `drop_null_fraction` parameter. \n\n:::\n\n\n## Question 5 \n\n::: {.callout}\nConsider the following dataframe. 
\n\n::: {#9cb4d153 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nmedical_df = pd.DataFrame({\n 'Patient_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],\n 'Visit_Date': ['10 Jan 2023', '15 Feb 2023', '20 Mar 2023', '25 Apr 2023', None],\n 'Blood_Pressure': [120.5, 130.2, 125.8, 140.0, 135.6],\n 'Diagnosis': ['Hypertension', '?', '?', 'Hypertension', 'Diabetes'],\n})\n\nmedical_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023130.2?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\nWhat is the output of this cleaner? \n\n::: {#394b9e95 .cell execution_count=3}\n``` {.python .cell-code}\nfrom skrub import Cleaner\ncleaner = Cleaner()\ndf_clean = cleaner.fit_transform(medical_df)\n```\n:::\n\n\n- [ ] A)\n\n::: {#b5e70441 .cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P0012023-01-10120.5Hypertension
1P0022023-02-15130.2None
2P0032023-03-20125.8None
3P0042023-04-25140.0Hypertension
4P005NaT135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] B)\n\n::: {#a7d973b0 .cell execution_count=5}\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaN?
2P00320 Mar 2023125.8?
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n- [ ] C)\n\n::: {#a9e96208 .cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Patient_IDVisit_DateBlood_PressureDiagnosis
0P00110 Jan 2023120.5Hypertension
1P00215 Feb 2023NaNNone
2P00320 Mar 2023125.8None
3P00425 Apr 2023140.0Hypertension
4P005None135.6Diabetes
\n
\n```\n:::\n:::\n\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n### Solution\nAnswer: A)\n\nThe `Cleaner` replaces strings that are commonly used to denote missing values \n(such as \"?\"), and guesses most common datetime formats from their strings. \n\nNo empty columns are present, so no further transformations are made. \n:::\n\n", "supporting": [ "quiz_01_files" ], diff --git a/slides/chapters/01_exploring_data.qmd b/slides/chapters/01_exploring_data.qmd index 19771b9..af9eacd 100644 --- a/slides/chapters/01_exploring_data.qmd +++ b/slides/chapters/01_exploring_data.qmd @@ -120,4 +120,4 @@ TableReport( - It provides precomputed statistics for all the columns - It prepares distribution plots for each column - It measures the association between columns -- It can be stored as a HTML file and shared without needing a running kernel \ No newline at end of file +- It can be stored as an HTML file and shared without needing a running kernel \ No newline at end of file