目次
概要
Google Cloud Skills Boostの[Insights from Data with BigQuery]クエストのチャレンジラボの簡単な手順の解説です。
このチャレンジラボは、COVID19に関するオープンデータセットを利用します。BigQueryでCOVID19オープンデータセットのテーブルにSQLを実行して、COVID19についての統計分析を行います。
クエリー内のマーカーでハイライトしている部分は、ラボによって変わる変数です。ラボのときには、置き換えてください。
Task 1. Total confirmed cases
指定した日付(date)の全世界のトータル感染者数を返すクエリーを実行します。
SELECT sum(cumulative_confirmed) as total_cases_worldwide FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE date='date'
Task 2. Worst affected areas
指定した日付(date)のアメリカ国内で死者数(DeathCount)以上の州がいくつあるかを調べるクエリーを実行します。
with deaths_by_states as ( SELECT subregion1_name as state, sum(cumulative_deceased) as death_count FROM `bigquery-public-data.covid19_open_data.covid19_open_data` where country_name="United States of America" and date='date' and subregion1_name is NOT NULL group by subregion1_name ) select count(*) as count_of_states from deaths_by_states where death_count > DeathCount
Task 3. Identifying hotspots
指定した日付(date)のアメリカ国内で感染者数(ConfirmedCases)以上の州をリストアップするクエリーを実行します。
SELECT * FROM ( SELECT subregion1_name as state, sum(cumulative_confirmed) as total_confirmed_cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_code="US" AND date='date' AND subregion1_name is NOT NULL GROUP BY subregion1_name ORDER BY total_confirmed_cases DESC ) WHERE total_confirmed_cases > ConfirmedCases
Task 4. Fatality ratio
イタリアの2020年の指定月(Month)における致死率を求めるクエリーを実行します。
SELECT sum(cumulative_confirmed) as total_confirmed_cases, sum(cumulative_deceased) as total_deaths, (sum(cumulative_deceased)/sum(cumulative_confirmed))*100 as case_fatality_ratio FROM `bigquery-public-data.covid19_open_data.covid19_open_data` where country_name="Italy" AND date BETWEEN 'Month_start_day'and 'Month_end_day'
Task 5. Identifying specific day
イタリアの死者数が指定数(DeathCount)を超えた日を求めるクエリーを実行します。
SELECT date FROM `bigquery-public-data.covid19_open_data.covid19_open_data` where country_name="Italy" and cumulative_deceased > DeathCount order by date asc limit 1
Task 6. Finding days with zero net new cases
指定した日付の間(StartDate/CloseDate)において、インドで感染者数がゼロの日を求めるために以下のクエリーを考えていますが、正しく実行できません。
WITH india_cases_by_date AS ( SELECT date, SUM(cumulative_confirmed) AS cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_name="India" AND date between 'StartDate' and 'CloseDate' GROUP BY date ORDER BY date ASC ) , india_previous_day_comparison AS (SELECT date, cases, LAG(cases) OVER(ORDER BY date) AS previous_day, cases - LAG(cases) OVER(ORDER BY date) AS net_new_cases FROM india_cases_by_date )
正しいクエリーとして実行します。
WITH india_cases_by_date AS ( SELECT date, SUM(cumulative_confirmed) AS cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_name="India" AND date between 'StartDate' and 'CloseDate' GROUP BY date ORDER BY date ASC ) , india_previous_day_comparison AS (SELECT date, cases, LAG(cases) OVER(ORDER BY date) AS previous_day, cases - LAG(cases) OVER(ORDER BY date) AS net_new_cases FROM india_cases_by_date ) select count(*) from india_previous_day_comparison where net_new_cases=0
Task 7. Doubling rate
Task 6のクエリーをテンプレートとして利用します。アメリカ国内で2020/3/22から2020/4/20までの間、前日よりも(LimitValue)%以上感染者数が増加した日を求めるためのクエリーを実行します。
WITH us_cases_by_date AS ( SELECT date, SUM(cumulative_confirmed) AS cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_name="United States of America" AND date between '2020-03-22' and '2020-04-20' GROUP BY date ORDER BY date ASC ) , us_previous_day_comparison AS (SELECT date, cases, LAG(cases) OVER(ORDER BY date) AS previous_day, cases - LAG(cases) OVER(ORDER BY date) AS net_new_cases, (cases - LAG(cases) OVER(ORDER BY date))*100/LAG(cases) OVER(ORDER BY date) AS percentage_increase FROM us_cases_by_date ) select Date, cases as Confirmed_Cases_On_Day, previous_day as Confirmed_Cases_Previous_Day, percentage_increase as Percentage_Increase_In_Cases from us_previous_day_comparison where percentage_increase > LimitValue
Task 8. Recovery rate
2020/5/10までで回復率が上位の国から(LimitValue)カ国リストするクエリーを実行します。感染者数が50K以上の国が対象です。
WITH cases_by_country AS ( SELECT country_name AS country, sum(cumulative_confirmed) AS cases, sum(cumulative_recovered) AS recovered_cases FROM bigquery-public-data.covid19_open_data.covid19_open_data WHERE date = '2020-05-10' GROUP BY country_name ) , recovered_rate AS (SELECT country, cases, recovered_cases, (recovered_cases * 100)/cases AS recovery_rate FROM cases_by_country ) SELECT country, cases AS confirmed_cases, recovered_cases, recovery_rate FROM recovered_rate WHERE cases > 50000 ORDER BY recovery_rate desc LIMIT LimitValue
Task 9. CDGR – Cumulative daily growth rate
フランスにおける指定日付(Date)のCDGR(Cumulative Daily Growth Rate)を求めるクエリーを実行します。以下のクエリーは正しく実行できません。
WITH france_cases AS ( SELECT date, SUM(cumulative_confirmed) AS total_cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_name="France" AND date IN ('2020-01-24', '2020-05-10') GROUP BY date ORDER BY date) , summary as ( SELECT total_cases AS first_day_cases, LEAD(total_cases) AS last_day_cases, DATE_DIFF(LEAD(date) OVER(ORDER BY date),date, day) AS days_diff FROM france_cases LIMIT 1 ) select first_day_cases, last_day_cases, days_diff, SQRT((last_day_cases/first_day_cases),(1/days_diff))-1 as cdgr from summary
クエリーを修正して実行します。
WITH france_cases AS ( SELECT date, SUM(cumulative_confirmed) AS total_cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_name="France" AND date IN ('2020-01-24', 'Date') GROUP BY date ORDER BY date) , summary as ( SELECT total_cases AS first_day_cases, LEAD(total_cases) OVER(ORDER BY date) AS last_day_cases, DATE_DIFF(LEAD(date) OVER(ORDER BY date),date, day) AS days_diff FROM france_cases LIMIT 1 ) select first_day_cases, last_day_cases, days_diff, POWER((last_day_cases/first_day_cases),(1/days_diff))-1 as cdgr from summary
Task 10. Create a Looker Studio report
アメリカ国内で以下のデータをプロットするチャートをLooker Studioで作成します。
- Number of Confirmed Cases
- Number of Deaths
- Date range : DataRange
SELECT date, SUM(cumulative_confirmed) AS country_cases, SUM(cumulative_deceased) AS country_deaths FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE date BETWEEN 'DataRangeStart' AND 'DataRangeClose' AND country_name ="United States of America" GROUP BY date
クエリー結果をLooker Studioにエクスポートして、チャートを作成します。
Google Cloud Platform
- [GoogleCloudSkillsBoost] Insights from Data with BigQuery: Challenge Lab
- Google Cloud Platform(GCP)の利用登録手順
- GCPのユーザインタフェース ~Cloud Console/Cloud Shell~
- GCPプロジェクトとは ~GCPリソースを管理する単位
- リージョンとゾーン ~どこにリソースを作る?~
- Google Compute Engine(GCE) 仮想マシンの作成例
- Google Cloud Storage(GCS)を使ってみよう
- GCPでWordPress Webサイトの構築
- [Qwiklabs] Serverless Firebase Development Challenge Lab