Use Data Lake

Populate Tables

The prior steps created a database oedi_database in AWS Glue, but it’s empty. Now, we need to run the crawlers to populate the tables in the database.

In your oedi environment, navigate back to the main directory, open-data-access-tools, and use the commands below to check and run the glue crawlers.

  1. List Crawlers

(oedi) $ oedi aws list-crawlers
All availables crawlers are:
+-----+-------------------------------------------------------------------+-------+------------------------------------------------------------------------+---------------------------+---------------------------+
| No. |                                Name                               | State |                               S3Targets                                |        LastUpdated        |         CreateTime        |
+-----+-------------------------------------------------------------------+-------+------------------------------------------------------------------------+---------------------------+---------------------------+
|  0  | nrel-pds-building-stock-comstock-athena-2020-comstock-v1-metadata | READY | s3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/metadata | 2022-01-09 23:28:24+00:00 | 2022-01-09 23:28:24+00:00 |
|  1  |   nrel-pds-building-stock-comstock-athena-2020-comstock-v1-state  | READY |  s3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/state   | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  2  |            oedi-data-lake-atb-electricity-parquet-2019            | READY |           s3://oedi-data-lake/ATB/electricity/parquet/2019/            | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  3  |            oedi-data-lake-atb-electricity-parquet-2020            | READY |           s3://oedi-data-lake/ATB/electricity/parquet/2020/            | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  4  |            oedi-data-lake-atb-electricity-parquet-2021            | READY |           s3://oedi-data-lake/ATB/electricity/parquet/2021/            | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  5  |                 oedi-data-lake-pv-rooftop-aspects                 | READY |                s3://oedi-data-lake/pv-rooftop/aspects/                 | 2022-01-09 23:28:24+00:00 | 2022-01-09 23:28:24+00:00 |
|  6  |                oedi-data-lake-pv-rooftop-buildings                | READY |               s3://oedi-data-lake/pv-rooftop/buildings/                | 2022-01-09 23:28:24+00:00 | 2022-01-09 23:28:24+00:00 |
|  7  |            oedi-data-lake-pv-rooftop-developable-planes           | READY |           s3://oedi-data-lake/pv-rooftop/developable-planes/           | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  8  |          oedi-data-lake-pv-rooftop-pr-developable-planes          | READY |         s3://oedi-data-lake/pv-rooftop-pr/developable-planes/          | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  9  |                   oedi-data-lake-pv-rooftop-rasd                  | READY |                  s3://oedi-data-lake/pv-rooftop/rasd/                  | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  10 |                oedi-data-lake-tracking-the-sun-2018               | READY |               s3://oedi-data-lake/tracking-the-sun/2018/               | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  11 |                oedi-data-lake-tracking-the-sun-2019               | READY |               s3://oedi-data-lake/tracking-the-sun/2019/               | 2022-01-09 23:28:23+00:00 | 2022-01-09 23:28:23+00:00 |
|  12 |                oedi-data-lake-tracking-the-sun-2020               | READY |               s3://oedi-data-lake/tracking-the-sun/2020/               | 2022-01-09 23:28:24+00:00 | 2022-01-09 23:28:24+00:00 |
+-----+-------------------------------------------------------------------+-------+------------------------------------------------------------------------+---------------------------+---------------------------+
  1. Run Crawler

Use the following command to run one of the glue crawlers that will populate the table in your staging bucket. Be aware that this step will result in a charge to your AWS account, depending on the size of the table. Currently, most of the glue crawlers will cost less than $1 to run, with the exception building-stock which might be more like $10.

(oedi) $ oedi aws run-crawler -n oedi-data-lake-tracking-the-sun-2018

If you want it runs in background, use option --background-run or -b:

(oedi) $ oedi aws run-crawler -n oedi-data-lake-tracking-the-sun-2018 -b
  1. Run Crawlers

The run-crawler command can only run one crawler. If you want to run all of crawlers at once, you can use run-crawlers, which will start all available crawlers in data lake.

(oedi) $ oedi aws run-crawlers
  1. List Databases

The crawlers populate tables that are contained within databases in your data lake. Run the following code to see a list of available databases.

(oedi) $ oedi aws list-databases
All available databaes are:
+-----+-----------------------+---------------------+
| No. |          Name         |      CreateTime     |
+-----+-----------------------+---------------------+
|  0  |        default        | 2022-01-13 17:52:42 |
|  1  |        oedi_atb       | 2022-01-09 23:28:07 |
|  2  |    oedi_buildstock    | 2022-01-09 23:28:07 |
|  3  |    oedi_pv_rooftops   | 2022-01-09 23:28:07 |
|  4  | oedi_tracking_the_sun | 2022-01-09 23:28:06 |
+-----+-----------------------+---------------------+
  1. List Tables

To view a list of tables within a given database, run the following command, specifying the database with the -d option.

(oedi) $ oedi aws list-tables -d oedi_tracking_the_sun
All available tables in [oedi_tracking_the_sun] are:
+-----+-----------------------+---------------------+
| No. |          Name         |      CreateTime     |
+-----+-----------------------+---------------------+
|  0  | tracking_the_sun_2018 | 2022-01-09 23:48:45 |
|  1  | tracking_the_sun_2019 | 2022-01-09 23:49:21 |
|  2  | tracking_the_sun_2020 | 2022-01-09 23:49:23 |
+-----+-----------------------+---------------------+

Run Queries

After the desired tables are populated in database, then you can run SQL queries via AWS Athena. In this package, we also provide a run-query command for tests. For example:

(oedi) $ oedi aws run-query -q "select * from oedi_tracking_the_sun.tracking_the_sun_2020 limit 10"
data_provider_1 data_provider_2 system_id_1 system_id_2   installation_date  system_size_dc  ...  output_capacity_inverter_3  dc_optimizer inverter_loading_ratio  battery_rated_capacity_kw  battery_rated_capacity_kwh  state
0  Arkansas State Energy Office           -9999       -9999       -9999 2010-04-29 06:00:00           2.016  ...                       -9999             0               1.178947                    -9999.0                     -9999.0     AR
1  Arkansas State Energy Office           -9999       -9999       -9999 2010-04-26 06:00:00           3.360  ...                       -9999             0               1.178947                    -9999.0                     -9999.0     AR
2  Arkansas State Energy Office           -9999       -9999       -9999 2010-04-20 06:00:00          13.440  ...                       -9999             0               1.178947                    -9999.0                     -9999.0     AR
3  Arkansas State Energy Office           -9999       -9999       -9999 2010-04-21 06:00:00           5.520  ...                       -9999             0               1.210526                    -9999.0                     -9999.0     AR
4  Arkansas State Energy Office           -9999       -9999       -9999 2010-04-22 06:00:00           2.530  ...                       -9999             0               1.210526                    -9999.0                     -9999.0     AR
5  Arkansas State Energy Office           -9999       -9999       -9999 2010-04-22 06:00:00           3.450  ...                       -9999             0               1.210526                    -9999.0                     -9999.0     AR
6  Arkansas State Energy Office           -9999       -9999       -9999 2010-04-21 06:00:00           3.220  ...                       -9999             0               1.210526                    -9999.0                     -9999.0     AR
7  Arkansas State Energy Office           -9999       -9999       -9999 2010-05-18 06:00:00          12.880  ...                       -9999             0               1.210526                    -9999.0                     -9999.0     AR
8  Arkansas State Energy Office           -9999       -9999       -9999 2010-06-03 06:00:00           3.360  ...                       -9999             0               1.178947                    -9999.0                     -9999.0     AR
9  Arkansas State Energy Office           -9999       -9999       -9999 2010-05-11 06:00:00           2.700  ...                       -9999             0           -9999.000000                    -9999.0                     -9999.0     AR

[10 rows x 78 columns]

The query results would be stored in the Staging Location configured in config.yaml. You can also specify this location via --output-location or -o in this command, like this:

(oedi) $ oedi aws run-query -q "select * from oedi_tracking_the_sun.tracking_the_sun_2020 limit 10" -o "s3://another-output-location/"

Commands Help

For more oedi commands information, please use --help.

(oedi) $ oedi aws --help
Usage: oedi aws [OPTIONS] COMMAND [ARGS]...

    OEDI command with AWS cloud.

Options:
    --help  Show this message and exit.

Commands:
    list-crawlers   List available crawlers.
    list-databases  List available databases
    list-tables     List available tables in database.
    run-crawler     Run crawler to populate table.
    run-crawlers    Run all crawlers in data lake.
    run-query       Run SQL query and show/export result.

Each command also has its own help page:

(oedi) $ oedi aws run-query --help
Usage: oedi aws run-query [OPTIONS]

    Run SQL query and show/export result.

Options:
    -q, --query-string TEXT      Valid SQL query string.  [required]
    -s, --staging-location TEXT  A S3 staging directory.
    -r, --region-name TEXT       AWS region name, i.e. us-west-2
    -o, --output-file PATH       Export result to CSV file.
    --head                       Show pandas DataFrame head only.  [default:False]
    --help                       Show this message and exit.