Skip to main content

Google Sheets minimal example

info
The source code for this example can be found in our repository at: https://github.com/dlt-hub/dlt/tree/devel/docs/examples/google_sheets.

TLDR

This example demonstrates how to load Google Sheets data using Python and the dlt library. It covers working with Google API, using built in credentials, using union of credentials, and creating dynamically generated resources.

Setup: Running this example on your machine

# clone the dlt repository
git clone git@github.com:dlt-hub/dlt.git
# go to example directory
cd ./dlt/docs/examples/google_sheets
# install dlt with duckdb
pip install "dlt[duckdb]"
# run the example script
python google_sheets.py

Google Sheets data pipeline

In this example, you'll find a Python script that demonstrates how to load Google Sheets data using the dlt library.

We'll learn how to:

tip

This example is for educational purposes. For best practices, we recommend using Google Sheets verified source.

Install Google client library

 pip install google-api-python-client

Loading code

from typing import Any, Iterator, Sequence, Union, cast

from googleapiclient.discovery import build

import dlt
from dlt.common.configuration.specs import (
GcpOAuthCredentials,
GcpServiceAccountCredentials,
)
from dlt.common.typing import DictStrAny, StrAny

def _initialize_sheets(
credentials: Union[GcpOAuthCredentials, GcpServiceAccountCredentials]
) -> Any:
# Build the service object.
service = build("sheets", "v4", credentials=credentials.to_native_credentials())
return service

@dlt.source
def google_spreadsheet(
spreadsheet_id: str,
sheet_names: Sequence[str],
credentials: Union[
GcpServiceAccountCredentials, GcpOAuthCredentials, str, StrAny
] = dlt.secrets.value,
) -> Any:
sheets = _initialize_sheets(cast(GcpServiceAccountCredentials, credentials))

def get_sheet(sheet_name: str) -> Iterator[DictStrAny]:
# get list of list of typed values
result = (
sheets.spreadsheets()
.values()
.get(
spreadsheetId=spreadsheet_id,
range=sheet_name,
# unformatted returns typed values
valueRenderOption="UNFORMATTED_VALUE",
# will return formatted dates
dateTimeRenderOption="FORMATTED_STRING",
)
.execute()
)

# pprint.pprint(result)
values = result.get("values")

# yield dicts assuming row 0 contains headers and following rows values and all rows have identical length
for v in values[1:]:
yield {h: v for h, v in zip(values[0], v)}

# create resources from supplied sheet names
return [
dlt.resource(get_sheet(name), name=name, write_disposition="replace")
for name in sheet_names
]

Run the pipeline

if __name__ == "__main__":
pipeline = dlt.pipeline(destination="duckdb")
# see example.secrets.toml to where to put credentials
sheet_id = "1HhWHjqouQnnCIZAFa2rL6vT91YRN8aIhts22SUUR580"
range_names = ["hidden_columns_merged_cells", "Blank Columns"]
# "2022-05", "model_metadata"
info = pipeline.run(
google_spreadsheet(
spreadsheet_id=sheet_id,
sheet_names=range_names,
)
)
print(info)

This demo works on codespaces. Codespaces is a development environment available for free to anyone with a Github account. You'll be asked to fork the demo repository and from there the README guides you with further steps.
The demo uses the Continue VSCode extension.

Off to codespaces!

DHelp

Ask a question

Welcome to "Codex Central", your next-gen help center, driven by OpenAI's GPT-4 model. It's more than just a forum or a FAQ hub – it's a dynamic knowledge base where coders can find AI-assisted solutions to their pressing problems. With GPT-4's powerful comprehension and predictive abilities, Codex Central provides instantaneous issue resolution, insightful debugging, and personalized guidance. Get your code running smoothly with the unparalleled support at Codex Central - coding help reimagined with AI prowess.