SQL for Google Sheets with DuckDB
Analyze data in Google Sheets using SQL with DuckDB
data:image/s3,"s3://crabby-images/da9de/da9de014a86099ee6a5013538c7789caa340f0c0" alt="SQL for Google Sheets with DuckDB"
Lately, I've been using DuckDB to quickly analyze data in a Google Sheet.
For example, I stumbled upon this sheet collecting salary data for Data Engineers from the /r/dataengineering subreddit and was curious about some averages.
data:image/s3,"s3://crabby-images/39788/39788f0ebd2dd6730d7b3c239891d8b602277a16" alt=""
Traditionally, I would have had to export the data as a CSV to my machine and load it into some database or data warehouse where I could run some queries over it, but with DuckDB, this is so much easier!
All I have to do is run:
SELECT *
FROM read_csv_auto('https://docs.google.com/spreadsheets/export?format=csv&id=1GuEPkwjdICgJ31Ji3iUoarirZNDbPxQj_kf7fd4h4Ro', normalize_names=True);
💡
Note the "export?format=csv" part of the URL - that's how you tell Google Sheets to give you a CSV file!
Et voilà !
data:image/s3,"s3://crabby-images/e99f6/e99f6b4d48a6a54edecb5f69bebca1f5073d685a" alt=""
The data is immediately available for number crunching! Let's say I want to know the average salary for Data Engineers in Texas.
SELECT current_job_title,
round(avg(try_cast(base_salary as int))) AS avg_salary
FROM de_salaries -- Saved results of the previous query as a table
WHERE currency = 'USD'
AND state = 'Texas'
GROUP BY current_job_title
ORDER BY avg_salary DESC;
data:image/s3,"s3://crabby-images/4ed85/4ed8521d36fb6152e5f8746feaa73848489ba1fc" alt=""
And I didn't even have to leave the terminal.
Member discussion