CloudQuery - Getting Started
Pedro Garcia Rodriguez
Posted on March 31, 2023
What is CloudQuery?
CloudQuery is a powerful tool that allows you to extract, normalize, expose and export the configuration and metadata of infrastructure resources deployed in different cloud computing providers.
The results of CloudQuery's operations are stored in a PostgreSQL database, so you can write SQL queries to support monitoring, governance and security.
To achieve this, CloudQuery abstracts away the differences between the providers' APIs, which lets you express security, governance, cost and compliance policies in SQL.
Why use CloudQuery?
CloudQuery gives you visibility into your cloud infrastructure and SaaS applications through normalized data that you can query with SQL, which supports security and compliance, cloud inventory, asset management and auditing tasks.
Installing CloudQuery on Mac
To install CloudQuery on your Mac using Homebrew, follow these steps:
Make sure you have Homebrew installed on your Mac by entering the following command in the Terminal:
brew --version
If Homebrew is not installed on your Mac, follow the installation instructions in the Homebrew documentation.
Before installing CloudQuery, add the CloudQuery tap (Homebrew's term for a third-party repository) with the following command:
brew tap cloudquery/tap
To install CloudQuery using Homebrew, enter the following command in the Terminal:
brew install cloudquery/tap/cloudquery
This command will install CloudQuery and its dependencies.
Once the installation is complete, verify that the CloudQuery installation was successful by entering the following command in the Terminal:
breaking_pitt@Converge~ cloudquery --version
cloudquery version 0.20.3
This should display the version of CloudQuery installed on your Mac.
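If you want a setup script to check this automatically (for example, to stop early when CloudQuery is missing or too old), a small Python sketch like the one below can parse the version line. It assumes the output has the "cloudquery version X.Y.Z" form shown above; other releases may print something different, and both function names are hypothetical.

```python
import re
import subprocess
from typing import Tuple

# Matches the line printed by `cloudquery --version`, e.g. "cloudquery version 0.20.3".
_VERSION_RE = re.compile(r"cloudquery version (\d+)\.(\d+)\.(\d+)")


def parse_cloudquery_version(output: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) parsed from `cloudquery --version` output."""
    match = _VERSION_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognised version output: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def installed_cloudquery_version() -> Tuple[int, int, int]:
    """Run `cloudquery --version` (assumed to be on PATH) and parse the result."""
    result = subprocess.run(
        ["cloudquery", "--version"], capture_output=True, text=True, check=True
    )
    # Search both streams in case the version line is written to stderr.
    return parse_cloudquery_version(result.stdout + result.stderr)
```

Because the result is a tuple, a comparison such as installed_cloudquery_version() >= (0, 20, 0) is enough to require at least the release used in this article.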
Set up a PostgreSQL database.
As mentioned at the beginning of this article, the information retrieved by CloudQuery is exported to a PostgreSQL database, so you are going to need one.
The easiest way to run one locally is in a Docker container:
breaking_pitt@Converge~ mkdir -p ~/cloudquery/test/db
breaking_pitt@Converge~ docker run -d \
  --name cloudquery-db \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=cl09dqu3r13 \
  -e POSTGRES_DB=cloudquery-db \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v ~/cloudquery/test/db:/var/lib/postgresql/data \
  postgres
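docker run -d returns as soon as the container starts, but PostgreSQL needs a few seconds before it accepts connections. If you automate the following steps, a sketch like this one polls pg_isready (a client utility included in the official postgres image) inside the container until it reports the server as ready. A plain TCP check from the host may be misleading, because Docker's port forwarding can accept connections before PostgreSQL itself is listening. The sketch assumes Docker is on your PATH and the container name cloudquery-db used above; the function names are hypothetical.

```python
import subprocess
import time
from typing import Callable


def pg_isready_in_container(container: str = "cloudquery-db") -> bool:
    """Return True if pg_isready, run inside the container, reports the server ready."""
    result = subprocess.run(
        ["docker", "exec", container,
         "pg_isready", "-h", "localhost", "-U", "postgres"],
        capture_output=True,
    )
    # pg_isready exits with status 0 only when the server accepts connections.
    return result.returncode == 0


def wait_until_ready(check: Callable[[], bool] = pg_isready_in_container,
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call `check` until it returns True (-> True) or `timeout` seconds pass (-> False)."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the readiness test in as a parameter keeps the polling loop independent of Docker, so the same helper can wait on any other check.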
With CloudQuery installed and PostgreSQL running, create a project directory and initialize the CloudQuery configuration inside it with the following commands:
breaking_pitt@Converge~ mkdir -p ~/cloudquery/test && cd ~/cloudquery/test
breaking_pitt@Converge~ cloudquery --no-telemetry init aws
The init command generates a config.hcl file that describes which cloud provider to use and which resources CloudQuery should extract, transform and load (ETL).
If you leave it unchanged, CloudQuery fetches all supported resources from every account it has access to.
By default, CloudQuery connects to the PostgreSQL database defined in the connection section of config.hcl. Because we changed the default database name when setting up our PostgreSQL server, edit this section so that it points at your database with the right credentials:
cloudquery {
  ...
  ...
  connection {
    dsn = "host=localhost user=postgres password=cl09dqu3r13 database=cloudquery-db port=5432"
  }
}
Your project is now ready to retrieve information about your AWS account.
Authenticating with AWS
CloudQuery needs to authenticate with your AWS account to gather information about your cloud setup. There are several ways to authenticate with AWS. By default, CloudQuery respects the AWS credential provider chain, which means it tries the following sources in order:
- Environment variables.
- Shared configuration and credentials files (~/.aws/config and ~/.aws/credentials).
- IAM roles for AWS compute resources.
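To get a rough idea of which of these sources would apply on your machine, the following Python sketch walks a simplified version of the chain in the same order. It is not the AWS SDK's implementation: the real chain checks additional sources (such as SSO and container credentials) and also reads ~/.aws/config, so treat the result as a hint only. The function name is hypothetical; AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_PROFILE and AWS_SHARED_CREDENTIALS_FILE are the standard AWS environment variables.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def credential_source(env: Mapping[str, str] = os.environ,
                      credentials_file: Optional[Path] = None) -> str:
    """Name the first source in a simplified AWS credential chain that has credentials."""
    # 1. Environment variables take priority.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"

    # 2. Shared credentials file, using the profile selected by AWS_PROFILE.
    if credentials_file is None:
        default_file = Path.home() / ".aws" / "credentials"
        credentials_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_file))
    profile = env.get("AWS_PROFILE", "default")
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_file)  # a missing file is silently skipped
    if parser.has_option(profile, "aws_access_key_id"):
        return f"shared credentials file (profile {profile!r})"

    # 3. Nothing found locally: the SDK would try an IAM role on AWS compute.
    return "IAM role for AWS compute resources (if available)"
```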
Gather and dump data into PostgreSQL with CloudQuery.
Once you have customized your config.hcl and you are authenticated with AWS, run the following command to fetch the resources.
breaking_pitt@Converge~ cloudquery fetch
Once the process finishes, you will find a set of newly created tables. Connect to your PostgreSQL server to check what CloudQuery found about your cloud infrastructure:
breaking_pitt@Converge~ psql "postgres://postgres:cl09dqu3r13@localhost:5432/cloudquery-db?sslmode=disable"
cloudquery-db=# \d
List of relations
Schema | Name | Type | Owner
--------+----------------------------------------------------------------+-------+----------
public | aws_access_analyzer_analyzer_finding_sources | table | postgres
public | aws_access_analyzer_analyzer_findings | table | postgres
public | aws_access_analyzer_analyzers | table | postgres
public | aws_accounts | table | postgres
public | aws_acm_certificates | table | postgres
public | aws_apigateway_api_keys | table | postgres
public | aws_apigateway_client_certificates | table | postgres
public | aws_apigateway_domain_name_base_path_mappings | table | postgres
public | aws_apigateway_domain_names | table | postgres
public | aws_apigateway_rest_api_authorizers | table | postgres
public | aws_apigateway_rest_api_deployments | table | postgres
public | aws_apigateway_rest_api_documentation_parts | table | postgres
public | aws_apigateway_rest_api_documentation_versions | table | postgres
public | aws_apigateway_rest_api_gateway_responses | table | postgres
public | aws_apigateway_rest_api_models | table | postgres
public | aws_apigateway_rest_api_request_validators | table | postgres
public | aws_apigateway_rest_api_resources | table | postgres
public | aws_apigateway_rest_api_stages | table | postgres
public | aws_apigateway_rest_apis | table | postgres
...
Query dumped data.
Now that cloudquery fetch has loaded the information about your AWS infrastructure into PostgreSQL, you can start exploring it. For example, describe the aws_ec2_vpcs table:
cloudquery-db=# \d aws_ec2_vpcs;
Table "public.aws_ec2_vpcs"
Column | Type | Collation | Nullable | Default
------------------+---------+-----------+----------+---------
cq_id | uuid | | not null |
cq_meta | jsonb | | |
account_id | text | | not null |
region | text | | |
arn | text | | |
cidr_block | text | | |
dhcp_options_id | text | | |
instance_tenancy | text | | |
is_default | boolean | | |
owner_id | text | | |
state | text | | |
tags | jsonb | | |
id | text | | not null |
Indexes:
"aws_ec2_vpcs_pk" PRIMARY KEY, btree (account_id, id)
"aws_ec2_vpcs_cq_id_key" UNIQUE CONSTRAINT, btree (cq_id)
Referenced by:
TABLE "aws_ec2_vpc_cidr_block_association_sets" CONSTRAINT "aws_ec2_vpc_cidr_block_association_sets_vpc_cq_id_fkey" FOREIGN KEY (vpc_cq_id) REFERENCES aws_ec2_vpcs(cq_id) ON DELETE CASCADE
TABLE "aws_ec2_vpc_ipv6_cidr_block_association_sets" CONSTRAINT "aws_ec2_vpc_ipv6_cidr_block_association_sets_vpc_cq_id_fkey" FOREIGN KEY (vpc_cq_id) REFERENCES aws_ec2_vpcs(cq_id) ON DELETE CASCADE
Then query it to get information about your AWS account, for example to find the default VPC in the eu-west-1 region:
cloudquery-db=# SELECT account_id, region, arn, cidr_block FROM aws_ec2_vpcs WHERE region='eu-west-1' AND is_default='t';
account_id | region | arn | cidr_block
--------------+-----------+-----------------------------------------------------+---------------
635811601642 | eu-west-1 | arn:aws:ec2:eu-west-1:635811601642:vpc/vpc-13db3b77 | 172.31.0.0/16
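Columns such as region and account_id are already split out, but the arn column also carries the resource type and identifier. If you post-process query results outside the database, a small Python helper like the hypothetical parse_arn below can split an ARN using the documented arn:partition:service:region:account-id:resource layout. Inside PostgreSQL, split_part(arn, ':', 6) gives a similar result for resources whose identifier contains no further colons.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    # maxsplit=5 keeps any colons inside the resource part (e.g. "log-group:name").
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])
```

Applied to the ARN in the result above, parse_arn returns service "ec2", region "eu-west-1", account_id "635811601642" and resource "vpc/vpc-13db3b77".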