Parsing YAML file with Python

In this blog I will demonstrate how to parse YAML file with Python.

Assume we have a YAML file:

cat my_yaml_file.yaml

probes:
  -
    id: 1
    description: Check the databases having more than 5 active connections.
    issue: It were found databases with the high number of active connections.
    recommendation: Check why Customers open so many active connections. It may be
      wrong configuration or unusual application pattern.
    sql_query_o: no
    sql_query: select datname, count(1) num_of_active_connections, 'wvw' chk from
      pg_stat_activity where datname!='' and state!='idle' group by datname
      having count(1)>5 order by 2 desc
    sql_query_extra: select datname, state, client_addr, client_hostname,
      substr(query, 1, 2048) query from pg_stat_activity where state!='idle' and
      datname in ( select datname from ( select datname, count(1)
      num_of_active_sessions from pg_stat_activity where state!='idle' and
      datname!='' group by 1 having count(1)>0 ) M ) order by 1, 5
  -
    id: 2
    description: Check DB queries that take more than 30 seconds.
    issue: Long-running queries.
    recommendation: Check why the query/queries take so much time. It maybe it's
      heavy non-optimized query. Maybe it's unusual application pattern.
    sql_query_o: no
    sql_query: select now()-query_start as runtime, pid as process_id, datname as
      db_name, client_addr, client_hostname, substr(query, 1, 2048) query, 'wvw'
      chk from pg_stat_activity where state!='idle' and datname!='' and now() -
      query_start > '30 seconds'::interval order by 1 desc;
    sql_query_extra: no
  - 
    id: 3
    description: Check in the pg_stat_statements DB queries that take more than 3000 ms
    issue: Long-running queries.
    recommendation: Check why the query/queries take so much time. It may be it is a
      heavy non-optimized query. Maybe it's an unusual application pattern.
    sql_query_o: SELECT pss.userid, pss.dbid, pd.datname as db_name,
      round(pss.total_time::numeric, 2) as total_time, pss.calls,
      round(pss.mean_time::numeric, 0) as mean, substr(pss.query, 1, 2048)
      query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE
      pd.oid=pss.dbid and round(pss.mean_time::numeric, 0) > 3000 ORDER BY
      round(pss.mean_time::numeric, 0) desc LIMIT 30;
    sql_query: SELECT pss.userid, pss.dbid, pd.datname as db_name,
      round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as
      total_time, pss.calls,
      round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) as mean,
      substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss,
      pg_database pd WHERE pd.oid=pss.dbid and
      round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) > 3000 ORDER BY
      round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) desc LIMIT 30;
    sql_query_extra: no

To parse it and populate an array with its data I will use the following Python code:

ptn_yaml_parser.py

import yaml

with open("/Users/dmitryr/LAB/YAML_parser/my_yaml_file.yaml", 'r') as f:
    valuesYaml = yaml.load(f, Loader=yaml.FullLoader)

n_elements=len(valuesYaml['probes'])
print('n_elements:', n_elements)

for idx in range(0, n_elements):
  print(" ",idx,":")
  print("issue:",valuesYaml['probes'][idx]['issue'])
  print("description:",valuesYaml['probes'][idx]['description'])
  print("recommendation:",valuesYaml['probes'][idx]['recommendation'])
  print("sql_query:",valuesYaml['probes'][idx]['sql_query'])
  print("sql_query_o:",valuesYaml['probes'][idx]['sql_query_o'])
  print("sql_query_extra:",valuesYaml['probes'][idx]['sql_query_extra'])

Example how it works:

n_elements: 3
  0 :
issue: It were found databases with the high number of active connections.
description: Check the databases having more than 5 active connections.
recommendation: Check why Customers open so many active connections. It may be wrong configuration or unusual application pattern.
sql_query: select datname, count(1) num_of_active_connections, 'wvw' chk from pg_stat_activity where datname!='' and state!='idle' group by datname having count(1)>5 order by 2 desc
sql_query_o: False
sql_query_extra: select datname, state, client_addr, client_hostname, substr(query, 1, 2048) query from pg_stat_activity where state!='idle' and datname in ( select datname from ( select datname, count(1) num_of_active_sessions from pg_stat_activity where state!='idle' and datname!='' group by 1 having count(1)>0 ) M ) order by 1, 5
  1 :
issue: Long-running queries.
description: Check DB queries that take more than 30 seconds.
recommendation: Check why the query/queries take so much time. It maybe it's heavy non-optimized query. Maybe it's unusual application pattern.
sql_query: select now()-query_start as runtime, pid as process_id, datname as db_name, client_addr, client_hostname, substr(query, 1, 2048) query, 'wvw' chk from pg_stat_activity where state!='idle' and datname!='' and now() - query_start > '30 seconds'::interval order by 1 desc;
sql_query_o: False
sql_query_extra: False
  2 :
issue: Long-running queries.
description: Check in the pg_stat_statements DB queries that take more than 3000 ms
recommendation: Check why the query/queries take so much time. It may be it is a heavy non-optimized query. Maybe it's an unusual application pattern.
sql_query: SELECT pss.userid, pss.dbid, pd.datname as db_name, round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time, pss.calls, round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) as mean, substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE pd.oid=pss.dbid and round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) > 3000 ORDER BY round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) desc LIMIT 30;
sql_query_o: SELECT pss.userid, pss.dbid, pd.datname as db_name, round(pss.total_time::numeric, 2) as total_time, pss.calls, round(pss.mean_time::numeric, 0) as mean, substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE pd.oid=pss.dbid and round(pss.mean_time::numeric, 0) > 3000 ORDER BY round(pss.mean_time::numeric, 0) desc LIMIT 30;
sql_query_extra: False

Conclusion:

In this blog I demonstrated how to parse YAML file using Python and populate the data into array.

Blog

Parsing YAML file with Python

Dmitry Romanoff

Join Our Newsletter. No Spam, Only the good stuff.

Related