Ingesting All The Weather Data With Apache NiFi

tspannhw

Timothy Spann 🇺🇦

Posted on July 11, 2020


Step By Step NiFi Flow

  1. GenerateFlowFile - trigger the flow on a schedule that matches when NOAA updates the weather data.
  2. InvokeHTTP - download the ZIP of all current weather observations.
  3. CompressContent - decompress the ZIP.
  4. UnpackContent - extract the files from the ZIP.
  5. RouteOnAttribute (optional) - keep only the stations that are airports: ${filename:startsWith('K')}.
  6. QueryRecord (optional) - convert XML to JSON with an XMLReader and a JsonRecordSetWriter. Query: SELECT * FROM FLOWFILE WHERE NOT location LIKE '%Unknown%'. This removes locations that are not identified.
  7. Send the records somewhere for storage, for example PutKudu, PutORC, PutHDFS, PutHiveStreaming, PutHBaseRecord, PutDatabaseRecord, PublishKafkaRecord_2_* or others.
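For readers who want to see what steps 4 to 6 do to the data, here is a minimal Python sketch of the same logic outside NiFi. It is not part of the flow, and the function names are hypothetical. It assumes the archive holds one XML document per station, named after the station ID, and it leaves every value as a string, whereas a NiFi record reader with a schema would type the numeric fields.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile


def unpack_observations(zip_bytes, airports_only=True):
    """Yield (filename, xml_bytes) for each XML file in the archive.

    Roughly mirrors UnpackContent plus the optional RouteOnAttribute
    check ${filename:startsWith('K')}.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            filename = posixpath.basename(name)  # ignore any directory part
            if not filename.lower().endswith(".xml"):
                continue
            if airports_only and not filename.startswith("K"):
                continue
            yield filename, archive.read(name)


def xml_to_record(xml_bytes):
    """Flatten one observation document into a dict of tag -> text."""
    root = ET.fromstring(xml_bytes)
    record = {}
    for child in root:
        if len(child):  # nested element such as <image>
            record[child.tag] = {sub.tag: (sub.text or "").strip() for sub in child}
        else:
            record[child.tag] = (child.text or "").strip()
    return record


def identified_records(zip_bytes):
    """Keep records that have a location not containing 'Unknown'.

    Like the SQL filter, a record with no location at all is dropped.
    """
    records = []
    for _filename, payload in unpack_observations(zip_bytes):
        record = xml_to_record(payload)
        location = record.get("location")
        if location is not None and "Unknown" not in location:
            records.append(record)
    return records
```

Calling `identified_records(zip_bytes)` on the downloaded archive returns a list of dicts that could be serialized with `json.dumps` before being handed to whatever storage is chosen in step 7.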

URL For All US Data

invokehttp.request.url

https://w1.weather.gov/xml/current_obs/all_xml.zip
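To check the URL outside NiFi, a small Python sketch along these lines should be enough. The function name, the User-Agent string and the 60 second timeout are assumptions made for illustration, not values taken from the flow.

```python
import urllib.request

ALL_OBS_URL = "https://w1.weather.gov/xml/current_obs/all_xml.zip"


def fetch_zip(url=ALL_OBS_URL, opener=urllib.request.urlopen, timeout=60):
    """Download the archive and return its raw bytes.

    `opener` is injectable so the function can be tried without network access.
    """
    # The User-Agent value is a placeholder; identify your own client here.
    request = urllib.request.Request(url, headers={"User-Agent": "weather-ingest-sketch"})
    with opener(request, timeout=timeout) as response:
        return response.read()
```

The returned bytes are the same ZIP payload that InvokeHTTP passes downstream in the flow.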

Example Record As Converted JSON

[ {
  "credit" : "NOAA's National Weather Service",
  "credit_URL" : "http://weather.gov/",
  "image" : {
    "url" : "http://weather.gov/images/xml_logo.gif",
    "title" : "NOAA's National Weather Service",
    "link" : "http://weather.gov"
  },
  "suggested_pickup" : "15 minutes after the hour",
  "suggested_pickup_period" : 60,
  "location" : "Stanley Municipal Airport, ND",
  "station_id" : "K08D",
  "latitude" : 48.3008,
  "longitude" : -102.4064,
  "observation_time" : "Last Updated on Jul 10 2020, 9:55 am CDT",
  "observation_time_rfc822" : "Fri, 10 Jul 2020 09:55:00 -0500",
  "weather" : "Fair",
  "temperature_string" : "66.0 F (19.0 C)",
  "temp_f" : 66.0,
  "temp_c" : 19.0,
  "relative_humidity" : 83,
  "wind_string" : "South at 6.9 MPH (6 KT)",
  "wind_dir" : "South",
  "wind_degrees" : 180,
  "wind_mph" : 6.9,
  "wind_kt" : 6,
  "pressure_in" : 30.03,
  "dewpoint_string" : "60.8 F (16.0 C)",
  "dewpoint_f" : 60.8,
  "dewpoint_c" : 16.0,
  "visibility_mi" : 10.0,
  "icon_url_base" : "http://forecast.weather.gov/images/wtf/small/",
  "two_day_history_url" : "http://www.weather.gov/data/obhistory/K08D.html",
  "icon_url_name" : "skc.png",
  "ob_url" : "http://www.weather.gov/data/METAR/K08D.1.txt",
  "disclaimer_url" : "http://weather.gov/disclaimer.html",
  "copyright_url" : "http://weather.gov/disclaimer.html",
  "privacy_policy_url" : "http://weather.gov/notice.html"
} ]
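The record carries its own timing hints: an RFC 822 observation timestamp and a suggested pickup of 15 minutes after the hour. As a rough sketch of how a consumer might use them, the Python below parses the timestamp and works out the next pickup time. The function names are hypothetical, and the default of minute 15 is simply read off the sample record above.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def observation_time(record):
    """Parse observation_time_rfc822 into a timezone-aware datetime."""
    return parsedate_to_datetime(record["observation_time_rfc822"])


def next_pickup(now, minute=15):
    """Return the next moment `minute` minutes past an hour, strictly after `now`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate
```

For the sample record, `observation_time` yields 09:55 at UTC-05:00 on 10 July 2020, and `next_pickup` applied to that value gives 10:15 in the same offset.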

Source Code

https://github.com/tspannhw/ClouderaFlowManagementWorkshop/tree/main/flows
