FLaNK Stack 05 Feb 2024
Timothy Spann 🇺🇦
Posted on February 5, 2024
05-February-2024
FLaNK Stack Weekly
Tim Spann @PaaSDev
https://www.youtube.com/@FLaNK-Stack
https://www.threads.net/@tspannhw
https://medium.com/@tspann/subscribe
Get your new Apache NiFi for Dummies!
https://www.cloudera.com/campaign/apache-nifi-for-dummies.html
https://ossinsight.io/analyze/tspannhw
Trial: https://console.us-west-1.cdp.cloudera.com/trial/register.html#/
CODE + COMMUNITY
Please join my meetup groups: NJ/NYC/Philly/Virtual.
http://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
https://www.meetup.com/futureofdata-philadelphia/
*This is Issue #123*
https://github.com/tspannhw/FLiPStackWeekly
https://www.cloudera.com/solutions/dim-developer.html
Qualified Developers
https://www.linkedin.com/in/satya-n99999/
Articles
NiFi 2.0.0-M2 is Out!
https://medium.com/@tspann/apache-nifi-2-0-0-m2-out-314a1d4c8b20
Apache NiFi and Amazon Textract for Machine Learning
https://medium.com/@tspann/apache-nifi-and-amazon-textract-for-machine-learning-e45f4af12e68
Apache Kafka: Streams Replication Manager Prefixless Replication
https://blog.cloudera.com/streams-replication-manager-prefixless-replication-part-1/
Doom on Bacteria
https://www.rockpapershotgun.com/you-can-play-doom-using-gut-bacteria-but-the-framerate-is-atrocious
Enterprises Using Open Source LLMs
https://venturebeat.com/ai/how-enterprises-are-using-open-source-llms-16-examples/
Flink Deep Dive
https://www.waitingforcode.com/apache-flink/apache-flink-cluster-components-deep-dive/read
A Cheat Sheet for RAG
https://blog.llamaindex.ai/a-cheat-sheet-and-some-recipes-for-building-advanced-rag-803a9d94c41b
Prompt Engineering Guides
https://github.com/dair-ai/Prompt-Engineering-Guide
https://platform.openai.com/docs/guides/prompt-engineering/six-strategies-for-getting-better-results
Hikari Connection Pool
https://medium.com/@guptadiksha88/hikari-cp-efficient-database-connection-pooling-d458c0bdf7df
LLM Prompting
https://www.infoq.com/articles/large-language-models-llms-prompting/
Incremental Iceberg
https://netflixtechblog.com/incremental-processing-using-netflix-maestro-and-apache-iceberg-b8ba072ddeeb
Gen AI Images
https://rmoff.net/2023/12/07/productivity-tools-ai-image-generators/
Java Links
https://graciano.dev/2023/08/03/weekend-reading-list-187/
IoT with MQTT & NiFi
https://www.baeldung.com/iot-data-pipeline-mqtt-nifi
CDC with NiFi and Snowflake
https://www.clearpeaks.com/change-data-capture-cdc-with-nifi-and-snowflake/
Host Apache NiFi with Docker
https://medium.com/geekculture/host-a-fully-persisted-apache-nifi-service-with-docker-ffaa6a5f54a3
Videos
Seven Videos on Real-Time Streaming
https://medium.com/@tspann/seven-videos-on-real-time-streaming-02711320afa8
Unlocking Financial Data with Real-Time Pipelines (OSACon 2023)
https://www.youtube.com/watch?v=Q7gF7m4yFi4&ab_channel=OSACon
Processing Cisco ASA Logs with CFM
https://medium.com/cloudera-inc/processing-cisco-asa-logs-with-cloudera-flow-management-f09cdf7382c3
Collecting NetFlow Records with Cloudera DataFlow
https://medium.com/cloudera-inc/collecting-netflow-records-with-cloudera-dataflow-f47d9f57c98
Events
Feb 8, 2024: NYC.
https://www.meetup.com/new-york-open-source-data-infrastructure-meetup/events/297484047/
18:00 - 18:30 Welcome: Networking & snacks
18:30 - 18:35 Kickoff: Welcome Aiven
18:35 - 19:00 A Guide to Product Experimentation (Erin Mikail Staples, LaunchDarkly)
19:00 - 19:30 Building Real-time Pipelines: A Case Study with Transit Data (Tim Spann, Cloudera)
19:30 - 21:00 Food & networking
Feb 20, 2024: 12-1PM EST. Virtual. Azure Data Tech Groups: DBA Fundamentals Group
https://www.meetup.com/dba-fundamentals-group/events/296855261/
Feb 28, 2024: NYC. Cloudera Meetup. Flink
https://www.meetup.com/futureofdata-princeton/events/298661947/
Feb 29, 2024: Virtual. Conf42 Python.
https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors
https://www.conf42.com/Python_2024_Karin_Wolok_nifi__kafka_risingwave_iceberg_llm
March 5, 2024: Princeton. Meetup. GenAI.
https://www.meetup.com/applied-generative-artificial-intelligence-applications/
March 15, 2024: TCF Pro. Princeton, NJ.
IT Professional Conference at Trenton Computer Festival
IEEE Information Technology Professional Conference on Friday, March 15th, 2024
https://princetonacm.acm.org/tcfpro/
April 2024: XtremeJ 2024. Virtual.
https://xtremej.dev/2023/schedule/
May 8-9, 2024: Data Summit 2024. Boston, MA.
https://www.dbta.com/DataSummit/2024/default.aspx
Cloudera Events
https://www.cloudera.com/about/events.html
More Events:
https://www.linkedin.com/pulse/schedule-2024-tim-spann--y4coe
Code
- https://github.com/tspannhw/FLaNK-python-watsonx-processor
- https://github.com/tspannhw/FLaNK-DatabaseTableSchemaRegistry
- https://github.com/tspannhw/FLaNK-CDW
- https://github.com/tspannhw/FLaNK-VectorDB
- https://github.com/tspannhw/FLaNK-RPI5
- https://github.com/tspannhw/FLaNK-EdgeAI
- https://github.com/kevinbtalbert/NiFi-Flows-Demos
- https://github.com/DataSQRL/apirag
- https://github.com/tspannhw/FLaNK-python-ExtractCompanyName-processor
- https://github.com/ThomasVitale/llm-apps-java-langchain4j
Models
- https://github.com/zhuyiche/llava-phi
- https://github.com/SkunkworksAI/BakLLaVA
- https://github.com/stanford-futuredata/ColBERT
- https://github.com/state-spaces/mamba
Data
Tools
- https://github.com/wxywb/history_rag
- https://github.com/video-db/StreamRAG
- https://github.com/Fanghua-Yu/SUPIR
- https://github.com/robocorp/robocorp
- https://github.com/danielmiessler/fabric
- https://posit-dev.github.io/great-tables/articles/intro.html
- https://github.com/huggingface/datatrove
- https://github.com/huggingface/setfit
- https://github.com/huggingface/text-generation-inference
- https://github.com/huggingface/distil-whisper
- https://github.com/huggingface/discord-bots
- https://github.com/explodinggradients/ragas
- https://github.com/willie-engelbrecht/ParseMultiLevelJSON-NiFiRecordProcessors
- https://github.com/ThomasVitale/llm-apps-java-spring-ai
- https://github.com/beehive-lab/TornadoVM
- https://www.autobackend.dev/
- https://trpc.io/docs/quickstart
- https://github.com/sqlchat/sqlchat
- https://github.com/AI4Finance-Foundation/FinGPT/tree/master/fingpt/FinGPT_Forecaster
- https://github.com/openvinotoolkit/awesome-openvino
- https://github.com/intel/openvino-ai-plugins-gimp
- https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html
- https://github.com/bes-dev/stable_diffusion.openvino
- https://github.com/samontab/llm_sentiment
- https://github.com/BMW-InnovationLab/BMW-IntelOpenVINO-Detection-Inference-API
- https://github.com/RapidAI/RapidOCR
- https://github.com/Hmm466/OpenVINO-Java-API
- https://github.com/openvinotoolkit/openvino_notebooks
- https://github.com/openvinotoolkit/openvino
- https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/267-distil-whisper-asr
- https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/264-qrcode-monster
- https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/262-softvc-voice-conversion
- https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/257-llava-multimodal-chatbot
- https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/270-sound-generation-audioldm2
- https://www.dfrobot.com/product-2778.html?tracking=65b9f32d03987
- https://intelpython.github.io/DPEP/
- https://github.com/kentontroy/cloudera_cml_llm_rag
- https://community.cloudera.com/t5/Support-Questions/SSE-Client-in-Apache-NiFi/m-p/359742
- https://github.com/PKU-YuanGroup/MoE-LLaVA
- https://github.com/collabora/WhisperFusion
- https://github.com/deepseek-ai/DeepSeek-Coder
- https://llava-vl.github.io/blog/2024-01-30-llava-1-6/
- https://github.com/mkjt2/lockbox
- https://www.pipeless.ai/
- https://github.com/pipeless-ai/pipeless
- https://github.com/dennislee22/deepspeed-train-CML
- https://github.com/microsoft/TransformerCompression
- https://github.com/microsoft/PubSec-Info-Assistant
- https://github.com/microsoft/Qcodes
- https://onnxruntime.ai/
- https://onnxruntime.ai/docs/tutorials/iot-edge/rasp-pi-cv.html#prerequisites
- https://github.com/microsoft/hummingbird
- https://microsoft.github.io/promptflow/how-to-guides/faq.html#openai-1-x-support
- https://github.com/microsoft/XmlNotepad
- https://learn.microsoft.com/en-us/openapi/kiota/overview
- https://github.com/microsoft/kiota-java
- https://metaflow.org/
- https://netflix.github.io/atlas-docs/
- https://github.com/RedWedgeX/obs-sessionize-title-updater
- https://benchmark.clickhouse.com/
- https://blog.allenai.org/olmo-open-language-model-87ccfc95f580
- https://gpt4all.io/index.html
- https://datastrato.ai/docs/0.3.1/
- https://github.com/plasma-umass/scalene
- https://github.com/recap-build/hive-metastore-standalone
- https://github.com/naushadh/hive-metastore
- https://wimbd.apps.allenai.org/
- https://github.com/allenai/dolma
- https://github.com/umd-huang-lab/Mementos
- https://mitenmit.github.io/gpt/
- https://github.com/msasikanth/twine
- https://github.com/NVIDIA/NeMo-Guardrails
- https://github.com/xmlking/macbooksetup
- https://github.com/xmlking/ai-experiments
- https://github.com/adamcohenhillel/ADeus
- https://redpanda.com/blog/using-apache-nifi-with-redpanda-kafka
- https://github.com/seaweedfs/seaweedfs
- https://github.com/AILab-CVC/YOLO-World
- https://github.com/OpenBMB/MiniCPM
- https://github.com/Avaiga/taipy
- https://spectrum.ieee.org/non-line-of-sight-infrared
- https://github.com/highlight/highlight
- https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/256-bark-text-to-audio
- https://rye-up.com/
- https://github.com/karpathy/ng-video-lecture
- https://lmstudio.ai/
- https://www.graalvm.org/
© 2020-2024 Tim Spann