URL -> Markdown
Krinskumar Vaghasia
Posted on September 21, 2024
Scrappy
This week, I started my own open source project called scrappy. Scrappy is a command line tool that converts any website that can be scraped into markdown. It's just a normal classroom project that fetches the website using a URL -> extracts the body -> converts the body into MD using an LLM. The magic comes after this, when everyone, along with me, contributes to the project to make it better and add more functionality.
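To make the fetch -> extract -> convert flow concrete, here is a minimal sketch in Python. It is not Scrappy's actual code: the post does not say which language, HTML parser, or LLM provider the tool uses, so every name here (`BodyExtractor`, `fetch_html`, `extract_body`, `convert_body`) is hypothetical, and the LLM is modeled as a plain callable that you would supply yourself.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.request import urlopen


class BodyExtractor(HTMLParser):
    """Collects the text found inside <body>, skipping scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.in_body and self.skip_depth == 0 and text:
            self.chunks.append(text)


def fetch_html(url: str) -> str:
    """Step 1: fetch the website using a URL."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_body(html: str) -> str:
    """Step 2: extract the body text from the HTML."""
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def convert_body(body: str, llm: Callable[[str], str]) -> str:
    """Step 3: hand the body to an LLM (assumed: any prompt -> text callable)."""
    prompt = "Convert this page content to Markdown:\n\n" + body
    return llm(prompt)
```

With a real model client wrapped in a function, the whole pipeline would read `convert_body(extract_body(fetch_html(url)), llm)`. Passing the LLM in as a callable keeps the sketch independent of any particular provider.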
Features
- Input: The main feature is that you can convert any website into md. For this we need the URL of the page. You can provide the URL either in a file or as a command line arg.
# with url in the file
scrappy files/input.txt
# with url in the args
scrappy --url https://www.senecapolytechnic.ca/cgi-bin/subject?s1=OSD600
or
scrappy -u https://www.senecapolytechnic.ca/cgi-bin/subject?s1=OSD600
- Output: The converted md can be stored in a preferred file if the file is passed using the -0 flag.
# the md will be saved in the output.md in this case
scrappy files/input.txt -0 files/output
or
scrappy files/input.txt --output files/output
# no output file will make a new file in the same dir
scrappy files/input.txt
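The input and output options above could be wired up roughly as follows. This is a sketch with Python's `argparse`, assuming a Python implementation, and is not taken from Scrappy's source. `build_parser` and `resolve_url` are hypothetical names, only the long `--output` spelling is modeled, and the name of the file created when no output is given is left out because the post does not specify it.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Builds a parser that mirrors the options described in the post."""
    parser = argparse.ArgumentParser(prog="scrappy")
    # Optional positional: a file whose content is the URL to convert.
    parser.add_argument("input_file", nargs="?", help="file that contains the URL")
    # Alternative: pass the URL directly with --url or -u.
    parser.add_argument("-u", "--url", help="URL of the page to convert")
    # Optional destination for the generated md.
    parser.add_argument("--output", help="file where the md should be saved")
    return parser


def resolve_url(args: argparse.Namespace) -> str:
    """Returns the URL from --url if given, otherwise from the input file."""
    if args.url:
        return args.url
    if args.input_file:
        return Path(args.input_file).read_text(encoding="utf-8").strip()
    raise SystemExit("provide an input file or --url")
```

In this sketch `--url` wins when both sources are given; that precedence is a design choice made here, not something the post states.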
Example
To use the tool, you just have to call it with the input file that contains the link; the output file is optional. This generates the md for the link that the file contains, as shown in the gif below.