Extractive QA with txtai
David Mezzetti
Posted on January 28, 2021
In Parts 1 through 4, we gave a general overview of txtai, the backing technology, and examples of how to use it for similarity searches. This article builds on that work to create extractive question-answering systems.
Install dependencies
Install txtai and all dependencies.
pip install txtai
Create Embeddings and Extractor instances
The Embeddings instance is the main entrypoint for txtai. An Embeddings instance defines the method used to tokenize and convert a segment of text into an embeddings vector.
The Extractor instance is the entrypoint for extractive question-answering.
Both the Embeddings and Extractor instances take a path to a transformer model. Any model on the Hugging Face model hub can be used in place of the models below.
from txtai.embeddings import Embeddings
from txtai.pipeline import Extractor
# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})
# Create extractor instance
extractor = Extractor(embeddings, "distilbert-base-cased-distilled-squad")
data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays beat Red Sox final score 2-1",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Lightning goaltender pulled, lose to Flyers 4-1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]
questions = ["What team won the game?", "What was score?"]
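# Ask every question against the text that best matches the query.
# Each queue entry is a (name, query, question, snippet) tuple.
execute = lambda query: extractor([(question, query, question, False) for question in questions], data)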
for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()
# Ad-hoc questions
question = "What hockey team won?"
print("----", question, "----")
print(extractor([(question, question, question, False)], data))
---- Red Sox - Blue Jays ----
('What team won the game?', 'Blue Jays')
('What was score?', '2-1')
---- Phillies - Braves ----
('What team won the game?', 'Phillies')
('What was score?', '5-0')
---- Dodgers - Giants ----
('What team won the game?', 'Giants')
('What was score?', '5-4')
---- Flyers - Lightning ----
('What team won the game?', 'Flyers')
('What was score?', '4-1')
---- What hockey team won? ----
[('What hockey team won?', 'Flyers')]
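The calls above pass the extractor a list of four-element tuples and get back (name, answer) pairs. As a minimal sketch of that data flow, the helpers below build the queue and turn the answers into a lookup table. The names build_queue and to_dict are hypothetical and not part of txtai, and the field order (name, query, question, snippet) is inferred from the calls above. The extractor call itself is left as a comment so the snippet runs without downloading a model.

```python
from typing import Dict, List, Tuple

# Assumed field order, inferred from the article's calls:
# (name, query, question, snippet)
QueueItem = Tuple[str, str, str, bool]


def build_queue(questions: List[str], query: str, snippet: bool = False) -> List[QueueItem]:
    """Pair every question with the same context query (hypothetical helper)."""
    return [(question, query, question, snippet) for question in questions]


def to_dict(answers: List[Tuple[str, str]]) -> Dict[str, str]:
    """Convert (name, answer) pairs into a name -> answer lookup (hypothetical helper)."""
    return dict(answers)


queue = build_queue(["What team won the game?", "What was score?"], "Red Sox - Blue Jays")
print(queue)

# With the extractor and data defined earlier, the real call would be:
#   answers = extractor(queue, data)
# The pairs printed in the article stand in for that call here.
answers = [("What team won the game?", "Blue Jays"), ("What was score?", "2-1")]
print(to_dict(answers))
```

Because the first tuple field comes back as the label on each answer, reusing the question text as the name keeps the results self-describing. A shorter key could be used instead if the answers feed into other code.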