What is a tensor
Stephen Collins
Posted on March 11, 2023
A tensor is an N-dimensional container of data and the basic building block of machine learning models. Two of the most common Python machine learning libraries, PyTorch and Tensorflow, each provide their own object-oriented tensor abstraction used to build models in that library. In this blog post, we will cover basic tensor operations in PyTorch and Tensorflow, along with an example of how we use tensors in our own machine learning code here at Crypto Clamor.
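To make "N-dimensional" concrete, here is a minimal illustrative sketch (not taken from any model code) of tensors of rank 0, 1, and 2 in Tensorflow:
import tensorflow as tf
# rank 0: a single scalar value
scalar = tf.constant(3)
# rank 1: a vector of values
vector = tf.constant([1, 2, 3])
# rank 2: a matrix of values
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])
print(scalar.shape, vector.shape, matrix.shape)
# log output:
# () (3,) (2, 3)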
Basic tensor operations
Tensors in both PyTorch and Tensorflow support a wide variety of operations for use in computations. One common way these operations are exposed is through operator overloading.
Operator overloading
Tensors in both PyTorch and Tensorflow make heavy use of Python's operator overloading functionality. Operator overloading is the ability of a programming language to override default operators (e.g., "+" and "-") to provide custom functionality to a class.
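Under the hood, this relies on Python's special methods such as __add__. As a toy sketch (this is not how PyTorch or Tensorflow implement it internally, just an illustration of the mechanism), a plain Python class can overload "+" like so:
class MiniTensor:
    def __init__(self, values):
        self.values = values
    def __add__(self, other):
        # overload "+" to add a number to every element
        return MiniTensor([v + other for v in self.values])
    def __repr__(self):
        return f"MiniTensor({self.values})"

x = MiniTensor([1, 1, 1]) + 1
print(x)
# log output:
# MiniTensor([2, 2, 2])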
For example, to "add 1" to a 1-dimensional tensor in Tensorflow:
import tensorflow as tf
# initialize a 1-D tensor
rank_1_tensor = tf.constant([1,1,1])
# add "1" to each element in the rank_1_tensor
x = rank_1_tensor + 1
print(x)
# log output:
# tf.Tensor([2 2 2], shape=(3,), dtype=int32)
and in PyTorch:
import torch
# initialize a 1-D tensor
rank_1_tensor = torch.tensor([1,1,1])
# add "1" to each element in the rank_1_tensor
x = rank_1_tensor + 1
print(x)
# log output:
# tensor([2, 2, 2])
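Addition is only one of many operations tensors support. As a small illustrative sketch (not tied to any particular model), here are a few other common operations in PyTorch: element-wise multiplication, matrix multiplication, and reshaping.
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

# element-wise multiplication via the overloaded "*" operator
print(a * b)
# log output:
# tensor([[ 5, 12],
#         [21, 32]])

# matrix multiplication
print(torch.matmul(a, b))
# log output:
# tensor([[19, 22],
#         [43, 50]])

# reshape a 2x2 tensor into a 1-D tensor of 4 elements
print(a.reshape(4))
# log output:
# tensor([1, 2, 3, 4])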
Examples of tensor usage in a machine learning model
Here at Crypto Clamor, one example of tensor usage we can share comes from how we initially fine-tuned our BERT model. We loaded a labeled CSV data file (meaning, tweet text with an associated sentiment score) into a Tensorflow dataset. This dataset is an iterable data structure, where each element is a batch and each batch is a tuple of tensors.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification, InputFeatures
import pandas as pd
import matplotlib.pyplot as plt
import tarfile
import os
CSV_PATH = './labeled_model_data.csv'
NUM_EPOCHS = 10
DATASET_SIZE = 2500
BATCH_SIZE = 25
AUTOTUNE = tf.data.experimental.AUTOTUNE
dataset = tf.data.experimental.make_csv_dataset(
    CSV_PATH,
    batch_size=BATCH_SIZE,
    column_names=['score', 'timestamp', 'datestring', 'N/A', 'user', 'tweet'],
    label_name='score',
    select_columns=['score', 'tweet'],
    num_epochs=NUM_EPOCHS,
    header=False,
    shuffle_seed=0,
    shuffle=True,
    num_rows_for_inference=1600000,
    ignore_errors=True,
).prefetch(AUTOTUNE)
Here, we are creating a Tensorflow dataset using the experimental API's make_csv_dataset function. The output stored in our dataset variable is a Tensorflow Dataset, where each element is a batch (of the size set by batch_size). Each element (that is, each batch) is a tuple (features, labels), where features is a dictionary mapping each selected feature column (here, just tweet) to a Tensor of that batch's values, and labels is a Tensor containing the batch's corresponding sentiment scores.
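To see those tensors concretely, here is a short sketch that continues the snippet above (illustrative only, and assuming the labeled CSV file exists at CSV_PATH) by pulling a single batch out of the dataset and inspecting it:
# take one batch from the dataset and inspect its tensors
for features, labels in dataset.take(1):
    # features is a dict of tensors, keyed by column name
    print(features['tweet'].shape)   # (25,) -- one tweet string per example in the batch
    print(features['tweet'].dtype)   # tf.string
    # labels is a tensor of the batch's sentiment scores
    print(labels.shape)              # (25,)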
Conclusion
This blog post has only scratched the surface of what tensors are and how they are used to build machine learning models. Hopefully you've learned a thing or two about tensors along the way.
Questions or comments? Connect with us on Twitter, LinkedIn or Facebook!