Handling Data in Python
Aman Gupta
Posted on January 1, 2021
In this post, we discuss how Python handles data: the various data types and the data structures it provides.
Python has many built-in data types and many specialized data types. We will go through them one by one. Let's start with the built-in data types:
1. dict
A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary.
A dictionary’s keys are almost arbitrary values. Values that are not hashable, that is, values containing lists, dictionaries, or other mutable types (that are compared by value rather than by object identity) may not be used as keys. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers compare equal (such as 1 and 1.0) then they can be used interchangeably to index the same dictionary entry. (Note, however, that since computers store floating-point numbers as approximations it is usually unwise to use them as dictionary keys.)
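The key rules above can be checked with a small sketch. The variable names (numbers, lookup, error_message) are made up for illustration, and the exact wording of the error message may differ between Python versions.

```python
# Keys that compare equal (and hash equal) address the same entry.
numbers = {1: "one"}
numbers[1.0] = "uno"        # 1 == 1.0, so this updates the entry stored under key 1
print(numbers)              # {1: 'uno'}

# Mutable containers such as lists are not hashable, so they are rejected as keys.
try:
    bad = {[1, 2]: "list key"}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Floating-point rounding makes float keys fragile.
lookup = {0.3: "three tenths"}
print(0.1 + 0.2 in lookup)  # False, because 0.1 + 0.2 is not exactly 0.3
```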
Dictionaries can be created by placing a comma-separated list of key: value pairs within braces.
For example:
{'jack': 4098, 'sjoerd': 4127}
or
{4098: 'jack', 4127: 'sjoerd'}
Dictionaries can be created by several means:
- Use a comma-separated list of key: value pairs within braces.
- Use a dict comprehension.
- Use the type constructor.
To illustrate, the following examples all return a dictionary equal to {"one": 1, "two": 2, "three": 3}:
a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
e = dict({'three': 3, 'one': 1, 'two': 2})
f = dict({'one': 1, 'three': 3}, two=2)
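The list of options also mentions dict comprehensions, which the examples above do not show. Here is a minimal sketch of one; the names (names, g) are only for illustration.

```python
names = ['one', 'two', 'three']

# Build the same dictionary with a dict comprehension.
g = {name: index for index, name in enumerate(names, start=1)}
print(g)                                      # {'one': 1, 'two': 2, 'three': 3}

# Equality between dictionaries ignores the order in which keys were inserted.
print(g == dict(one=1, two=2, three=3))       # True
print(g == {'two': 2, 'one': 1, 'three': 3})  # True
```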
2. list
Lists are mutable sequences, typically used to store collections of homogeneous items (where the precise degree of similarity will vary by application).
The constructor builds a list whose items are the same and in the same order as iterable’s items. iterable may be either a sequence, a container that supports iteration, or an iterator object. If iterable is already a list, a copy is made and returned, similar to iterable[:]. For example, list('abc') returns ['a', 'b', 'c'] and list( (1, 2, 3) ) returns [1, 2, 3]. If no argument is given, the constructor creates a new empty list, [].
Lists may be constructed in several ways:
- Using a pair of square brackets to denote the empty list.
- Using square brackets, separating items with commas.
- Using a list comprehension.
- Using the type constructor.
For example:
a = []
b = [1,2,3]
c = [x for x in range(3)]
d = list('abc')
e = list((1,2,3))
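To see that the constructor really returns a copy, here is a small sketch. The variable names are illustrative, and note that the copy is shallow: the new list holds the same item objects.

```python
original = [1, 2, 3]

# Both forms create a new list object with the same items.
copy_via_constructor = list(original)
copy_via_slice = original[:]

# Changing the copy leaves the original untouched.
copy_via_constructor.append(4)
print(original)                          # [1, 2, 3]
print(copy_via_constructor)              # [1, 2, 3, 4]
print(copy_via_slice is original)        # False

# Any iterable works, including a generator expression.
squares = list(x * x for x in range(4))
print(squares)                           # [0, 1, 4, 9]

# With no argument, the constructor returns an empty list.
print(list())                            # []
```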
3. set and frozenset
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
There are currently two built-in set types, set and frozenset.
- The set type is mutable: its contents can be changed using methods like add() and remove(). Since it is mutable, it has no hash value and cannot be used as either a dictionary key or as an element of another set.
- The frozenset type is immutable and hashable: its contents cannot be altered after it is created, so it can be used as a dictionary key or as an element of another set.
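The difference between the two types shows up as soon as you try to use them as dictionary keys. This is just a sketch with made-up names (mutable, frozen, labels, nested).

```python
mutable = {1, 2}
frozen = frozenset(mutable)

# A frozenset is hashable, so it can be a dictionary key.
labels = {frozen: "pair"}
print(labels[frozenset({2, 1})])   # pair

# It can also be an element of another set.
nested = {frozenset({1, 2}), frozenset({3})}
print(len(nested))                 # 2

# A plain set is not hashable, so using it as a key raises TypeError.
try:
    bad = {mutable: "pair"}
except TypeError:
    set_is_hashable = False
else:
    set_is_hashable = True
print(set_is_hashable)             # False

# Only the mutable type has add() and remove().
mutable.add(3)
print(sorted(mutable))             # [1, 2, 3]
print(hasattr(frozen, "add"))      # False
```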
Sets can be created by several means:
- Use a comma-separated list of elements within braces
- Use a set comprehension
- Use the type constructor
a = {'jack', 'sjoerd'}
b = {c for c in 'abracadabra' if c not in 'abc'}
c = set()
d = set('foobar')
e = set(['a', 'b', 'foo'])
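The common uses mentioned earlier (membership testing, removing duplicates, and the mathematical operations) can be sketched like this. The names left and right are illustrative, and sorted() is used only to get a stable order for printing, since sets are unordered.

```python
left = set('abracadabra')    # the distinct letters a, b, c, d, r
right = set('alacazam')      # the distinct letters a, c, l, m, z

# Membership testing
print('a' in left)           # True
print('z' in left)           # False

# Mathematical set operations
print(sorted(left & right))  # intersection: ['a', 'c']
print(sorted(left | right))  # union: ['a', 'b', 'c', 'd', 'l', 'm', 'r', 'z']
print(sorted(left - right))  # difference: ['b', 'd', 'r']
print(sorted(left ^ right))  # symmetric difference: ['b', 'd', 'l', 'm', 'r', 'z']

# Removing duplicates from a sequence
unique = sorted(set([3, 1, 3, 2, 1]))
print(unique)                # [1, 2, 3]
```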
4. tuple
Tuples are immutable sequences, typically used to store collections of heterogeneous data (such as the 2-tuples produced by the enumerate() built-in). Tuples are also used for cases where an immutable sequence of homogeneous data is needed (such as allowing storage in a set or dict instance).
Tuples may be constructed in a number of ways:
- Using a pair of parentheses to denote the empty tuple
- Using a trailing comma for a singleton tuple
- Separating items with commas
- Using the tuple() built-in:
The constructor builds a tuple whose items are the same and in the same order as iterable’s items. iterable may be either a sequence, a container that supports iteration, or an iterator object.
a = ()
b = ('a', )
c = ('a', 'b', 'c')
d = tuple() #return empty tuple
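A few of these points are easy to get wrong, especially the singleton tuple, so here is a short sketch. All names are illustrative.

```python
# enumerate() produces 2-tuples of (index, item).
pairs = list(enumerate(['x', 'y']))
print(pairs)                  # [(0, 'x'), (1, 'y')]

# It is the comma that makes a tuple, not the parentheses.
not_a_tuple = ('a')
singleton = ('a',)
point = 1, 2
print(type(not_a_tuple).__name__, type(singleton).__name__, type(point).__name__)
# str tuple tuple

# The constructor accepts any iterable.
letters = tuple('abc')
print(letters)                # ('a', 'b', 'c')

# Tuples are immutable, so item assignment raises TypeError.
try:
    point[0] = 5
except TypeError:
    tuple_is_mutable = False
else:
    tuple_is_mutable = True
print(tuple_is_mutable)       # False

# Being hashable, a tuple of hashable items can be a dictionary key.
grid = {(0, 0): 'origin'}
print(grid[(0, 0)])           # origin
```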
5. str
Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points. String literals are written in a variety of ways:
Single quotes: 'allows embedded "double" quotes'
Double quotes: "allows embedded 'single' quotes".
Triple quoted: '''Three single quotes''', """Three double quotes"""
a = 'Aman'
b = "Aman"
c = '''I love python'''
d = """ I am enjoying it """
Triple quoted strings may span multiple lines - all associated whitespace will be included in the string literal.
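Here is a quick sketch of what "all associated whitespace will be included" means in practice, along with string immutability. The variable names are illustrative.

```python
# The line break and the leading spaces on the second line are part of the string.
text = """first line
    second line"""
print(repr(text))            # 'first line\n    second line'

# Single and double quotes produce the same kind of object.
print('Aman' == "Aman")      # True

# Strings are immutable: item assignment raises TypeError.
name = 'Aman'
try:
    name[0] = 'a'
except TypeError:
    str_is_mutable = False
else:
    str_is_mutable = True
print(str_is_mutable)        # False

# Methods such as lower() return a new string and leave the original unchanged.
lowered = name.lower()
print(name, lowered)         # Aman aman
```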
6. bytes
Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.
Firstly, the syntax for bytes literals is largely the same as that for string literals, except that a b prefix is added:
Single quotes.
Double quotes.
Triple quoted.
a = b'still allows embedded "double" quotes'
b = b"still allows embedded 'single' quotes"
c = b'''3 single quotes'''
d = b"""3 double quotes"""
Only ASCII characters are permitted in bytes literals (regardless of the declared source code encoding). Any binary values over 127 must be entered into bytes literals using the appropriate escape sequence.
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256.
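The "sequence of integers" behaviour is easiest to see by indexing. A small sketch, with illustrative names:

```python
data = b'Hi!'

# Indexing gives an integer, slicing gives a bytes object.
print(data[0])        # 72
print(data[0:1])      # b'H'
print(list(data))     # [72, 105, 33]

# Values over 127 need an escape sequence inside a bytes literal.
high = b'\xff\x80'
print(list(high))     # [255, 128]

# Non-ASCII text becomes bytes by encoding it.
encoded = '\u00e9'.encode('utf-8')
print(encoded)        # b'\xc3\xa9'

# Each value must satisfy 0 <= x < 256.
try:
    bytes([256])
except ValueError:
    out_of_range_allowed = False
else:
    out_of_range_allowed = True
print(out_of_range_allowed)   # False
```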
7. bytearray
bytearray objects are a mutable counterpart to bytes objects.
As bytearray objects are mutable, they support the mutable sequence operations in addition to the common bytes and bytearray operations.
Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytearray type has an additional class method, bytearray.fromhex(), to read data in that format.
There is no dedicated literal syntax for bytearray objects; instead, they are always created by calling the constructor:
- Creating an empty instance
- Creating a zero-filled instance with a given length
- From an iterable of integers
- Copying existing binary data via the buffer protocol
a = bytearray()
b = bytearray(10)
c = bytearray(range(20))
d = bytearray(b'Hi!')
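Here is a minimal sketch of the mutable operations and of reading hexadecimal data with bytearray.fromhex(). The variable names are illustrative.

```python
buffer = bytearray(b'Hi!')

# Unlike bytes, a bytearray can be changed in place.
buffer[0] = ord('h')
buffer.append(0x21)          # 0x21 is the ASCII code for '!'
buffer.extend(b' ok')
print(buffer)                # bytearray(b'hi!! ok')

# Two hexadecimal digits describe one byte; spaces between bytes are allowed.
from_hex = bytearray.fromhex('48 69 21')
print(from_hex)              # bytearray(b'Hi!')
print(from_hex.hex())        # 486921

# A zero-filled instance of a given length
zeros = bytearray(3)
print(zeros)                 # bytearray(b'\x00\x00\x00')

# Convert back to an immutable bytes object when needed.
frozen_copy = bytes(buffer)
print(frozen_copy)           # b'hi!! ok'
```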
The types above are all built into Python. The standard library also provides some specialized data types that are just as useful.
8. datetime
The datetime module supplies classes for manipulating dates and times.
While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation.
Available Types
- datetime.date : An idealized naive date, assuming the current Gregorian calendar always was, and always will be, in effect.
- datetime.time : An idealized time, independent of any particular day, assuming that every day has exactly 24*60*60 seconds.
- datetime.datetime : A combination of a date and a time.
- datetime.timedelta : A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.
- datetime.tzinfo : An abstract base class for time zone information objects. These are used by the datetime and time classes to provide a customizable notion of time adjustment.
- datetime.timezone : A class that implements the tzinfo abstract base class as a fixed offset from the UTC.
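To show how these types fit together, here is a small sketch. The dates, the +05:30 offset, and the variable names are arbitrary choices for illustration.

```python
from datetime import date, datetime, time, timedelta, timezone

# Subtracting two dates gives a timedelta.
start = date(2021, 1, 1)
end = date(2021, 3, 1)
gap = end - start
print(gap.days)                                     # 59

# A datetime combines a date and a time, and supports arithmetic with timedelta.
meeting = datetime.combine(start, time(9, 30))
later = meeting + timedelta(hours=36)
print(later.isoformat())                            # 2021-01-02T21:30:00

# Attribute extraction and output formatting
print(meeting.year, meeting.hour)                   # 2021 9
print(meeting.strftime('%Y-%m-%d %H:%M'))           # 2021-01-01 09:30

# timezone is a fixed-offset tzinfo; attaching it makes the datetime "aware".
offset = timezone(timedelta(hours=5, minutes=30))
aware = meeting.replace(tzinfo=offset)
print(aware.isoformat())                            # 2021-01-01T09:30:00+05:30
print(aware.astimezone(timezone.utc).isoformat())   # 2021-01-01T04:00:00+00:00
```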
9. zoneinfo
The zoneinfo module provides a concrete time zone implementation to support the IANA time zone database as originally specified in PEP 615. By default, zoneinfo uses the system’s time zone data if available; if no system time zone data is available, the library will fall back to using the first-party tzdata package available on PyPI.
ZoneInfo is a concrete implementation of the datetime.tzinfo abstract base class, and is intended to be attached to tzinfo, either via the constructor, the datetime.replace method or datetime.astimezone.
For example:
>>> from zoneinfo import ZoneInfo
>>> from datetime import datetime, timedelta
>>> dt = datetime(2020, 10, 31, 12, tzinfo=ZoneInfo("America/Los_Angeles"))
>>> print(dt)
2020-10-31 12:00:00-07:00
>>> dt.tzname()
'PDT'
10. calendar
This module allows you to output calendars like the Unix cal program, and provides additional useful functions related to the calendar. By default, these calendars have Monday as the first day of the week, and Sunday as the last (the European convention). Use setfirstweekday() to set the first day of the week to Sunday (6) or to any other weekday.
The calendar module also provides some types:
- calendar.Calendar : Creates a Calendar object.
- calendar.TextCalendar : This class can be used to generate plain text calendars.
- calendar.HTMLCalendar : This class can be used to generate HTML calendars.
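Here is a rough sketch of these classes in use. It passes firstweekday to the constructors instead of calling the module-level setfirstweekday(), so it does not change global state; the month chosen and the variable names are arbitrary, and the printed month and day names assume the default locale.

```python
import calendar

# A plain text calendar for January 2021 with Sunday as the first day of the week.
text_cal = calendar.TextCalendar(firstweekday=calendar.SUNDAY)
month_text = text_cal.formatmonth(2021, 1)
print(month_text)

# The base Calendar class yields the data without any formatting.
# Weekday numbers run from 0 (Monday) to 6 (Sunday).
week_order = list(calendar.Calendar(firstweekday=6).iterweekdays())
print(week_order)                        # [6, 0, 1, 2, 3, 4, 5]

# The same month rendered as an HTML table.
month_html = calendar.HTMLCalendar().formatmonth(2021, 1)
print(month_html[:6])                    # <table

# A few of the additional helper functions
print(calendar.weekday(2021, 1, 1))      # 4, i.e. Friday
print(calendar.monthrange(2021, 1))      # (4, 31): first weekday and number of days
print(calendar.isleap(2020))             # True
```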
Python also has more advanced data types in modules such as collections, heapq, bisect, and array.
Thanks for reading.