The ABCs (Abstract Base Classes) of Python’s Data Types

Data come in all shapes, sizes, and sources: the mighty relational database, the flashy RESTful API, the ubiquitous CSV, even collections of physical documents waiting to be brought into the light of a scanner. 

There are powerful insights to be gained from all these sources of data that can help you make informed business decisions, better understand business operations and supply chains, and more. The U.Group data science team uses Python to make sense of all the different data types and create these business insights. Not all data are treated the same in Python, however, and making sense of each data type isn’t always intuitive to beginners. Never fear, though. We’re here to provide an overview of the abstract base classes (ABCs) of Python’s data types, how we use them, and how you can, too. 

What is Python? 

Python is a general-purpose programming language developed by Guido van Rossum in the early ’90s. It has become an increasingly popular choice for programmers in recent years, particularly in data science and data engineering. In fact, according to the 2019 Stack Overflow developer survey, “Python is the fastest-growing major programming language today.” 

U.Group’s data science team has fully embraced Python. We have found it to be a robust language with a rich ecosystem that allows us to accomplish a variety of tasks for our customers—from machine learning and image recognition to building simple APIs or data processing pipelines. 

It’s also a great language for collaboration, because our customers find it easy to work with. One reason Python attracts new users is its reputation for being beginner-friendly: the syntax is highly readable, and the language hides many details from you, so phrases like “it takes care of it for you” or “things are handled under the hood” come up often in discussion. Understanding how things work under the hood, however, unlocks tremendous potential for code readability and reusability, and gaining that understanding starts with learning how Python handles data. 

How data works in Python 

Python’s data can be divided into two broad groups: primitive data types and complex data types, also called data structures. Primitive data types are the basic building blocks of data in Python. Data structures are collections of instances of those types (or of other structures). Together, they let you store and work with data from any of the sources above. 

Primitive Python data types are so basic that we rarely think of them as data at all. They include numerics (integers, floating-point numbers, and complex numbers), character code points (integers that map to individual characters, like “a”), and raw bytes. 
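A quick sketch of these primitive types in the interpreter. The built-ins `ord()` and `chr()` convert between a character and its integer code point; the variable names are just for illustration.

```python
count = 42            # int
ratio = 0.75          # float
z = 3 + 4j            # complex: real part 3, imaginary part 4
raw = b"abc"          # bytes: a sequence of small integers (0-255)

print(type(count).__name__, type(ratio).__name__)  # int float
print(type(z).__name__, z.imag)  # complex 4.0
print(ord("a"))                  # 97: the code point for "a"
print(chr(97))                   # a
print(raw[0])                    # 97: indexing bytes gives an int
```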

Complex Python data types, also called data structures, include collections like lists, tuples, sets, and dictionaries. Each of these structures holds a group, or collection, of primitive values (or of other collections). Even strings, like “hello, world,” are just special Python collections: immutable sequences of characters. 
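A minimal sketch showing that collections can hold primitive values or other collections, and that a string behaves like a sequence. It uses the standard `collections.abc.Sequence` ABC; the data values are made up for illustration.

```python
from collections.abc import Sequence

matrix = [[1, 2], [3, 4]]              # a list holding other lists
mixed = (1, "two", 3.0, [4])           # a tuple holding values of several types

greeting = "hello, world"
print(isinstance(greeting, Sequence))  # True: str counts as a Sequence
print(greeting[0], len(greeting))      # h 12
print(list("abc"))                     # ['a', 'b', 'c']
print(matrix[1][0])                    # 3
print(mixed[1])                        # two
```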

The different types of complex Python data 

Complex structures are easy to instantiate, and you don’t need to understand them deeply to start using them to build solutions. However, much of the convenience of Python’s simple approach comes at the expense of performance. Using the wrong structure can make a program “work” while causing subtle errors or slower computation. In either scenario, the wrong data structure costs you time. 
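To make the performance point concrete, here is a sketch comparing a membership test on a list with the same test on a set. The data are synthetic, and timings depend on your machine, so the code prints them rather than claiming a particular speedup.

```python
import timeit

# Synthetic example data: 100,000 integer IDs.
ids_list = list(range(100_000))
ids_set = set(ids_list)
target = 99_999  # worst case for the list: the last element

# Both containers give the same answer...
assert (target in ids_list) and (target in ids_set)

# ...but the list is scanned element by element (O(n)), while the set
# uses a hash lookup (O(1) on average).
list_time = timeit.timeit(lambda: target in ids_list, number=100)
set_time = timeit.timeit(lambda: target in ids_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```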

Gaining an understanding of the different data structures and how to use them correctly can help avoid some of these issues. These are the most common data structures: 

Lists 

The list is one of the most commonly used data structures in Python, and also one of the most misunderstood. Simply put, a list is an ordered collection, or sequence, of elements. Lists are mutable, so you can add elements to them, take elements away, alter individual elements, and more. 

Python itself can confirm this: 
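For example, the standard library’s `collections.abc` module, which defines the abstract base classes this article is about, reports that `list` is a mutable sequence. A minimal sketch (the `fruits` data are just for illustration):

```python
from collections.abc import MutableSequence, Sequence

print(issubclass(list, MutableSequence))  # True
print(isinstance([1, 2, 3], Sequence))    # True

fruits = ["apple", "banana"]
fruits.append("cherry")    # add an element
fruits.remove("apple")     # take one away
fruits[0] = "blueberry"    # alter an element in place
print(fruits)              # ['blueberry', 'cherry']
```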

A list behaves like an array: its elements are ordered, you can reference each element by its offset (index) from the beginning of the collection, you can iterate over its elements naturally, and more. The key difference in Python, though, is that arrays (from the standard library’s `array` module) are a special kind of collection designed to hold homogeneous data: values of a single basic type, such as integers or floats. Because of this, Python arrays store their elements compactly and offer methods, such as `tobytes()` and `byteswap()`, that regular lists lack. 
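A sketch of the difference using the standard `array` module. The type code `"i"` means a signed C `int`, so the array rejects a float that a list would happily accept; the sample values are arbitrary.

```python
from array import array

mixed = [1, 1.5, "two"]          # a list accepts values of any type
numbers = array("i", [1, 2, 3])  # "i" = signed C int; every element must be an int
numbers.append(4)

try:
    numbers.append(1.5)          # rejected: not an integer
except TypeError as err:
    print("TypeError:", err)

# Elements are stored as raw C values, itemsize bytes each (platform-dependent).
raw = numbers.tobytes()
print(numbers.typecode, numbers.itemsize, len(raw))
numbers.byteswap()               # swap byte order in place, e.g. for data from another platform
numbers.byteswap()               # swapping twice restores the original values
print(numbers.tolist())          # [1, 2, 3, 4]
```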

Lists are useful when ordered data of unknown type are being collected, evaluated, and manipulated. 

Tuples 

A tuple is also an ordered collection, or sequence, of elements. The key difference between a tuple and a list is that tuples are immutable: once a tuple is created, it cannot be altered, only used and discarded. Like lists, tuples can hold elements of various types, and they can even hold just a single element (a one-element tuple, sometimes called a singleton, which requires a trailing comma, as in `(42,)`). 
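A short sketch of tuple creation and immutability; the values are illustrative only.

```python
point = (3, 4)               # a two-element tuple
record = ("Ada", 36, True)   # mixed types are allowed
single = (42,)               # one-element tuple: the trailing comma is required
not_a_tuple = (42)           # just the integer 42 in parentheses

print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int

try:
    point[0] = 10            # tuples cannot be altered after creation
except TypeError as err:
    print("TypeError:", err)
```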

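Tuples are useful when preparing data to be inserted, with a defined structure, into other data structures like sets. Because tuples are immutable, they are hashable (as long as all of their elements are hashable), so they can be set members or dictionary keys, which lists cannot. 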
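A minimal sketch of tuples as set members, assuming a hypothetical `visited` set of grid coordinates; trying the same thing with a list fails because lists are unhashable.

```python
visited = set()              # hypothetical set of (row, col) grid coordinates
visited.add((0, 0))
visited.add((0, 1))
visited.add((0, 0))          # duplicate, silently ignored
print(len(visited))          # 2

try:
    visited.add([1, 1])      # a list is mutable and therefore unhashable
except TypeError as err:
    print("TypeError:", err)
```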

Sets 

Sets are unordered collections of unique objects. They differ from lists and tuples in that their elements can’t be accessed by their offset from the beginning of the collection, because the elements have no order. Instead, sets are built on hash tables, so checking whether an element is present is fast (constant time on average), no matter how large the set grows. 

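The basic `set` is mutable, while the `frozenset` is immutable. Sets are usually written with curly braces around comma-separated elements, as in `{1, 2, 3}`, but we can also use the built-in `set()` function, which takes an iterable. Note that empty braces, `{}`, create an empty dictionary, so an empty set must be written as `set()`. 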
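A sketch of the common ways to build and use sets, including the `{}` pitfall and the immutable `frozenset`; the sample data are illustrative.

```python
colors = {"red", "green", "blue"}   # set literal
letters = set("mississippi")        # set() accepts any iterable
empty = set()                       # {} would create an empty dict instead

print(type({}).__name__)            # dict
print(sorted(letters))              # ['i', 'm', 'p', 's']: duplicates removed

colors.add("red")                   # already present, so nothing changes
print(len(colors))                  # 3
print("green" in colors)            # True: hash-based membership test

try:
    colors[0]                       # sets have no order, so no indexing
except TypeError as err:
    print("TypeError:", err)

frozen = frozenset(colors)          # immutable counterpart of set
try:
    frozen.add("purple")            # frozenset has no add() method
except AttributeError as err:
    print("AttributeError:", err)
```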

Sets are useful when you need to store unique elements without caring about their order, or when you want to reduce data to its set of unique elements. 

Dictionaries 

A dictionary is also a collection of objects, but each value is stored under a unique key. Like sets, dictionaries are implemented as hash tables, which is why keys must be unique and hashable. Since Python 3.7, dictionaries also preserve the order in which keys were inserted. Like sets, they are written with curly braces: we separate key:value pairs with commas and use a colon to relate each key to its value. 
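A sketch of basic dictionary operations, using a hypothetical `inventory` mapping of item names to counts.

```python
inventory = {"apples": 12, "pears": 5}   # hypothetical item counts
inventory["plums"] = 7        # add a new key:value pair
inventory["apples"] += 3      # update the value stored under an existing key
inventory["pears"] = 0        # assigning to an existing key replaces its value

print(inventory["apples"])    # 15
print(list(inventory))        # ['apples', 'pears', 'plums'] (insertion order, Python 3.7+)
print(inventory.get("kiwis", 0))  # 0: .get() returns a default instead of raising KeyError
```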

Dictionaries are useful when you need to look up values quickly (in constant time on average) by a unique key, in whatever order your code needs them. 

Custom Data Types 

Lists, tuples, sets, and dictionaries are built into Python. Sometimes, however, we want to extend the functionality of these structures to better serve your business needs. Fortunately, Python allows us to do this. 

The chart at the top of this document shows that the MutableSequence ABC inherits from Sequence, which ultimately leads to Container. The chart is a great resource because it visually represents how Python’s collection types relate to one another. More importantly, you can use it to help build your own objects that extend the functionality of those already provided. It also supports “duck-typing” (allowing an object to be used in any context, as long as it supports the required behavior): ABCs let you define interfaces explicitly where other techniques, like `hasattr()`, would be clumsy or subtly wrong. 
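The inheritance chain can be inspected directly from `collections.abc`. The sketch below also shows that some ABCs recognize classes by their methods alone: a hypothetical `Playlist` class that defines only `__len__` counts as `Sized` without inheriting from it, which is a more explicit check than `hasattr(obj, "__len__")`.

```python
from collections.abc import Container, MutableSequence, Sequence, Sized

print(issubclass(MutableSequence, Sequence))   # True
print(issubclass(Sequence, Container))         # True
print([cls.__name__ for cls in MutableSequence.__mro__])  # full inheritance chain


class Playlist:  # hypothetical example class; inherits from nothing
    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        return len(self._songs)


print(isinstance(Playlist(["a", "b"]), Sized))  # True: defining __len__ is enough
print(isinstance(Playlist([]), Sequence))       # False: Sequence needs inheritance or registration
```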

The `collections` module also provides structures beyond the built-in types that are essentially specialized versions of the built-ins (such as `deque`, a double-ended queue, and `OrderedDict`), which can be useful in certain cases.  
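A sketch of these two specialized structures; the event names and cache keys are made up for illustration.

```python
from collections import OrderedDict, deque

# deque: fast appends and pops at both ends.
recent = deque(maxlen=3)          # keeps only the three newest items
for event in ["login", "view", "edit", "logout"]:
    recent.append(event)
print(list(recent))               # ['view', 'edit', 'logout']
print(recent.popleft())           # view: O(1), unlike list.pop(0), which is O(n)

# OrderedDict: a dict with extra order-manipulation methods.
cache = OrderedDict(a=1, b=2, c=3)
cache.move_to_end("a")            # mark "a" as most recently used
print(list(cache))                # ['b', 'c', 'a']
cache.popitem(last=False)         # evict the oldest entry ("b")
print(list(cache))                # ['c', 'a']
```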

Sometimes, it’s even useful to combine qualities of two existing structures or add functionality to an existing one. In these cases, we can subclass the ABCs in the `collections.abc` module and build on them. 
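As a sketch of that approach, here is a hypothetical `TypedList` that combines list behavior with array-style type checking. Subclassing `MutableSequence` requires implementing five abstract methods (`__getitem__`, `__setitem__`, `__delitem__`, `__len__`, and `insert`); methods such as `append`, `extend`, `remove`, `index`, and `__contains__` are then inherited as mixins. The class name and its type-checking rule are illustrative assumptions, not a standard API.

```python
from collections.abc import MutableSequence


class TypedList(MutableSequence):
    """Hypothetical list that only accepts items of one type."""

    def __init__(self, item_type, items=()):
        self._type = item_type
        self._items = []
        self.extend(items)  # inherited mixin; calls our insert() for each item

    def _check(self, value):
        if not isinstance(value, self._type):
            raise TypeError(f"expected {self._type.__name__}, got {type(value).__name__}")
        return value

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):  # slice assignment: check every new item
            value = [self._check(v) for v in value]
        else:
            value = self._check(value)
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, self._check(value))

    def __repr__(self):
        return f"TypedList({self._type.__name__}, {self._items!r})"


scores = TypedList(int, [3, 1, 2])
scores.append(5)                     # inherited from MutableSequence
scores.remove(1)                     # inherited
print(scores)                        # TypedList(int, [3, 2, 5])
print(2 in scores, scores.index(5))  # True 2
try:
    scores.append("seven")
except TypeError as err:
    print("TypeError:", err)         # expected int, got str
```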

Final Thoughts 

Choosing the best data structures (and the algorithms that operate on them) is essential to writing high-performance code. U.Group’s data science team carefully reviews the structures we use to make sure that the analysis and tools we build from your data are as efficient, flexible, and durable as possible. 

As we build your custom data types and structures, going through the `collections.abc` module (and `collections`) helps reveal the internal workings of Python’s data structures. The information there is a great complement to what’s laid out in section 3 (the Data Model) of the Python reference manual. Next time, we’ll take a deeper look into how we can expand the functionality of these basic data structures to allow the use of powerful yet simple algorithms and design patterns that help in our day-to-day data engineering tasks. 

