In this tutorial we’re going to look at what pandas data structures are and how to use them. We’ll start with an introduction to the pandas package and why it is important. We will then look at the data structures pandas provides, particularly the pandas Series and the pandas DataFrame, and illustrate how to perform basic tasks on them.
The Pandas Package
Pandas is a package that provides high-performance, easy-to-use data structures and data analysis tools. It is one of the building blocks of real-world data analysis in Python and offers powerful and flexible data manipulation features. With just a few lines of code, pandas makes data wrangling, preparation and analysis much easier.
Some of the most useful features of the pandas package are the DataFrame structure, data loading, cross-tabulation, aggregation, merging, sorting and basic plotting.
Pandas has two key data structures, Series and DataFrame.
Let us start by importing pandas under the alias pd.

# import pandas in python
import pandas as pd
Pandas Series
A Series is a one-dimensional labelled array object similar to a list or a column in a table. It can hold any data type.
Let’s see how to create a pandas Series.
In this example, s is the name of the Series object being created. The pd.Series() function creates a Series from the values listed inside the square brackets.
String values are written within quotation marks.
s=pd.Series([8, 'data', 5.36, -23455788675342648, 'structures'])
s
0                     8
1                  data
2                  5.36
3    -23455788675342648
4            structures
dtype: object
By default, the index of a Series starts from zero.
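The output above reports dtype: object because the list mixes integers, strings and a float. As a minimal sketch (the variable names mixed, ints and floats are illustrative), the following compares the dtype pandas infers for mixed and homogeneous lists and shows the default zero-based index:

```python
import pandas as pd

# Mixed values (int, str, float) force the general-purpose 'object' dtype
mixed = pd.Series([8, 'data', 5.36, -23455788675342648, 'structures'])

# Homogeneous inputs get a specialised numeric dtype instead
ints = pd.Series([10, 20, 30])
floats = pd.Series([1.5, 2.5, 3.5])

print(mixed.dtype)   # object
print(ints.dtype)    # an integer dtype (int64 on typical builds)
print(floats.dtype)  # float64

# The default index is a range of integers starting at zero
print(list(mixed.index))  # [0, 1, 2, 3, 4]
```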
We can explicitly specify the index with the index= argument, followed by the desired index labels in square brackets. Here, instead of numbers, we have created an index of letters.
s=pd.Series([8, 'data', 5.36, -23455788675342648, 'structures'], index=['A', 'B', 'C', 'D', 'E'])
s
A                     8
B                  data
C                  5.36
D    -23455788675342648
E            structures
dtype: object
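With a custom index, elements can be looked up either by label or by integer position. The sketch below rebuilds the same s as above; .loc and .iloc are standard pandas indexers, and the mismatch_raised flag is just an illustrative name:

```python
import pandas as pd

s = pd.Series([8, 'data', 5.36, -23455788675342648, 'structures'],
              index=['A', 'B', 'C', 'D', 'E'])

# Label-based lookup uses the custom index
print(s['C'])        # 5.36
print(s.loc['C'])    # same element, via the explicit label indexer

# Position-based lookup still works through .iloc
print(s.iloc[2])     # 5.36

# The index must supply exactly one label per value
mismatch_raised = False
try:
    pd.Series([1, 2, 3], index=['A', 'B'])
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```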
A dictionary can also be converted to a Series. If index values are not specified, the keys of the dictionary are used as the index. Let’s elaborate with an example.
d is a dictionary object with the strings '01', '02', '03' and '04' as keys and 'Jan', 'Feb', 'Mar' and 'Apr' as values. When this dictionary is passed to pd.Series(), it is converted to a Series.
d={'01' : 'Jan', '02':'Feb','03':'Mar','04':'Apr'}
d
{'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr'}
months=pd.Series(d)
months
01    Jan
02    Feb
03    Mar
04    Apr
dtype: object
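When index= is passed together with a dictionary, pandas uses it to select and order the dictionary keys rather than simply relabelling the values. A small sketch, where subset is an illustrative name and '05' is deliberately a label with no matching key:

```python
import pandas as pd

d = {'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr'}

# Without index=, the keys become the index in insertion order
months = pd.Series(d)
print(list(months.index))  # ['01', '02', '03', '04']

# With index=, only the listed keys are used, in the listed order;
# a label that is not a key ('05' here) gets a missing value (NaN)
subset = pd.Series(d, index=['03', '01', '05'])
print(subset)
```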
Now let’s look at how to access elements from a Series.
One way of doing this is to pass an index label in square brackets. For instance, the index label '04' in the Series object months corresponds to the value Apr.
We can also filter on a condition on the values. To retrieve January from the same months object, we write months followed by the condition months == 'Jan' in square brackets. The result is a Series showing the matching index label and value, along with the dtype.
# Using Dictionary keys as its index
months['04']
'Apr'
# Using condition on value
months[months=='Jan']
01    Jan
dtype: object
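The condition months == 'Jan' is itself a boolean Series with one True/False value per index label, and indexing with it keeps the rows marked True. This sketch makes that intermediate step visible and also uses Series.isin() to match several values at once (mask is an illustrative name):

```python
import pandas as pd

months = pd.Series({'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr'})

# The comparison alone yields a boolean Series aligned with the index
mask = months == 'Jan'
print(mask.tolist())  # [True, False, False, False]

# Indexing with the mask keeps only the rows where it is True
print(months[mask])

# Several values can be matched at once with isin()
print(months[months.isin(['Jan', 'Mar'])])
```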
Pandas DataFrame
Let’s move on to the other data structure, the pandas DataFrame, which is very commonly used by data scientists. A DataFrame is a two-dimensional labelled data structure, similar to a spreadsheet, a table or a dictionary of Series objects. Each column of a DataFrame can hold a different type of data.
We’ll now see how to create and work with DataFrames. The object we will create, basic_salary, contains five columns: 'First_Name', 'Last_Name', 'Grade', 'Location' and 'ba'.
We’ll first create a dictionary of lists. The column names of our object will be the keys of the dictionary, and each column’s values are given as a list in square brackets. This dictionary is stored in the variable data.
This dictionary can then be passed to the pd.DataFrame() function to create a pandas DataFrame.
The first argument of the function is the dictionary object, followed by columns=, which specifies the column names. The columns argument lets us control the order of the columns. Without it, current versions of pandas keep the order of the dictionary keys; older versions sorted the columns alphabetically.
Similar to Series, the index of a DataFrame begins with zero.
data={'First_Name':['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
'Last_Name': ['Brown','Williams', 'Kolte', 'Mishra', 'Rao'],
'Grade': ['GR1', 'GR2', 'GR1','GR2', 'GR1'],
'Location': ['DELHI','MUMBAI', 'MUMBAI','DELHI', 'MUMBAI'],
'ba':[17990, 12390, 19250, 14780, 19235]}
basic_salary = pd.DataFrame(data, columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])
basic_salary
   First_Name  Last_Name  Grade  Location     ba
0        Alan      Brown    GR1     DELHI  17990
1      Agatha   Williams    GR2    MUMBAI  12390
2      Rajesh      Kolte    GR1    MUMBAI  19250
3       Ameet     Mishra    GR2     DELHI  14780
4        Neha        Rao    GR1    MUMBAI  19235
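To check the structure that pd.DataFrame() builds from the data dictionary, the following sketch inspects the shape and per-column dtypes, and shows that columns= can also select a subset of keys in a chosen order. subset is a hypothetical name, and the exact dtype printed for the text columns varies between pandas versions:

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
        'Last_Name': ['Brown', 'Williams', 'Kolte', 'Mishra', 'Rao'],
        'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
        'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
        'ba': [17990, 12390, 19250, 14780, 19235]}

# Without columns=, the column order follows the dictionary keys
basic_salary = pd.DataFrame(data)

# shape is (number of rows, number of columns)
print(basic_salary.shape)    # (5, 5)

# Each column keeps its own dtype: 'ba' is numeric, the rest hold text
print(basic_salary.dtypes)

# columns= can also pick a subset of keys and set their order
subset = pd.DataFrame(data, columns=['ba', 'First_Name'])
print(list(subset.columns))  # ['ba', 'First_Name']
```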
Data Management with DataFrames
Now let’s look at some data management tasks with DataFrames.
Indexing works much as it does for a Series. To change the default numeric index, pass the index= argument to pd.DataFrame(). In the example here, we have changed the index labels to letters.
basic_salary = pd.DataFrame(data, index=['A','B','C','D','E'],
columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])
basic_salary
   First_Name  Last_Name  Grade  Location     ba
A        Alan      Brown    GR1     DELHI  17990
B      Agatha   Williams    GR2    MUMBAI  12390
C      Rajesh      Kolte    GR1    MUMBAI  19250
D       Ameet     Mishra    GR2     DELHI  14780
E        Neha        Rao    GR1    MUMBAI  19235
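Supplying labels through index= is one option. Another common approach, sketched here as an alternative rather than the tutorial’s own method, is DataFrame.set_index(), which promotes an existing column to the index (by_name is an illustrative name):

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
        'Last_Name': ['Brown', 'Williams', 'Kolte', 'Mishra', 'Rao'],
        'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
        'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
        'ba': [17990, 12390, 19250, 14780, 19235]}

# set_index() moves the chosen column into the index (and drops it as a column)
by_name = pd.DataFrame(data).set_index('First_Name')

# Rows can now be looked up by first name
print(by_name.loc['Rajesh', 'Location'])  # MUMBAI
print(list(by_name.columns))              # ['Last_Name', 'Grade', 'Location', 'ba']
```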
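What if we want to access the columns of a pandas DataFrame? There are two ways of doing it: use square brackets with the column name in quotation marks, or put a dot and the column name after the object name. Either way, the specified column is returned as a Series. Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.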
# Accessing columns using dictionary 'key' notation
basic_salary['First_Name']
# OR using attribute (dot) notation
basic_salary.First_Name
A      Alan
B    Agatha
C    Rajesh
D     Ameet
E      Neha
Name: First_Name, dtype: object
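Square brackets also accept a list of column names, which returns a DataFrame rather than a Series. A short sketch using the lettered basic_salary from above (col and two_cols are illustrative names):

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
        'Last_Name': ['Brown', 'Williams', 'Kolte', 'Mishra', 'Rao'],
        'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
        'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
        'ba': [17990, 12390, 19250, 14780, 19235]}
basic_salary = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'],
                            columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])

# A single column name returns a Series
col = basic_salary['First_Name']
print(type(col).__name__)  # Series

# A list of column names returns a DataFrame with just those columns
two_cols = basic_salary[['First_Name', 'ba']]
print(two_cols)

# Bracket and dot notation return the same data
print(basic_salary['ba'].equals(basic_salary.ba))  # True
```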
Similarly, it’s also possible to access rows.
For label-based subsetting, we can use the .loc indexer. Here, we want to see all columns of the row with index label B. In basic_salary.loc['B', :], the row label 'B' comes before the comma, and the colon after it selects all columns from that row.
You can also select rows by integer position instead of by label using the .iloc indexer.
# Accessing rows using row index label
basic_salary.loc[ 'B' , : ]
First_Name      Agatha
Last_Name     Williams
Grade              GR2
Location        MUMBAI
ba               12390
Name: B, dtype: object
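Since row B sits at integer position 1, .loc and .iloc can select the same row. This sketch also passes a column label (or column position) to pick out a single value; by_label and by_position are illustrative names:

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
        'Last_Name': ['Brown', 'Williams', 'Kolte', 'Mishra', 'Rao'],
        'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
        'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
        'ba': [17990, 12390, 19250, 14780, 19235]}
basic_salary = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'],
                            columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])

# Row 'B' is at integer position 1, so both calls select the same row
by_label = basic_salary.loc['B', :]
by_position = basic_salary.iloc[1, :]
print(by_label.equals(by_position))  # True

# Adding a column label (or position) returns a single value
print(basic_salary.loc['B', 'ba'])   # 12390
print(basic_salary.iloc[1, 4])       # 12390 ('ba' is the fifth column)
```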
Now let’s see some condition-based data slicing.
Suppose you want to see the records of employees based in MUMBAI. The expression basic_salary.Location == 'MUMBAI' accesses the Location column with dot notation and compares each value with 'MUMBAI', giving True or False for every row. Placing this condition inside square brackets after basic_salary returns only the rows where it is True, that is, the employees from Mumbai.
Next, we want to see rows from index B to E.
Here, we use the .loc indexer once again and give the range of row labels we want to see, separated by a colon. Label-based slices with .loc include both endpoints, so the output contains rows B through E.
basic_salary[basic_salary.Location=='MUMBAI']
   First_Name  Last_Name  Grade  Location     ba
B      Agatha   Williams    GR2    MUMBAI  12390
C      Rajesh      Kolte    GR1    MUMBAI  19250
E        Neha        Rao    GR1    MUMBAI  19235
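Conditions can be combined with & (and) and | (or). Each comparison needs its own parentheses, because & and | bind more tightly than == in Python. A sketch with two illustrative filters (mumbai_gr1 and high_or_delhi are hypothetical names):

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
        'Last_Name': ['Brown', 'Williams', 'Kolte', 'Mishra', 'Rao'],
        'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
        'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
        'ba': [17990, 12390, 19250, 14780, 19235]}
basic_salary = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'],
                            columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])

# Both conditions must hold: employees in MUMBAI with grade GR1
mumbai_gr1 = basic_salary[(basic_salary.Location == 'MUMBAI') & (basic_salary.Grade == 'GR1')]
print(mumbai_gr1)  # rows C and E

# Either condition may hold: ba above 19000 or located in DELHI
high_or_delhi = basic_salary[(basic_salary.ba > 19000) | (basic_salary.Location == 'DELHI')]
print(high_or_delhi)  # rows A, C, D and E
```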
# Slice along row indices
basic_salary.loc['B':'E']
   First_Name  Last_Name  Grade  Location     ba
B      Agatha   Williams    GR2    MUMBAI  12390
C      Rajesh      Kolte    GR1    MUMBAI  19250
D       Ameet     Mishra    GR2     DELHI  14780
E        Neha        Rao    GR1    MUMBAI  19235
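Positional slicing with .iloc behaves differently: like Python list slicing, it excludes the stop position. A sketch comparing the two, plus a slice along both rows and columns (block is an illustrative name):

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
        'Last_Name': ['Brown', 'Williams', 'Kolte', 'Mishra', 'Rao'],
        'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
        'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
        'ba': [17990, 12390, 19250, 14780, 19235]}
basic_salary = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'],
                            columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])

# Label slice: the end label 'E' is included
print(list(basic_salary.loc['B':'E'].index))  # ['B', 'C', 'D', 'E']

# Positional slice: position 4 (row 'E') is excluded
print(list(basic_salary.iloc[1:4].index))     # ['B', 'C', 'D']

# Rows B to D and columns Grade to ba, both ends inclusive
block = basic_salary.loc['B':'D', 'Grade':'ba']
print(block)
```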
What if you want to add a new column named ‘ms’?
Using square brackets, a new column can easily be added to a pandas DataFrame. In the example here, assigning to basic_salary['ms'] creates a new column named ms. The values for the new column are provided as a list in square brackets, with one value per row.
# Add a new column ‘ms’
basic_salary['ms']=[16070,6630,14960,9300,15200]
basic_salary
   First_Name  Last_Name  Grade  Location     ba     ms
A        Alan      Brown    GR1     DELHI  17990  16070
B      Agatha   Williams    GR2    MUMBAI  12390   6630
C      Rajesh      Kolte    GR1    MUMBAI  19250  14960
D       Ameet     Mishra    GR2     DELHI  14780   9300
E        Neha        Rao    GR1    MUMBAI  19235  15200
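New columns can also be computed from existing ones, and pandas rejects a list whose length does not match the number of rows. In this sketch the column name total and the flag length_error_raised are hypothetical, chosen only for illustration:

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
        'Last_Name': ['Brown', 'Williams', 'Kolte', 'Mishra', 'Rao'],
        'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
        'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
        'ba': [17990, 12390, 19250, 14780, 19235]}
basic_salary = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'],
                            columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])
basic_salary['ms'] = [16070, 6630, 14960, 9300, 15200]

# A derived column: element-wise sum of two existing columns
basic_salary['total'] = basic_salary['ba'] + basic_salary['ms']
print(basic_salary[['ba', 'ms', 'total']])

# A list with the wrong number of values raises ValueError
length_error_raised = False
try:
    basic_salary['bad'] = [1, 2, 3]
except ValueError as err:
    length_error_raised = True
    print('ValueError:', err)
```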
Here is a quick recap of what we covered in this tutorial. We learned about the pandas library and its data structures. Specifically, we learned how to create a pandas Series and a pandas DataFrame, how to access their elements, and how to add new columns.

This tutorial lesson is taken from the Digita Schools Postgraduate Diploma in Data Science.