Pandas Series and Pandas DataFrame – A Quick Tutorial

In this tutorial we’re going to look at what pandas data structures are and how to use them. We’ll start with an introduction to the pandas package and why it is important. We will then look at the data structures it provides, particularly the pandas Series and the pandas DataFrame, and illustrate how to perform basic tasks on them.

The Pandas Package

Pandas is a package that provides high-performance, easy-to-use data structures and data analysis tools. It forms the building blocks of real-world analytics in Python and offers powerful and flexible data manipulation features. With just a few lines of code, pandas makes data wrangling, preparation and analysis much easier.

Some of the most useful features of the pandas package are its DataFrame structure, data loading, crosstabs, aggregation, merging, sorting and basic plotting.

Pandas has two key data structures, Series and DataFrame.

Let us start by importing pandas under its conventional alias, pd.

 # import pandas in python 
 import pandas as pd 

Pandas Series

A Series is a one-dimensional labelled array object similar to a list or a column in a table. It can hold any data type.

Let’s see how to create a pandas Series.

In this example, s is the name of the Series object being created. The pd.Series() function builds a Series from the list of values passed to it in square brackets.

String values are written within quotation marks.

 s=pd.Series([8, 'data', 5.36, -23455788675342648, 'structures'])
 s 
 0                     8
 1                  data
 2                  5.36
 3    -23455788675342648
 4            structures
 dtype: object 
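The dtype: object line appears because the values above are of mixed types. As a small sketch (the variable names are illustrative), pandas infers a more specific dtype when all values share a type:

```python
import pandas as pd

# Mixed values (numbers and strings) fall back to the generic object dtype
mixed = pd.Series([8, 'data', 5.36])
print(mixed.dtype)    # object

# All integers: pandas infers an integer dtype
ints = pd.Series([8, 5, -2])
print(ints.dtype)     # int64 on typical installations

# Integers and floats together are upcast to float
nums = pd.Series([8, 5.36, -2])
print(nums.dtype)     # float64
```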

By default, the index of a Series starts from zero.

We can specify the index explicitly with the index= argument, passing the desired labels as a list in square brackets. Here, instead of numbers, we have created an index of letters.

 s=pd.Series([8, 'data', 5.36, -23455788675342648, 'structures'], index=['A', 'B', 'C', 'D', 'E'])
 s 
 A                     8
 B                  data
 C                  5.36
 D    -23455788675342648
 E            structures
 dtype: object 
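With a letter index, an element can be reached either by its label or by its integer position. A minimal sketch using the Series s defined above:

```python
import pandas as pd

s = pd.Series([8, 'data', 5.36, -23455788675342648, 'structures'],
              index=['A', 'B', 'C', 'D', 'E'])

# Square brackets with a label look up the value by index label
print(s['C'])          # 5.36

# .iloc looks up by integer position, regardless of the labels
print(s.iloc[2])       # 5.36

# The labels themselves are available through the .index attribute
print(list(s.index))   # ['A', 'B', 'C', 'D', 'E']
```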

A dictionary can also be converted to a Series. If index values are not specified, the keys of the dictionary are used as the index. Let’s elaborate with an example.

d is a dictionary object with the strings '01', '02', '03' and '04' as keys and 'Jan', 'Feb', 'Mar' and 'Apr' as values. Passing d to pd.Series() converts it to a Series.

 d={'01' : 'Jan', '02':'Feb','03':'Mar','04':'Apr'}
 d 
 {'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr'} 
 months=pd.Series(d)
 months 
 01    Jan
 02    Feb
 03    Mar
 04    Apr
 dtype: object 
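When an index is passed together with a dictionary, pandas uses it to select and order the keys, and a label with no matching key gets a missing value (NaN). The sketch below reuses d; the label '05' is deliberately absent from it. The reverse conversion, from a Series back to a dictionary, is done with Series.to_dict().

```python
import pandas as pd

d = {'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr'}

# An explicit index picks keys from the dictionary in the given order;
# '05' has no matching key, so its value is NaN
subset = pd.Series(d, index=['03', '01', '05'])
print(subset)

# A Series can be turned back into a dictionary with to_dict()
months = pd.Series(d)
print(months.to_dict())   # {'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr'}
```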

Now let’s look at how to access elements from a Series.

One way of doing this is to pass an index label in square brackets. For instance, the label '04' in the Series object months corresponds to the value 'Apr'.

We can also select elements using conditions on their values. To pick out January from months, we place the condition months == 'Jan' inside square brackets after months. Only the elements for which the condition is True are returned, so the output shows the matching index and value along with the dtype.

 # Using Dictionary keys as its index
 months['04'] 
 'Apr' 
 # Using condition on value
 months[months=='Jan'] 
 01    Jan
 dtype: object 
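Conditions are not limited to a single equality test. As a short sketch on the same months Series, isin() matches any of several values, != excludes a value, and get() returns a default (here the illustrative string 'not found') instead of raising a KeyError for a missing label:

```python
import pandas as pd

months = pd.Series({'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr'})

# isin() builds a Boolean mask that matches any of several values
print(months[months.isin(['Jan', 'Mar'])])

# Comparison operators such as != also work element-wise
print(months[months != 'Jan'])

# get() returns a default instead of raising KeyError for a missing label
print(months.get('05', 'not found'))   # not found
```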

Pandas DataFrame

Let’s move on to the other data structure, the pandas DataFrame, which is very commonly used by data scientists. A DataFrame is a two-dimensional labeled data structure, similar to a spreadsheet, a table or a dictionary of Series objects. Each column can hold a different data type.

We’ll now see how to create and work with DataFrames. The object we will create, basic_salary, contains five columns: 'First_Name', 'Last_Name', 'Grade', 'Location' and 'ba'.

We’ll first create a dictionary of lists. The keys of the dictionary become the column names, and each value is a list, enclosed in square brackets, holding that column’s entries. This dictionary is stored as data.

This dictionary can then be passed to the pd.DataFrame() function to create a pandas DataFrame.

The first argument to the function is the dictionary object, followed by columns=, which lists the column names. The columns argument controls the order of the columns. Without it, recent versions of pandas (0.23 and later, on Python 3.6 and later) keep the order of the dictionary keys; older versions sorted the columns alphabetically.

Similar to Series, the index of a DataFrame begins with zero.

 data={'First_Name':['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
      'Last_Name': ['Brown','Williams', 'Kolte', 'Mishra', 'Rao'],
      'Grade': ['GR1', 'GR2', 'GR1','GR2', 'GR1'],
      'Location': ['DELHI','MUMBAI', 'MUMBAI','DELHI', 'MUMBAI'],
      'ba':[17990, 12390, 19250, 14780, 19235]}
 basic_salary = pd.DataFrame(data, columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])
 basic_salary 
   First_Name Last_Name Grade Location     ba
 0       Alan     Brown   GR1    DELHI  17990
 1     Agatha  Williams   GR2   MUMBAI  12390
 2     Rajesh     Kolte   GR1   MUMBAI  19250
 3      Ameet    Mishra   GR2    DELHI  14780
 4       Neha       Rao   GR1   MUMBAI  19235 
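To see the effect of the columns argument, here is a sketch built on a trimmed copy of data (the column lists are illustrative). Without columns=, recent pandas keeps the dictionary’s key order; with it, the listed order is used and any keys left out of the list are dropped.

```python
import pandas as pd

data = {'First_Name': ['Alan', 'Agatha'],
        'Last_Name': ['Brown', 'Williams'],
        'ba': [17990, 12390]}

# Without columns=, the columns follow the dictionary's key order
default_order = pd.DataFrame(data)
print(list(default_order.columns))   # ['First_Name', 'Last_Name', 'ba']

# columns= reorders the columns; keys left out of the list are dropped
custom = pd.DataFrame(data, columns=['ba', 'First_Name'])
print(list(custom.columns))          # ['ba', 'First_Name']

# Each column keeps its own dtype
print(custom.dtypes)
```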

Data Management with DataFrames

Now let’s look at some data management tasks with DataFrames.

Indexing works much as it does for a Series. To replace the default numeric index, pass the index= argument to pd.DataFrame(). In the example here, we have changed the index to letters.

 basic_salary = pd.DataFrame(data, index=['A','B','C','D','E'], 
 columns=['First_Name', 'Last_Name', 'Grade', 'Location', 'ba'])
 basic_salary 
   First_Name Last_Name Grade Location     ba
 A       Alan     Brown   GR1    DELHI  17990
 B     Agatha  Williams   GR2   MUMBAI  12390
 C     Rajesh     Kolte   GR1   MUMBAI  19250
 D      Ameet    Mishra   GR2    DELHI  14780
 E       Neha       Rao   GR1   MUMBAI  19235 

What if we want to access the columns of a pandas DataFrame? There are two ways of doing it. We can either put the column name, in quotation marks, inside square brackets, or write a dot followed by the column name after the object name. Both return the specified column as a Series.

 # Accessing columns using dictionary ‘key’ notation
 basic_salary['First_Name']
 # OR
 basic_salary.First_Name 
 A      Alan
 B    Agatha
 C    Rajesh
 D     Ameet
 E      Neha
 Name: First_Name, dtype: object 
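A single column name returns a Series; passing a list of names (note the double brackets) returns a DataFrame with just those columns. Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. A small sketch on a trimmed version of basic_salary:

```python
import pandas as pd

basic_salary = pd.DataFrame(
    {'First_Name': ['Alan', 'Agatha', 'Rajesh'],
     'Last_Name': ['Brown', 'Williams', 'Kolte'],
     'ba': [17990, 12390, 19250]},
    index=['A', 'B', 'C'])

# A single column name returns a Series
print(type(basic_salary['First_Name']).__name__)   # Series

# A list of column names (double brackets) returns a DataFrame
names = basic_salary[['First_Name', 'Last_Name']]
print(type(names).__name__)                        # DataFrame
print(names)
```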

Similarly, it’s also possible to access rows.

For label-based subsetting, we use the .loc indexer. Here, we want to see all columns of the row with index label B. In basic_salary.loc['B', :], the label before the comma selects the row, and the colon after the comma selects all columns.

To select by integer position instead of by label, use the .iloc indexer.

 # Accessing rows using row index label
 basic_salary.loc[ 'B' , : ] 
 First_Name      Agatha
 Last_Name     Williams
 Grade              GR2
 Location        MUMBAI
 ba               12390
 Name: B, dtype: object 
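The .iloc counterpart of the example above uses positions counted from zero, so row B is position 1. A minimal sketch on a trimmed version of basic_salary, also picking a single cell with each indexer:

```python
import pandas as pd

basic_salary = pd.DataFrame(
    {'First_Name': ['Alan', 'Agatha', 'Rajesh'],
     'Location': ['DELHI', 'MUMBAI', 'MUMBAI'],
     'ba': [17990, 12390, 19250]},
    index=['A', 'B', 'C'])

# Row 'B' by label and by integer position (the second row is position 1)
by_label = basic_salary.loc['B', :]
by_position = basic_salary.iloc[1, :]
print(by_label.equals(by_position))   # True

# Both indexers also accept a column selector after the comma
print(basic_salary.loc['B', 'ba'])    # 12390
print(basic_salary.iloc[1, 2])        # 12390
```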

Now let’s see some condition-based data slicing.

Suppose you want to see records of employees from the location MUMBAI. The first example accesses the Location column with dot notation (basic_salary.Location) and compares it with 'MUMBAI', which produces a True or False value for each row. Placing this condition inside square brackets after basic_salary returns only the rows where it is True, that is, the employees based in Mumbai.

Next, we want to see the rows with index labels B to E.

Here, we use .loc once again and give the range of row labels we want, separated by a colon. Unlike ordinary Python slicing, a label slice with .loc includes both endpoints, so the output contains rows B through E.

 basic_salary[basic_salary.Location=='MUMBAI'] 
   First_Name Last_Name Grade Location     ba
 B     Agatha  Williams   GR2   MUMBAI  12390
 C     Rajesh     Kolte   GR1   MUMBAI  19250
 E       Neha       Rao   GR1   MUMBAI  19235 
 # Slice along row indices
 basic_salary.loc['B':'E'] 
   First_Name Last_Name Grade Location     ba
 B     Agatha  Williams   GR2   MUMBAI  12390
 C     Rajesh     Kolte   GR1   MUMBAI  19250
 D      Ameet    Mishra   GR2    DELHI  14780
 E       Neha       Rao   GR1   MUMBAI  19235 
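Several conditions can be combined with & (and) or | (or), with each condition wrapped in parentheses. The sketch below, on a trimmed basic_salary, also contrasts the inclusive end of a .loc label slice with the exclusive end of an .iloc position slice:

```python
import pandas as pd

basic_salary = pd.DataFrame(
    {'First_Name': ['Alan', 'Agatha', 'Rajesh', 'Ameet', 'Neha'],
     'Grade': ['GR1', 'GR2', 'GR1', 'GR2', 'GR1'],
     'Location': ['DELHI', 'MUMBAI', 'MUMBAI', 'DELHI', 'MUMBAI'],
     'ba': [17990, 12390, 19250, 14780, 19235]},
    index=['A', 'B', 'C', 'D', 'E'])

# Employees in Mumbai AND in grade GR1; each condition needs its own parentheses
mumbai_gr1 = basic_salary[(basic_salary.Location == 'MUMBAI') &
                          (basic_salary.Grade == 'GR1')]
print(mumbai_gr1)                              # rows C and E

# .loc label slices include the end label; .iloc position slices exclude it
print(list(basic_salary.loc['B':'D'].index))   # ['B', 'C', 'D']
print(list(basic_salary.iloc[1:3].index))      # ['B', 'C']
```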

What if you want to add a new column named ‘ms’?

Using square brackets, a new column can easily be added to a pandas DataFrame. In the example here, assigning to basic_salary['ms'] creates a new column named ms. The values are supplied as a list in square brackets, with one value per row.

 # Add a new column ‘ms’ 
 basic_salary['ms']=[16070,6630,14960,9300,15200]
 basic_salary 
   First_Name Last_Name Grade Location     ba     ms
 A       Alan     Brown   GR1    DELHI  17990  16070
 B     Agatha  Williams   GR2   MUMBAI  12390   6630
 C     Rajesh     Kolte   GR1   MUMBAI  19250  14960
 D      Ameet    Mishra   GR2    DELHI  14780   9300
 E       Neha       Rao   GR1   MUMBAI  19235  15200 
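A new column can also be computed element-wise from existing columns. In this sketch, total is a hypothetical column name used only for illustration, and the frame is trimmed to three rows. A plain list, by contrast, must supply exactly one value per row; otherwise pandas raises a ValueError and the column is not added.

```python
import pandas as pd

basic_salary = pd.DataFrame(
    {'First_Name': ['Alan', 'Agatha', 'Rajesh'],
     'ba': [17990, 12390, 19250]},
    index=['A', 'B', 'C'])
basic_salary['ms'] = [16070, 6630, 14960]

# Hypothetical derived column: row-by-row sum of two existing columns
basic_salary['total'] = basic_salary['ba'] + basic_salary['ms']
print(basic_salary)

# A list with the wrong number of values raises ValueError
try:
    basic_salary['bad'] = [1, 2]
except ValueError as err:
    print('ValueError:', err)
```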

Here is a quick recap of what we covered in this tutorial. We learned about the pandas library and its data structures. Specifically, we learned how to create a pandas Series and a pandas DataFrame, how to access their elements, how to select rows and columns, and how to add new columns.


This tutorial lesson is taken from the Digita Schools Postgraduate Diploma in Data Science.