pandas – Python Data Analysis Library

pandas is an open-source library for data analysis and data manipulation in Python. pandas is built on top of NumPy, so NumPy must be installed to use pandas. pandas is known for its built-in functions that help you create, manipulate and wrangle data. It also provides elegant tools for time series data.
There are several reasons why data scientists use pandas:

  • pandas functions make it easy to detect and handle missing data.
  • pandas uses Series for one-dimensional data and DataFrame for two-dimensional (tabular) data.
  • It provides efficient ways to slice data.
  • It offers flexible ways to merge, concatenate or reshape data.
  • It adds strong time-series tools to work with.
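The first item in the list above mentions missing data. As a minimal sketch (the `scores` table, its column names and values are made up for illustration), `isna()`, `fillna()` and `dropna()` detect, replace and remove missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one student has a missing score
scores = pd.DataFrame({"student": ["a", "b", "c"], "score": [90.0, np.nan, 75.0]})

missing_count = scores["score"].isna().sum()  # number of missing scores
filled = scores.fillna({"score": 0.0})        # replace NaN in "score" with 0.0
dropped = scores.dropna()                     # drop every row that contains a NaN

print(missing_count)
print(filled)
print(dropped)
```

None of these calls modify `scores` itself; each returns a new object.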

pandas is very useful for data analysis and manipulation: its data structures are powerful and easy to use, and operations on them are fast.

Series in Pandas

A Series is a one-dimensional labeled array. It can hold any data type (integers, floats, strings or Python objects). A Series is helpful when you want to perform a calculation on, or return, one-dimensional data. By definition, a Series does not have multiple columns.
The data for a Series can be a list, a dictionary or a scalar value.

Example:

import pandas as pd

pd.Series([1., 2., 3.])

Output

0    1.0
1    2.0
2    3.0
dtype: float64
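Since a Series can be built from a list, a dictionary or a scalar, here is a short sketch of all three (the values and index labels are arbitrary examples):

```python
import pandas as pd

from_list = pd.Series([10, 20, 30])                # default index 0, 1, 2
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})    # dictionary keys become the index
from_scalar = pd.Series(5, index=["x", "y", "z"])  # the scalar is repeated for each label

print(from_list)
print(from_dict)
print(from_scalar)
```

A scalar needs an explicit `index`, because pandas has no other way to know how many elements to create.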

You can also create a Series with a missing value. pandas displays missing values as NaN ("Not a Number"), and NumPy provides one as np.nan:

import numpy as np
import pandas as pd

pd.Series([1,2,np.nan])

Output

0    1.0
1    2.0
2    NaN
dtype: float64
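The output above shows the integers printed as floats. A small sketch of how that Series behaves (the `fillna(0)` replacement value is just an example):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])

print(s.dtype)      # float64: the integers are upcast because NaN is a float
print(s.isna())     # True only for the missing entry
print(s.mean())     # 1.5: NaN is skipped by default (skipna=True)
print(s.fillna(0))  # NaN replaced with 0.0
```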

DataFrames in Pandas

A DataFrame is a two-dimensional data structure with labeled rows and columns, much like a table. It is a standard way to store tabular data.
DataFrames are familiar to statisticians and other data practitioners: each row holds one record, and each column holds one variable.

Example

import pandas as pd

Detail = {'Name': ['Denial', 'John', 'Smith', 'Julia'], 'Age': [30, 40, 30, 32]}
pd.DataFrame(data=Detail)

Output
     Name  Age
0  Denial   30
1    John   40
2   Smith   30
3   Julia   32
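To see the rows-and-columns structure directly, this sketch reuses the same data (renamed to lowercase `detail`) and inspects the resulting DataFrame:

```python
import pandas as pd

detail = {"Name": ["Denial", "John", "Smith", "Julia"], "Age": [30, 40, 30, 32]}
df = pd.DataFrame(data=detail)

print(df.shape)           # (4, 2): 4 rows and 2 columns
print(list(df.columns))   # column names, in dictionary order
print(df["Age"].mean())   # 33.0: a column is a Series, so Series methods work on it
print(df.loc[1])          # the row with index label 1, returned as a Series
```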

Date Range

Using pandas, we can create a range of dates with pd.date_range():
pd.date_range(start, end, periods, freq)

start is the first date. periods is the number of dates to generate; it is optional if end is specified.
freq is the frequency: day 'D', month-end 'M' and year-end 'Y' (written 'ME' and 'YE' in pandas 2.2 and later).

Example:

import pandas as pd

call_Date = pd.date_range('20200101', periods=12, freq='D')
print('Day:', call_Date)

Output
Day: DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12'],
              dtype='datetime64[ns]', freq='D')
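The explanation above says periods is optional when an end date is given. This sketch shows both forms. It uses the month-start alias 'MS', which is not mentioned above; it is chosen here because it is spelled the same in older and newer pandas versions:

```python
import pandas as pd

# start + end: periods is not needed, and the frequency defaults to daily ('D')
days = pd.date_range("2020-01-01", "2020-01-05")

# start + periods with month-start frequency 'MS'
months = pd.date_range("2020-01-01", periods=3, freq="MS")

print(days)    # five dates, 2020-01-01 through 2020-01-05
print(months)  # 2020-01-01, 2020-02-01, 2020-03-01
```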

Inspecting data

In pandas, head() and tail() display the first or last rows of a DataFrame (five rows by default; pass a number to change this). describe() summarizes each numeric column.

Example:

import pandas as pd
import numpy as np  # add the NumPy library as np

# create a 6x4 array of random numbers (the values change on every run)

random = np.random.randn(6, 4)

# build a DataFrame with the default integer index and columns A to D

df = pd.DataFrame(random, columns=list('ABCD'))
df.head(3)  # get the first 3 rows of the DataFrame

Output
          A         B         C         D
0  0.258832 -1.062415 -0.244753 -0.621026
1  0.178227  0.230056  0.941359  1.389721
2  0.288088  0.331451  1.054055  0.976430

df.tail(3)  # get the last 3 rows of the DataFrame

Output
          A         B         C         D
3 -0.001205 -0.807774 -0.287996  0.937336
4 -0.648099 -0.618422 -0.522724 -0.294018
5 -0.909344  0.701876 -0.865958  0.272972

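df.describe()  # summary statistics for each numeric column

Output (from a separate random run, so the numbers differ from those above)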
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.496383  0.176916  0.078861 -0.832908
std    1.054616  1.479604  0.778782  1.047180
min   -0.562205 -2.351401 -0.594309 -2.122065
25%   -0.308168  0.030517 -0.436737 -1.536623
50%    0.262184  0.185599 -0.130471 -0.938388
75%    1.102466  0.696866  0.233103 -0.160913
max    2.131860  2.195588  1.512948  0.639181
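Because the example above uses random numbers, its output changes on every run. This sketch swaps in deterministic data (0 to 23 arranged as 6 rows by 4 columns, an assumption made only so the results are predictable) to show what head(), tail() and describe() return:

```python
import numpy as np
import pandas as pd

# Deterministic data instead of random numbers
df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list("ABCD"))

print(df.head())         # first 5 rows (the default)
print(df.tail(2))        # last 2 rows: index labels 4 and 5
stats = df.describe()    # count, mean, std, min, quartiles and max per column
print(stats.loc["mean"]) # column means: A=10.0, B=11.0, C=12.0, D=13.0
```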

Slice data

In data analysis, it is important to select data easily. Using pandas, we can easily select a subset of data from a DataFrame.

Example:

df['A']  # select column A as a Series

Output (from a separate random run, so the numbers differ from those above)

0   -1.509128
1   -1.064392
2   -0.932734
3    0.442185
4    0.378036
5    0.624964
Name: A, dtype: float64
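Selecting one column is only one way to slice a DataFrame. This sketch (again using deterministic 6x4 data as an assumption) shows selecting several columns, slicing rows by position with `iloc`, slicing by label with `loc`, and filtering rows with a condition:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list("ABCD"))

two_cols = df[["A", "C"]]        # a list of columns returns a DataFrame
first_rows = df.iloc[0:2]        # rows by position: 0 and 1 (end excluded)
by_label = df.loc[2:3, "B":"D"]  # rows and columns by label (end included)
filtered = df[df["A"] > 10]      # rows where column A is greater than 10

print(two_cols)
print(first_rows)
print(by_label)
print(filtered)
```

Note the difference in slice ends: `iloc` excludes the end position like a Python list, while `loc` includes the end label.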

Drop a column

drop() removes rows or columns. Pass columns= to drop columns by name, or index= to drop rows by index label.

Example

df.drop(columns=['A', 'D'])  # returns a new DataFrame without columns A and D
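A minimal sketch of dropping both columns and rows, using deterministic 6x4 data (an assumption for predictable results). It also shows that drop() returns a new DataFrame and leaves the original unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list("ABCD"))

no_a_d = df.drop(columns=["A", "D"])  # drop columns by name
no_rows = df.drop(index=[0, 1])       # drop rows by index label

print(no_a_d.columns.tolist())  # ['B', 'C']
print(no_rows.index.tolist())   # [2, 3, 4, 5]
print(df.shape)                 # (6, 4): the original DataFrame is unchanged
```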

Concatenation

In data analysis, we often need to combine two or more DataFrames.
pd.concat() stacks DataFrames on top of each other (or side by side with axis=1).

Example

import pandas as pd

df1 = pd.DataFrame({'name': ['Denny', 'Smith', 'john'],
                    'Age': [20, 22, 45]},
                   index=[0, 1, 2])
df2 = pd.DataFrame({'name': ['Julia', 'Smith'],
                    'Age': [46, 41]},
                   index=[3, 4])
df_concat = pd.concat([df1, df2])  # stack df2 below df1
df_concat

Output
    name  Age
0  Denny   20
1  Smith   22
2   john   45
3  Julia   46
4  Smith   41
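In the example above, the two DataFrames were given non-overlapping index labels by hand. A sketch of two related options: `ignore_index=True` renumbers the result when both frames use the default index, and `pd.merge()` joins frames column-wise on a shared key. The `cities` table is hypothetical, added only to have something to merge:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Denny", "Smith", "john"], "Age": [20, 22, 45]})
df2 = pd.DataFrame({"name": ["Julia", "Smith"], "Age": [46, 41]})

# Both frames start at index 0; ignore_index=True renumbers the result 0..4
stacked = pd.concat([df1, df2], ignore_index=True)

# Hypothetical lookup table, joined on the shared "name" column
cities = pd.DataFrame({"name": ["Denny", "Julia"], "city": ["Paris", "Oslo"]})
merged = pd.merge(stacked, cities, on="name", how="inner")  # keep only matching names

print(stacked)
print(merged)
```

concat() stacks frames along an axis, while merge() matches rows by key values, much like a SQL join.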

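Drop duplicates

If a dataset contains duplicate data, drop_duplicates() makes it easy to remove duplicate rows. Passing a column name such as 'name' checks only that column for duplicates, and by default the first occurrence of each value is kept.

Example

df_concat.drop_duplicates('name')

Output
    name  Age
0  Denny   20
1  Smith   22
2   john   45
3  Julia   46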
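A short sketch of two variations on the example above: `keep="last"` keeps the last occurrence instead of the first, and calling drop_duplicates() with no column compares entire rows. The DataFrame is rebuilt here with the same values as `df_concat`, but with a default 0..4 index:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Denny", "Smith", "john", "Julia", "Smith"],
                   "Age": [20, 22, 45, 46, 41]})

first = df.drop_duplicates("name")              # keeps the first "Smith" (index 1)
last = df.drop_duplicates("name", keep="last")  # keeps the last "Smith" (index 4)
whole_rows = df.drop_duplicates()               # compares all columns; no row is repeated

print(first.index.tolist())  # [0, 1, 2, 3]
print(last.index.tolist())   # [0, 2, 3, 4]
print(len(whole_rows))       # 5
```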

Sort values

sort_values() sorts the rows by the values in a column, in ascending order by default.

Example

df_concat.sort_values('Age')

Output
    name  Age
0  Denny   20
1  Smith   22
4  Smith   41
2   john   45
3  Julia   46
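sort_values() can also sort in descending order or by several columns at once. A minimal sketch, rebuilding the concatenated data with a default index:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Denny", "Smith", "john", "Julia", "Smith"],
                   "Age": [20, 22, 45, 46, 41]})

oldest_first = df.sort_values("Age", ascending=False)  # largest Age first
by_name_then_age = df.sort_values(["name", "Age"])     # by name, then by Age for equal names

print(oldest_first["Age"].tolist())       # [46, 45, 41, 22, 20]
print(by_name_then_age["name"].tolist())  # ['Denny', 'Julia', 'Smith', 'Smith', 'john']
```

String sorting is case-sensitive, so uppercase names come before 'john'.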
