Data Analysis with Pandas

In this article, we apply data manipulation techniques to a CSV dataset using pandas, a popular Python package.

To learn more about pandas, you can refer to the cheatsheet.

Load some sample Data

import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/rasbt/python_reference/master/Data/some_soccer_data.csv')
#print the first 10 rows of the CSV
df.head(10)

Renaming Column names to lowercase

df.columns=[c.lower() for c in df.columns]
df.tail(3)

Renaming particular columns

df=df.rename(columns={'p':'points', 'gp':'games','sot':'shots on target','g':'goals','ppg':'points_per_game','a':'assists'})
df.tail(3)

Changing Values in columns

#Processing 'salary' column
df['salary']=df['salary'].apply(lambda x:x.strip('$m'))
df.tail()
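
After stripping the `$` and `m` characters, the column still holds strings rather than numbers. The following is a minimal sketch of one way to convert it, assuming a small made-up frame (`demo` and its values are hypothetical and not taken from the soccer dataset):

```python
import pandas as pd

# Hypothetical miniature frame; the values are made up for illustration
demo = pd.DataFrame({'salary': ['$12.3m', '$9.0m', '$5.5m']})

# strip('$m') removes any leading or trailing '$' and 'm' characters
demo['salary'] = demo['salary'].apply(lambda x: x.strip('$m'))

# the stripped values are still strings, so convert them explicitly
demo['salary'] = pd.to_numeric(demo['salary'])

print(demo['salary'].dtype)
print(demo['salary'].sum())
```

Once the column is numeric, operations such as sorting or summing behave as expected instead of working on text.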

Adding a new column

df['team']=pd.Series('',index=df.index)

#or
df.insert(loc=8,column='position',value='')
df.tail(3)

Applying Functions to Multiple Columns (Lowercasing Multiple Columns)

cols=['player', 'position', 'team']
#applymap calls the function on every cell; pandas 2.1+ deprecates it in favor of DataFrame.map
df[cols]=df[cols].applymap(lambda x:x.lower())
df.head()
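
As an alternative to a cell-by-cell function, the string columns can be lowercased one column at a time with the `.str` accessor. This is only a sketch on a hypothetical frame (`demo` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature frame; the values are made up for illustration
demo = pd.DataFrame({
    'player': ['Sergio Aguero', 'Eden Hazard'],
    'team': ['Manchester City', 'Chelsea'],
    'goals': [14, 10],
})

cols = ['player', 'team']

# apply passes each selected column (a Series) to the lambda,
# and .str.lower() lowercases every string in that column
demo[cols] = demo[cols].apply(lambda col: col.str.lower())

print(demo)
```

The numeric `goals` column is left untouched because it is not listed in `cols`.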

Missing values aka NaNs (Not a Number)

Counting Rows with NaNs

nans=df.shape[0]-df.dropna().shape[0]
print('%d rows have missing values' % nans)
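
The same count can also be obtained from a boolean mask, which additionally makes it easy to count NaNs per column. A minimal sketch, assuming a hypothetical frame (`demo` is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with a few missing values
demo = pd.DataFrame({
    'goals': [10, 7, np.nan],
    'assists': [np.nan, 3, np.nan],
})

# approach used above: compare the row count before and after dropna()
rows_by_dropna = demo.shape[0] - demo.dropna().shape[0]

# alternative: flag rows that contain at least one NaN, then count the flags
rows_by_mask = int(demo.isnull().any(axis=1).sum())

# number of NaNs in each column
nans_per_column = demo.isnull().sum()

print(rows_by_dropna, rows_by_mask)
print(nans_per_column)
```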

Selecting NaN Rows

#selecting all rows that have NaN's in the 'assists' column
df[df['assists'].isnull()]

Selecting non-NaN rows

df[df['assists'].notnull()]

Filling NaN Rows with value 0

df.fillna(value=0, inplace=True)
print(df)

Filling cells with data

df.loc[df.index[-1],'player']='new player'
df.loc[df.index[-1],'salary']=12.3
df.tail(3)

Sorting and Reindexing Data Frames

#sorting the DataFrame by a certain column (from highest to lowest)
df.sort_values('goals', ascending=False, inplace=True)
#reindexing so that the row labels follow the new order
df.reset_index(drop=True, inplace=True)
df.head()

Updating columns

df_2=df.copy()
#update the salary of the first two rows; df_2.index[:2] selects exactly two labels
df_2.loc[df_2.index[:2], 'salary']=[20.0,15.0]
df_2.head(3)
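
When assigning a list of values, the number of selected rows has to match the length of the list. Note that `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end. A small sketch on a hypothetical frame (`demo` is made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose labels deliberately do not start at 0
demo = pd.DataFrame({'salary': [1.0, 2.0, 3.0, 4.0]}, index=[10, 11, 12, 13])

# label-based slice: includes both end points (labels 10, 11 and 12)
by_label = demo.loc[10:12, 'salary']

# position-based slice: excludes the end point (positions 0 and 1)
by_position = demo.iloc[0:2]['salary']

print(len(by_label), len(by_position))
```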

Chaining Conditions Using Bitwise Operators

#listing players from arsenal and chelsea Teams
df[(df['team']=='arsenal')|(df['team']=='chelsea')]

# selecting forwards from arsenal only
df[(df['team']=='arsenal')&(df['position']=='forward')]
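
Each condition needs its own parentheses because `|` and `&` bind more tightly than `==`. For several alternatives on the same column, `isin` is a possible shorthand. A minimal sketch, assuming a hypothetical frame (`demo` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature frame; the values are made up for illustration
demo = pd.DataFrame({
    'player': ['a', 'b', 'c', 'd'],
    'team': ['arsenal', 'chelsea', 'arsenal', 'everton'],
    'position': ['forward', 'forward', 'midfield', 'forward'],
})

# OR: rows where the team is arsenal or chelsea
either = demo[(demo['team'] == 'arsenal') | (demo['team'] == 'chelsea')]

# the same selection written with isin
either_isin = demo[demo['team'].isin(['arsenal', 'chelsea'])]

# AND: rows where both conditions hold
arsenal_forwards = demo[(demo['team'] == 'arsenal') & (demo['position'] == 'forward')]

print(either)
print(arsenal_forwards)
```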

if-tests

An if-test in pandas creates an array of 1s and 0s depending on a condition, e.g., if a value is less than 0.5 it is set to 0, otherwise it is set to 1. This works because True and False are integers after all.

int(True)

import pandas as pd
a=[[2.,.3,4.,5.],[.8,.03,0.02,5.]]
df=pd.DataFrame(a)
print(df)

#values below 0.5 become 0, all other values become 1
df1=(df>=0.5).astype(int)
print(df1)
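
A comparison on a DataFrame yields booleans, and these can be turned into 1s and 0s either by casting the mask or with NumPy's `np.where`. This is only a sketch of the two options, reusing the same example values (the names `values`, `mask`, `as_int` and `via_where` are introduced here for illustration):

```python
import numpy as np
import pandas as pd

values = pd.DataFrame([[2., .3, 4., 5.], [.8, .03, 0.02, 5.]])

# boolean mask: True where the value is at least 0.5
mask = values >= 0.5

# casting works because True converts to 1 and False to 0
as_int = mask.astype(int)

# np.where spells out the if/else explicitly: 0 if below 0.5, else 1
via_where = pd.DataFrame(
    np.where(values < 0.5, 0, 1),
    index=values.index,
    columns=values.columns,
)

print(as_int)
print(via_where)
```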