ETL Project for beginners

ETL Project for beginners

Simple ETL Web scrapping project where we extract data from a json, Transform the json data and Load it in one csv file

  • Loading JSON file

  • Transform JSON file

  • Load json file into CSV

Working json bank_data.json details

Screen Shot 2022-10-29 at 12.06.54 PM.png

Imports

import glob
import pandas as pd

Extract

We write a JSON extract function extract_from_json to read data from JSON and store it in pandas dataframe

<script>alert(1)</script>

def extract_from_json(file_to_process):
    dataframe = pd.read_json(file_to_process)
    return dataframe

Extract Function

columns=['Name','Market Cap (US$ Billion)']

def extract():
    extracted_data = pd.DataFrame(columns=columns)
    for jsonfile in glob.glob("bank_data.json"):
        extracted_data = extracted_data.append(extract_from_json(jsonfile), ignore_index=True)
    return extracted_data

In the above code we have extracted data from json and adding columns for the pandas

Now we have a csv file with name exchange_rates.csv in that we have a country code and Rates with comma seperated

,Rates
AUD,1.2970883369
BGN,1.6086527389
BRL,5.4091955914
CAD,1.2714262214
CHF,0.8860832374000001
CNY,6.4831386741
CZK,21.5101167955
DKK,6.1197565389
EUR,0.8225037013000001
GBP,0.7323984208000001

We will load exchange_rates.csv as a dataframe and find the exchange rate for British pounds with the symbol GBP, store it in the variable exchange_rate.

df = pd.read_csv('exchange_rates.csv', index_col=[0])
for index,row in df.iterrows():
    if(index == 'GBP'):
        print('Rates', row.Rates)
        exchange_rate = row.Rates

Transform

Using exchange_rate and the exchange_rates.csv file we will find the exchange rate of USD to GBP and will write a transform function that

  1. Rounds the Market Cap (US$ Billion) column to 3 decimal places

  2. Rename Market Cap (US$ Billion) to Market Cap (GBP$ Billion)

def transform(data):
    data['Market Cap (GBP$ Billion)'] = round(data['Market Cap (GBP$ Billion)'] * exchange_rate, 3)
    return data

Load

In this section we will load the transformed data into a new csv file names transformed_data.csv

def load(targetfile,data_to_load):
    data_to_load.to_csv(targetfile)

Now the coding is done we will test is it working as expected or not

We will execute all the methods(extract, transform and load) to see the output

Extract code execution

extracted_data = extract()
extracted_data.head()

extract1.png

Transform code execution

extracted_data.rename(columns={'Market Cap (US$ Billion)': 'Market Cap (GBP$ Billion)'}, inplace = True)
transformed_data = transform(extracted_data)
transformed_data.head()

transform1.png

Load code execution

load('bank_market_cap_gbp.csv',transformed_data)

Thanks for reading, Have a happy learning :)