My Experiences with Databases

Oracle, MySQL, SQL Server, Python, Azure, AWS, Oracle Cloud, GCP, etc.

The experiences, test cases, views, and opinions expressed on this website are my own and do not reflect the views or opinions of my employer. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content. Product and company names mentioned on this website may be trademarks of their respective owners.

Archive for the ‘WebScraping’ Category

Web-Scraping 🐍 – Part 2 – Download scripts from code.activestate.com with Python – Pagination

Posted by Sriram Sanka on October 22, 2022


In my previous post, we saved blog entries as files in a directory using web scraping. Now let's read a web page's entries and save the links (and content 🙂) as files. You can extract content from a web page by reading and validating the tags you need. In this post we are going to observe the URL pattern for reading and downloading the files from code.activestate.com.

code.activestate.com is one of the best sources to learn Python, with around 4K+ scripts available. Let's take a look at the source.

Let's invoke the URL https://code.activestate.com/recipes/langs/python/ in the browser and in Jupyter to get the source of the web page.

We have around 4,500+ scripts across 230 pages; when you navigate through the pages, you can see the page id appended to the URL as “/?page=1”.

import requests
from bs4 import BeautifulSoup
import string
url = 'https://code.activestate.com/recipes/langs/python/?page=1'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
print(soup)

If you are not sure how to generate sample Python code, try Postman, as shown below, to get a code snippet.
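For reference, Postman's code generator produces something close to the following for a simple GET request (a sketch; the exact snippet varies by Postman version):

import requests

url = "https://code.activestate.com/recipes/langs/python/?page=1"

response = requests.request("GET", url)
print(response.text)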

You can see the pattern in the output.

Take a look at the first link: it reads https://code.activestate.com/recipes/580811-uno-text-based/?in=lang-python, while the download link reads https://code.activestate.com/recipes/580811-uno-text-based/download/1/

To read the scripts from all the pages, we can pass the page number at the end using a simple for loop. We also need to replace /?in=lang-python with /download/1/ in each URL and prepend https://code.activestate.com to the result. The loop below does this for every page.
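For a single link, the rewrite looks like this (using the example link above):

href = "/recipes/580811-uno-text-based/?in=lang-python"
download_url = "https://code.activestate.com" + href.replace("/?in=lang-python", "/download/1/")
print(download_url)  # https://code.activestate.com/recipes/580811-uno-text-based/download/1/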

for x in range(1, 250):
    try:
        reqs = requests.get("https://code.activestate.com/recipes/langs/python/?page=" + str(x))
        soup = BeautifulSoup(reqs.text, 'html.parser')
        for link2 in soup.select("a[href]"):
            if "lang-python" in link2["href"]:
                # Turn the recipe link into its direct-download URL.
                src = link2["href"].replace("/recipes", "https://code.activestate.com/recipes").replace("/?in=lang-python", "/download/1/")
                # Strip punctuation from the title so it is safe as a file name.
                tit = link2.get_text().translate(str.maketrans('', '', string.punctuation))
                print(tit.replace(" ", "_"), src)
                Download_active_state_files("c:\\Users\\Dell\\Downloads\\blogs\\", src, 'UTF-8', tit.replace(" ", "_"))
    except Exception:
        pass  # skip pages/links that fail and continue

Here is the complete code to download all the scripts as .py files into the given directory.

import requests
from bs4 import BeautifulSoup
import string
import os
import urllib.request


def Download_active_state_files(path, url, enc, title):
    try:
        response = urllib.request.urlopen(url)
        webContent = response.read().decode(enc)
        target_dir = os.path.join(path, 'Code_Active_state')
        os.makedirs(target_dir, exist_ok=True)
        n = os.path.join(target_dir, title + '.py')
        f = open(n, 'w', encoding=enc)
        f.write(webContent)
        f.close()
    except Exception:
        # Append failed URLs to an error log instead of overwriting it each time.
        n1 = os.path.join(path, 'Code_Active_state_' + 'Download_Error.log')
        f1 = open(n1, 'a', encoding=enc)
        f1.write(url + '\n')
        f1.close()


for x in range(1, 250):
    try:
        reqs = requests.get("https://code.activestate.com/recipes/langs/python/?page=" + str(x))
        soup = BeautifulSoup(reqs.text, 'html.parser')
        for link2 in soup.select("a[href]"):
            if "lang-python" in link2["href"]:
                src = link2["href"].replace("/recipes", "https://code.activestate.com/recipes").replace("/?in=lang-python", "/download/1/")
                tit = link2.get_text().translate(str.maketrans('', '', string.punctuation))
                print(tit.replace(" ", "_"), src)
                Download_active_state_files("c:\\Users\\Dell\\Downloads\\blogs\\", src, 'UTF-8', tit.replace(" ", "_"))
    except Exception:
        pass

You can compare the downloaded files with the web page versions.
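As a quick sanity check, you can count what landed on disk (a sketch, assuming the same download path as above):

import os

target = "c:\\Users\\Dell\\Downloads\\blogs\\Code_Active_state"
print(len(os.listdir(target)), "files downloaded")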

Hope you like it. 🙂


Posted in POSTMAN, Python, WebScraping

Fun with Python – Create Web Traffic Using Selenium & Python

Posted by Sriram Sanka on October 4, 2022


In my previous post, I downloaded content from blogs and stored it in the file system. This increased my Google search engine stats and blog traffic as well.

As you can see, most views are from Canada and India. With this, I thought of writing a Python program to create traffic to my blog by reading my posts (so far) using Selenium and a secure VPN. As I am connected to a Canada VPN, you can see the views below, before and after.

In general, QA does the same thing for app test automation using Selenium WebDriver, JS, and so on; here I am using Python. Let's see the code.

import time
import random
import pandas as pd
import requests
from lxml import etree
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("start-maximized")
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--ignore-certificate-errors-spki-list')
options.add_argument('--ignore-certificate-errors')
options.add_argument("--incognito")

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

def create_traffic(url):
    # Open the post, wait for it to load, then scroll a little before leaving.
    driver.get(url)
    time.sleep(5)
    height = int(driver.execute_script("return document.documentElement.scrollHeight"))  # full page height, if you want to scroll further
    driver.execute_script('window.scrollBy(0,10)')
    time.sleep(10)


main_sitemap = 'https://ramoradba.com/sitemap.xml'
xmlDict = []
r = requests.get(main_sitemap)
root = etree.fromstring(r.content)
print("The number of sitemap tags are {0}".format(len(root)))
for sitemap in root:
    children = sitemap.getchildren()
    xmlDict.append({'url': children[0].text})
    with open('links23.txt', 'a') as f:
        f.write(f'\n{children[0].text}')

pd.DataFrame(xmlDict)  # optional: inspect the collected URLs as a DataFrame
col_name = ['url']
df_url = pd.read_csv("links23.txt", names=col_name)
for row in df_url.url:
    print(row)
    create_traffic(row)

The function below is the core piece; it is invoked for each blog post URL read from the links file downloaded from the sitemap.

def create_traffic(url):
    driver.get(url)
    time.sleep(5)
    height = int(driver.execute_script("return document.documentElement.scrollHeight"))
    driver.execute_script('window.scrollBy(0,10)')
    time.sleep(10)

This code opens the URL in the browser and scrolls. The same can be made to run forever in a while loop by reading random posts from the blog instead of all the posts, as sketched below.
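A minimal sketch of that variant, assuming df_url and create_traffic from above:

import random

while True:
    # Visit a random post each iteration instead of walking the whole list.
    create_traffic(random.choice(df_url.url.tolist()))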

The more you execute the program, the more traffic you will get. Hope you like it.

Let's connect to Kyiv, Ukraine and get the views from there.

Let's execute and see the progress…

After execution, views from Ukraine have increased.

Follow me for more interesting posts in the future: Twitter – TheRamOraDBA, LinkedIn – ramoradba

Posted in Python, Selenium, Web-Traffic, WebScraping

Hacking – FATDBA.COM ¯\_(ツ)_/¯ 

Posted by Sriram Sanka on October 4, 2022


#Python #WebScraping

Just kidding!!! It's not hacking; this is known as WEB-SCRAPING, using the powerful Python.

What is web scraping?

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database.

You just need a browser and a small, simple Python script to get content from the web. First, let's see the parts of the code and verify each one.

Step 1: Install & Load the Python Modules

import os
import string
import random
import pandas as pd
import requests
from lxml import etree
import urllib.request, urllib.error, urllib.parse

Step 2: Define a Function to Get the Site/Blog Name to Use as a Folder Name

def get_host(url, delim):
    # Extract the host name and replace the delimiter (e.g. ".") with "_".
    parsed_url = urllib.parse.urlparse(url)
    return parsed_url.netloc.replace(delim, "_")
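For example, with a hypothetical post URL from the blog scraped below:

print(get_host("https://fatdba.com/2022/10/04/some-post/", "."))  # fatdba_com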

Step 3: Define a Function to Get the Blog/Page Title

def findTitle(url, delim):
    # Read the page and pull the text between <title> and </title>.
    webpage = urllib.request.urlopen(url).read()
    title = str(webpage).split('<title>')[1].split('</title>')[0]
    # Replace the delimiter and strip punctuation so the title is file-name safe.
    return title.replace(delim, "_").translate(str.maketrans('', '', string.punctuation))

Step 4: Define a Function to Generate a Unique string of a given length

def unq_str(length):
    # Random mix of uppercase letters and digits; the parameter is named
    # "length" to avoid shadowing the built-in len().
    res = ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))
    return str(res)
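For example:

print(unq_str(8))  # e.g. 'K3QX97AB' (a different random string each call)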

Step 5: Write the Main Block to Download the Content from the Site/Blog

def Download_blog(path, url, enc):
    try:
        response = urllib.request.urlopen(url)
        webContent = response.read().decode(enc)
        blog_dir = os.path.join(path, str(get_host(url, ".")))
        os.makedirs(blog_dir, exist_ok=True)
        n = os.path.join(blog_dir, findTitle(url, " ") + '.html')
        f = open(n, 'w', encoding=enc)
        f.write(webContent)
        f.close()
    except Exception:
        # Append failed URLs to the error log instead of overwriting it each time.
        n1 = os.path.join(path, str(get_host(url, ".")) + 'Download_Error.log')
        f1 = open(n1, 'a', encoding=enc)
        f1.write(url + '\n')
        f1.close()

Step 6: Define Another Function to Save the Blog Post Links into a File & Invoke the Main Block to Get the Blog Content

def write_post_url_to_file(blog, path):
    # Read the sitemap and append each post URL to a links file.
    main_sitemap = blog + '/sitemap.xml'
    r = requests.get(main_sitemap)
    root = etree.fromstring(r.content)
    links_file = str(path + '\\' + get_host(blog, ".")) + '_blog_links.txt'
    for sitemap in root:
        children = sitemap.getchildren()
        with open(links_file, 'a') as f:
            f.write(f'\n{children[0].text}')
    # Read the links back and download every post.
    col_name = ['url']
    df_url = pd.read_csv(links_file, names=col_name)
    for row in df_url.url:
        print(row)
        Download_blog(path, row, 'UTF-8')


write_post_url_to_file("https://fatdba.com", "c:\\Users\\Dell\\Downloads\\blogs\\")


This will create a file with the links and a folder named after the blog to store all the content/post data.

Sample Output as follows

BOOM !!!

For more interesting posts you can follow me on Twitter – TheRamOraDBA & LinkedIn – ramoradba

Posted in download_blogs, Linux, Python, WebScraping, Windows