My Experiences with Databases

Oracle, MySQL, SQL Server, Python, Azure, AWS, Oracle Cloud, GCP, etc.

  • $riram $anka


    The experiences, test cases, views, and opinions expressed on this website are my own and do not reflect the views or opinions of my employer. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content. Product and company names mentioned on this website may be the trademarks of their respective owners.

A Python Way to Download All the ASKTOM Posts and Oracle Magazines Shared by Connor McDonald in a LinkedIn Group

Posted by Sriram Sanka on November 8, 2022


Connor McDonald shared a post in the Oracle Senior DBA group on LinkedIn with links to the best ASKTOM posts and the Oracle Magazine archive, both reachable from https://asktom.oracle.com/pls/apex/f?p=100:9

Here is a code snippet that downloads all of the posts and magazines as HTML files to a destination of your choice on your local file system.

Snippet To Download TOM KYTE Posts

import os
import string
import urllib.request
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://asktom.oracle.com/"

def Download_ASKTOM_files(path, url, enc, title):
    """Save the page at url as <path>/ASKTOM/<title>.html and log any URL that fails."""
    target_dir = os.path.join(path, 'ASKTOM')
    os.makedirs(target_dir, exist_ok=True)
    try:
        with urllib.request.urlopen(url) as response:
            webContent = response.read().decode(enc)
        n = os.path.join(target_dir, title + '.html')
        # The with-block closes the file; the earlier "f.close" without () never did.
        with open(n, 'w', encoding=enc) as f:
            f.write(webContent)
    except Exception:
        # Append so the log keeps every failed URL, not only the last one.
        n1 = os.path.join(path, 'ASKTOM_Download_Error.log')
        with open(n1, 'a', encoding=enc) as f1:
            f1.write(url + '\n')

reqs = requests.get(urljoin(BASE_URL, "tomkyte-blog.htm"))
soup = BeautifulSoup(reqs.text, 'html.parser')
for link2 in soup.select("a[href]"):
    # urljoin copes with both relative and absolute href values.
    durl = urljoin(BASE_URL, link2["href"])
    # Drop punctuation and join the words with "_" to get a usable file name.
    tit = "_".join(link2.get_text().translate(str.maketrans('', '', string.punctuation)).split())
    print(tit, durl)
    Download_ASKTOM_files("c:\\Users\\cloudio\\Downloads\\blogs\\", durl, 'UTF-8', tit)
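
The loop above does two small things for every link: it turns the link text into a file name, and it builds a full URL from the href. Here is a minimal sketch of just those two steps using only the standard library, so they can be tried without touching the network. The helper names safe_title and resolve, the sample title, and the example.com address are my own placeholders and not part of the original snippet.

```python
import string
from urllib.parse import urljoin

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def safe_title(text):
    """Turn link text into a file-name stem: no punctuation, '_' between words."""
    return "_".join(text.translate(_STRIP_PUNCT).split())


def resolve(base, href):
    """Resolve an href against the site root; absolute hrefs pass through as-is."""
    return urljoin(base, href)


# Hypothetical link text and hrefs, for illustration only.
print(safe_title("On Joins, Nulls & Dates!"))
print(resolve("https://asktom.oracle.com/", "tomkyte-blog.htm"))
print(resolve("https://asktom.oracle.com/", "https://example.com/page.htm"))
```

Plain string concatenation of the site root and the href works only while every href is relative; urljoin also handles the absolute ones.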

Snippet to Download Magazines

import os
import string
import urllib.request
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://asktom.oracle.com/"

def Download_ASKTOM_files(path, url, enc, title):
    """Save the page at url as <path>/ASKTOM_MAG/<title>.html and log any URL that fails."""
    target_dir = os.path.join(path, 'ASKTOM_MAG')
    os.makedirs(target_dir, exist_ok=True)
    try:
        with urllib.request.urlopen(url) as response:
            webContent = response.read().decode(enc)
        n = os.path.join(target_dir, title + '.html')
        # The with-block closes the file; the earlier "f.close" without () never did.
        with open(n, 'w', encoding=enc) as f:
            f.write(webContent)
    except Exception:
        # Append so the log keeps every failed URL, not only the last one.
        n1 = os.path.join(path, 'ASKTOM_MAG_Download_Error.log')
        with open(n1, 'a', encoding=enc) as f1:
            f1.write(url + '\n')

reqs = requests.get(urljoin(BASE_URL, "magazine-archive.htm"))
soup = BeautifulSoup(reqs.text, 'html.parser')
for link2 in soup.select("a[href]"):
    # urljoin copes with both relative and absolute href values.
    durl = urljoin(BASE_URL, link2["href"])
    # Drop punctuation and join the words with "_" to get a usable file name.
    tit = "_".join(link2.get_text().translate(str.maketrans('', '', string.punctuation)).split())
    print(tit, durl)
    Download_ASKTOM_files("c:\\Users\\cloudio\\Downloads\\blogs\\", durl, 'UTF-8', tit)
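
The two snippets differ only in the index page they read and the folder they write to, so they could share one function. Below is a rough sketch of that idea, not a drop-in replacement. To keep it free of third-party packages it swaps BeautifulSoup for the standard library's html.parser, which is my substitution and not what the snippets above use. The names LinkCollector, fetch_text and download_archive, and the fetch parameter, are also my own assumptions; fetch exists only so the function can be exercised with canned pages instead of the live site.

```python
import os
import string
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def fetch_text(url, enc="UTF-8"):
    """Default fetcher: download a URL and decode the body."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(enc)


def download_archive(index_url, folder, dest, fetch=fetch_text, enc="UTF-8"):
    """Save every page linked from index_url under dest/folder.

    Returns the list of URLs that could not be saved.
    """
    target_dir = os.path.join(dest, folder)
    os.makedirs(target_dir, exist_ok=True)
    collector = LinkCollector()
    collector.feed(fetch(index_url))
    strip_punct = str.maketrans("", "", string.punctuation)
    failed = []
    for href, text in collector.links:
        url = urljoin(index_url, href)
        # Fall back to a fixed stem when the link has no usable text.
        name = "_".join(text.translate(strip_punct).split()) or "untitled"
        try:
            content = fetch(url)
            with open(os.path.join(target_dir, name + ".html"), "w", encoding=enc) as f:
                f.write(content)
        except Exception:
            failed.append(url)
    return failed


# Possible usage with the two index pages and folder names from this post:
# dest = "c:\\Users\\cloudio\\Downloads\\blogs"
# download_archive("https://asktom.oracle.com/tomkyte-blog.htm", "ASKTOM", dest)
# download_archive("https://asktom.oracle.com/magazine-archive.htm", "ASKTOM_MAG", dest)
```

Returning the failed URLs instead of writing a log file leaves it to the caller to decide whether to retry them or write them out.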

Hope you liked it 🙂
