Archive for the ‘Python’ Category

Download Youtube ( Channel ) videos using Python Module – Pytube ,scrapetube.

Posted by Sriram Sanka on April 25, 2023

In this post, I present the code to download individual YouTube videos, or every video in a channel, step by step.

You can download a video with the pytube module by passing the video URL as an argument. Install pytube first:

python -m pip install pytube

Passing a video URL to pytube downloads that video to your current directory. What if you want to keep a local copy of a whole channel for offline, educational use? The following code downloads all the videos from the given channel URL to a location of your choice on the PC.

from pytube import Channel, YouTube
from pytube.cli import on_progress

# Destination folder for the downloaded files.
DOWNLOAD_DIR = 'C:\\Users\\Dell\\Videos\\downs\\Sriram_Channel\\'


def get_youtube(url):
    """Download the highest-resolution progressive MP4 stream of one video."""
    try:
        yt = YouTube(url, on_progress_callback=on_progress)
        stream = (yt.streams
                  .filter(progressive=True, file_extension='mp4')
                  .order_by('resolution')
                  .desc()
                  .first())
        stream.download(DOWNLOAD_DIR)
    except Exception as exc:
        # yt may not exist if YouTube() itself failed, so report the URL only.
        print(f'\nError in downloading {url}: {exc}')


channel = Channel('https://www.youtube.com/channel/UCw43xCtkl26vGIGtzBoaWEw')

for video_url in channel.video_urls:
    get_youtube(video_url)

You may get a regular-expression error while fetching the video information, caused by changes to the channel URL format (URLs that use symbols) or by other restrictions that the pytube module does not support. In that case you have to either download and install the latest pytube release or identify and fix the problem on your own.

To avoid this, you can use another module called scrapetube:

pip install scrapetube

The following code returns the video ID of every video in the channel whose ID you pass in. You need to prepend 'https://www.youtube.com/watch?v=' to each ID to make it a complete URL.

import scrapetube

videos = scrapetube.get_channel("........t7XvGJ3AGpSon.......")

for video in videos:
    print('https://www.youtube.com/watch?v='+ video['videoId'])

As you can see, the channel ID is the key here. What if you do not know the channel ID? You can get it from the browser's "view page source", or you can use the requests and BeautifulSoup modules:

import requests
from bs4 import BeautifulSoup

url = input("Please Enter youtube channel URL ")
response = requests.get(url)
soup = BeautifulSoup(response.content,'html.parser')
print(soup.find("meta", itemprop="channelId")['content'])
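The lookup above fails with a TypeError when the page has no channelId meta tag. As a minimal sketch of the same extraction using only the standard library, and assuming the page still exposes the ID in a meta tag with itemprop="channelId", the helper below returns None instead of failing. The names ChannelIdParser and extract_channel_id and the sample HTML are illustrative, not part of any library.

```python
from html.parser import HTMLParser


class ChannelIdParser(HTMLParser):
    """Collects the content attribute of <meta itemprop="channelId">."""

    def __init__(self):
        super().__init__()
        self.channel_id = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("itemprop") == "channelId":
            # Keep the first match only.
            if self.channel_id is None:
                self.channel_id = attr_map.get("content")


def extract_channel_id(html):
    """Return the channel ID found in html, or None if the tag is absent."""
    parser = ChannelIdParser()
    parser.feed(html)
    return parser.channel_id


# Hypothetical page fragment for illustration; a real page is much larger.
sample = '<html><head><meta itemprop="channelId" content="UC_example_id"></head></html>'
print(extract_channel_id(sample))
print(extract_channel_id("<html><head></head></html>"))
```

In a real run you would pass response.text from the requests call to extract_channel_id and check the result for None before using it.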

Instead of using Channel from the pytube module, let's change the original snippet posted above to use scrapetube.get_channel:

import scrapetube
from pytube import YouTube
from pytube.cli import on_progress

# Destination folder for the downloaded files.
DOWNLOAD_DIR = 'C:\\Users\\Dell\\Videos\\downs\\Sriram_Channel\\'


def get_youtube(url):
    """Download the highest-resolution progressive MP4 stream of one video."""
    try:
        print('Downloading ' + url)
        yt = YouTube(url, on_progress_callback=on_progress)
        stream = (yt.streams
                  .filter(progressive=True, file_extension='mp4')
                  .order_by('resolution')
                  .desc()
                  .first())
        stream.download(DOWNLOAD_DIR)
    except Exception as exc:
        print(f'Error downloading {url}: {exc}')


# scrapetube yields one dict per video; 'videoId' holds the video ID.
videos = scrapetube.get_channel("UCw43xCtkl26vGIGtzBoaWEw")
for video in videos:
    get_youtube('https://www.youtube.com/watch?v=' + video['videoId'])

Hope you like it! Happy reading.

Posted in Python, pytube, scrapetube, WebScraping, youtube | Tagged: DOWNLOAD, Python, pytube, scrapetube, web-scraping, youtube, youtube channel

Reading Chrome Bookmarks using Python Module chrome_bookmarks

Posted by Sriram Sanka on April 24, 2023

Chrome stores its bookmarks in a JSON file, whereas the history file is an SQLite database. You can open the history file with DB Browser for SQLite to get the URL and download data.

In this post I use the chrome-bookmarks module to read the bookmarks and print each URL along with its name and folder information.

Install the chrome-bookmarks module using pip:

pip install chrome-bookmarks

Now you can read the URL from Python as below.

import chrome_bookmarks
for url in chrome_bookmarks.urls:
    print(url.url)

We can also print the description/name of each URL as below.

import chrome_bookmarks
for url in chrome_bookmarks.urls:
    print(url.url, url.name)

import chrome_bookmarks

for folder in chrome_bookmarks.folders:
    print(folder.name)
    print(folder.folders)

It is also possible to read the JSON without using the above module.

import json
file ="C:\\Users\\Dell\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Bookmarks"
with open(file, "r", encoding='utf-8') as bookmarks:
    bookmark_data = json.load(bookmarks)
    print(json.dumps(bookmark_data, indent=1))

{
 "checksum": "7b7d115080ddda0dcab6b428f64aa3a5",
 "roots": {
  "bookmark_bar": {
   "children": [
    {
     "date_added": "13326789370504643",
     "date_last_used": "0",
     "guid": "5c3e2329-1d0c-4a9d-b85e-426d6fd31301",
     "id": "62",
     "meta_info": {
      "power_bookmark_meta": ""
     },
     "name": "Oracle | Cloud Applications and Cloud Platform",
     "type": "url",
     "url": "https://www.oracle.com/"
    },
    {
     "date_added": "13326789428536219",
     "date_last_used": "0",
     "guid": "0a972887-7f34-41b3-bf59-1f2f2f521a0e",
     "id": "63",
     "meta_info": {
      "power_bookmark_meta": ""
     },
     "name": "Oracle Database Features",
     "type": "url",
     "url": "https://apex.oracle.com/database-features/"
    },
    {
     "date_added": "13326789439984046",
     "date_last_used": "0",
     "guid": "25e1d231-a825-46fd-a8a9-829bf07dc5bf",
     "id": "64",
     "meta_info": {
      "power_bookmark_meta": ""
     },
     "name": "Cloud Sign In",
     "type": "url",
     "url": "https://www.oracle.com/cloud/sign-in.html?redirect_uri=https%3A%2F%2Fcloud.oracle.com%2F"
    },
    {
     "date_added": "13326789456087992",
     "date_last_used": "0",
     "guid": "a0e39859-9596-47be-bc61-d6099f152fb1",
     "id": "65",
     "meta_info": {
      "power_bookmark_meta": ""
     },
     "name": "Oracle Blogs",
     "type": "url",
     "url": "https://blogs.oracle.com/"
    },
    {
     "children": [
      {
       "date_added": "13326789491584887",
       "date_last_used": "0",
       "guid": "b6f90dec-e303-4a2b-bb53-a7357c8d694f",
       "id": "67",
       "meta_info": {
        "power_bookmark_meta": ""
       },
       "name": "Oracle | Cloud Applications and Cloud Platform",
       "type": "url",
       "url": "https://www.oracle.com/"
      },
      {
       "date_added": "13326789491585797",
       "date_last_used": "0",
       "guid": "ffeea96a-87d7-4053-b674-7ac9cfb50a13",
       "id": "68",
       "meta_info": {
        "power_bookmark_meta": ""
       },
       "name": "Oracle Database Features",
       "type": "url",
       "url": "https://apex.oracle.com/database-features/"
      },
      {
       "children": [
        {
         "date_added": "13326789491586570",
         "date_last_used": "0",
         "guid": "851ce296-3918-47ab-981a-21fcab632edc",
         "id": "70",
         "meta_info": {
          "power_bookmark_meta": ""
         },
         "name": "Oracle Blogs",
         "type": "url",
         "url": "https://blogs.oracle.com/"
        },
        {
         "date_added": "13326789491586243",
         "date_last_used": "0",
         "guid": "5b21937d-66d9-4eea-9491-7d7c2e22152b",
         "id": "69",
         "meta_info": {
          "power_bookmark_meta": ""
         },
         "name": "Cloud Sign In",
         "type": "url",
         "url": "https://www.oracle.com/cloud/sign-in.html?redirect_uri=https%3A%2F%2Fcloud.oracle.com%2F"
        }
       ],
       "date_added": "13326789519632624",
       "date_last_used": "0",
       "date_modified": "13326789527363048",
       "guid": "9b0bbece-a9fd-4678-a846-896b7a2a54ec",
       "id": "71",
       "name": "Oracle_Child",
       "type": "folder"
      }
     ],
     "date_added": "13326789485096843",
     "date_last_used": "0",
     "date_modified": "13326789519632800",
     "guid": "ea6663fa-05df-4751-b25b-125561aed88d",
     "id": "66",
     "name": "Oracle",
     "type": "folder"
    }
   ],
   "date_added": "13326774450786953",
   "date_last_used": "0",
   "date_modified": "13326789491586570",
   "guid": "0bc5d13f-2cba-5d74-951f-3f233fe6c908",
   "id": "1",
   "name": "Bookmarks bar",
   "type": "folder"
  },
  "other": {
   "children": [],
   "date_added": "13326774450786955",
   "date_last_used": "0",
   "date_modified": "0",
   "guid": "82b081ec-3dd3-529c-8475-ab6c344590dd",
   "id": "2",
   "name": "Other bookmarks",
   "type": "folder"
  },
  "synced": {
   "children": [],
   "date_added": "13326774450786957",
   "date_last_used": "0",
   "date_modified": "0",
   "guid": "4cf2e351-0e85-532b-bb37-df045d8f8d0f",
   "id": "3",
   "name": "Mobile bookmarks",
   "type": "folder"
  }
 },
 "version": 1
}
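The output above is a tree: folders carry a "children" list and bookmarks carry a "url". As a rough sketch of how to flatten it, the recursive generator below yields each bookmark with its folder path. It also converts a date_added value, on the assumption that Chrome stores these as microseconds since 1601-01-01 UTC. The names walk and chrome_time and the trimmed sample dictionary are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Chrome timestamps count microseconds since 1601-01-01 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chrome_time(value):
    """Convert a Chrome timestamp string to a timezone-aware datetime."""
    return CHROME_EPOCH + timedelta(microseconds=int(value))


def walk(node, path=()):
    """Yield (folder_path, name, url) for every bookmark at or below node."""
    if node.get("type") == "url":
        yield "/".join(path), node["name"], node["url"]
    for child in node.get("children", []):
        yield from walk(child, path + (node["name"],))


# A trimmed-down stand-in for the structure printed above.
sample = {
    "roots": {
        "bookmark_bar": {
            "name": "Bookmarks bar", "type": "folder",
            "children": [
                {"name": "Oracle Blogs", "type": "url",
                 "url": "https://blogs.oracle.com/"},
                {"name": "Oracle", "type": "folder", "children": [
                    {"name": "Oracle Database Features", "type": "url",
                     "url": "https://apex.oracle.com/database-features/"},
                ]},
            ],
        }
    }
}

for root in sample["roots"].values():
    for folder, name, url in walk(root):
        print(folder, "|", name, "|", url)

print(chrome_time("13326789370504643"))
```

To run it against the real file, replace sample with the bookmark_data dictionary loaded by json.load earlier.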

Hope you like it! Happy learning.

Posted in Python | Tagged: Python

Sample Script to Publish a blog Post #Python

Posted by Sriram Sanka on March 29, 2023

Sample Script to Publish a blog Post Using Python

This post was auto-published using the Python script below. #Python

import getpass

from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost

password = getpass.getpass(prompt='Password: ', stream=None)  # blog login password


def auto_blog_post(blog_content, blog_excerpt, blog_status):
    username = 'sriramoracle'  # user name
    url = 'https://sriramoracle.wordpress.com/xmlrpc.php'
    wp = Client(url, username, password)
    post = WordPressPost()
    post.post_status = blog_status
    post.title = blog_excerpt
    post.content = blog_content
    post.excerpt = blog_excerpt
    post.terms_names = {
        "post_tag": ['Python'],
        "category": ['Python']
    }
    wp.call(NewPost(post))


# 'publish' publishes the post; draft is the default mode.
auto_blog_post('Sample Script to Publish a blog Post Using Python ', 'Sample Script to Publish a blog Post ', 'publish')

Posted in Python | Tagged: Python

Python way to Download all the ASKTOM and Oracle MAG Posted by Connor McDonald at Linked In Group

Posted by Sriram Sanka on November 8, 2022

Connor McDonald posted in the Oracle Senior DBA group on LinkedIn, sharing links to the best ASKTOM posts and the Oracle Magazine articles at https://asktom.oracle.com/pls/apex/f?p=100:9

Here are the code snippets that download all the posts and magazine articles as HTML files to a destination of your choice in your local file system.

Snippet To Download TOM KYTE Posts

import os
import string
import urllib.request

import requests
from bs4 import BeautifulSoup


def Download_ASKTOM_files(path, url, enc, title):
    try:
        response = urllib.request.urlopen(url)
        webContent = response.read().decode(enc)
        target_dir = os.path.join(path, 'ASKTOM')
        os.makedirs(target_dir, exist_ok=True)
        # The with block closes the file even if the write fails.
        with open(os.path.join(target_dir, title + '.html'), 'w', encoding=enc) as f:
            f.write(webContent)
    except Exception:
        # Append failed URLs so that earlier entries are kept.
        with open(os.path.join(path, 'ASKTOM_Download_Error.log'), 'a', encoding=enc) as f1:
            f1.write(url + '\n')


reqs = requests.get("https://asktom.oracle.com/tomkyte-blog.htm")
soup = BeautifulSoup(reqs.text, 'html.parser')
for link2 in soup.select("a[href]"):
    src = link2["href"]
    durl = 'https://asktom.oracle.com/' + src
    # Strip punctuation from the link text so it is safe to use as a file name.
    tit = link2.get_text().translate(str.maketrans('', '', string.punctuation))
    print(tit.replace(" ", "_"), durl)
    Download_ASKTOM_files("c:\\Users\\....\\Downloads\\blogs\\", durl, 'UTF-8', tit.replace(" ", "_"))

Snippet to Download Magazines

import os
import string
import urllib.request

import requests
from bs4 import BeautifulSoup


def Download_ASKTOM_files(path, url, enc, title):
    try:
        response = urllib.request.urlopen(url)
        webContent = response.read().decode(enc)
        target_dir = os.path.join(path, 'ASKTOM_MAG')
        os.makedirs(target_dir, exist_ok=True)
        # The with block closes the file even if the write fails.
        with open(os.path.join(target_dir, title + '.html'), 'w', encoding=enc) as f:
            f.write(webContent)
    except Exception:
        # Append failed URLs so that earlier entries are kept.
        with open(os.path.join(path, 'ASKTOM_MAG_Download_Error.log'), 'a', encoding=enc) as f1:
            f1.write(url + '\n')


reqs = requests.get("https://asktom.oracle.com/magazine-archive.htm")
soup = BeautifulSoup(reqs.text, 'html.parser')
for link2 in soup.select("a[href]"):
    src = link2["href"]
    durl = 'https://asktom.oracle.com/' + src
    # Strip punctuation from the link text so it is safe to use as a file name.
    tit = link2.get_text().translate(str.maketrans('', '', string.punctuation))
    print(tit.replace(" ", "_"), durl)
    Download_ASKTOM_files("c:\\Users\\......\\Downloads\\blogs\\", durl, 'UTF-8', tit.replace(" ", "_"))
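Both snippets build the download URL by gluing the site root onto each href, which only works when every href is a relative path without a leading slash. A possible alternative, sketched here with the standard library's urljoin, resolves relative, root-relative, and already absolute links alike. The helper name absolute_url and the sample href values are made up for illustration.

```python
from urllib.parse import urljoin

BASE = "https://asktom.oracle.com/"


def absolute_url(href, base=BASE):
    """Resolve an href taken from the page against the site root."""
    return urljoin(base, href)


# Hypothetical href values, for illustration only.
for href in ["some-post.htm", "/some-post.htm", "https://example.com/other.htm"]:
    print(absolute_url(href))
```

With this helper, the loop body would call absolute_url(src) instead of concatenating strings.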

Hope you liked it 🙂

Posted in ASKTOM, CONNOR, Python, TOMKYTE | Tagged: ASKTOM, CONNOR, DOWNLOAD, Python, TOMKYTE

Web-Scraping 🐍 – Part 2 – Download scripts from code.activestate.com with Python -Pagination

Posted by Sriram Sanka on October 22, 2022

In my previous post, we saved blog entries as files in a directory using web scraping. Now let's read the entries on a web page and save the links (and content 🙂) as files. You can extract the content from a web page by reading and validating the tags as needed. In this post we look at the URL pattern used for reading and downloading files from code.activestate.com.

code.activestate.com is one of the best sources for learning Python. It has more than 4,000 scripts available. Let's take a look at the source.

Let's open the URL https://code.activestate.com/recipes/langs/python/ in the browser and in Jupyter to get the source of the web page.

There are 4,500+ scripts across 230 pages. When you navigate through the pages, you can see that the page number is appended to the URL as "/?page=1".

import requests
from bs4 import BeautifulSoup
import string
url = 'https://code.activestate.com/recipes/langs/python/?page=1'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
print(soup)

If you are not sure how to write the Python sample code, try Postman, which can generate the code snippet for you.

You can see the pattern in the output.

Take a look at the first link. It reads https://code.activestate.com/recipes/580811-uno-text-based/?in=lang-python and the download link reads https://code.activestate.com/recipes/580811-uno-text-based/download/1/

To read all the scripts from all the pages, we can pass the page number at the end of the URL using a simple for loop. We also need to replace /?in=lang-python with /download/1/ in each link and prefix the result with https://code.activestate.com.

for x in range(1, 250, 1):
    try:
        reqs = requests.get("https://code.activestate.com/recipes/langs/python/?page="+str(x))
        soup = BeautifulSoup(reqs.text, 'html.parser')
        for link2 in soup.select(" a[href]"):
            if "lang-python" in link2["href"]:
                src=link2["href"].replace("/recipes","https://code.activestate.com/recipes").replace("/?in=lang-python","/download/1/")
                tit =link2.get_text().replace(string.punctuation, " ").translate(str.maketrans('', '', string.punctuation))
                print(tit.replace(" ","_"),src)
                Download_active_state_files("c:\\Users\\Dell\\Downloads\\blogs\\",src,'UTF-8',tit.replace(" ","_"))
    except:
        pass
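The URL rewriting in the loop can be checked on its own, without touching the network. The sketch below isolates the two string transformations, the page URL and the listing-link-to-download-link conversion, so they can be tried on the example link quoted earlier. The function names page_url and recipe_download_url are illustrative, and the /download/1/ pattern is taken from that example rather than from any documented API.

```python
BASE = "https://code.activestate.com"
LISTING_SUFFIX = "/?in=lang-python"


def page_url(page):
    """Listing URL for the given page number."""
    return f"{BASE}/recipes/langs/python/?page={page}"


def recipe_download_url(href):
    """Turn a listing link into the matching download link, or None."""
    if not href.startswith("/recipes/") or not href.endswith(LISTING_SUFFIX):
        return None
    return BASE + href[: -len(LISTING_SUFFIX)] + "/download/1/"


print(page_url(1))
print(recipe_download_url("/recipes/580811-uno-text-based/?in=lang-python"))
print(recipe_download_url("/about/"))
```

Returning None for links that do not match lets the caller skip navigation links instead of trying to download them.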

Here is the complete code to download all the scripts as .py files into the given directory.

import os
import string
import urllib.request

import requests
from bs4 import BeautifulSoup


def Download_active_state_files(path, url, enc, title):
    try:
        response = urllib.request.urlopen(url)
        webContent = response.read().decode(enc)
        target_dir = os.path.join(path, 'Code_Active_state')
        os.makedirs(target_dir, exist_ok=True)
        # The with block closes the file even if the write fails.
        with open(os.path.join(target_dir, title + '.py'), 'w', encoding=enc) as f:
            f.write(webContent)
    except Exception:
        # Append failed URLs so that earlier entries are kept.
        with open(os.path.join(path, 'Code_Active_state_Download_Error.log'), 'a', encoding=enc) as f1:
            f1.write(url + '\n')


for x in range(1, 250):
    try:
        reqs = requests.get("https://code.activestate.com/recipes/langs/python/?page=" + str(x))
        soup = BeautifulSoup(reqs.text, 'html.parser')
        for link2 in soup.select("a[href]"):
            if "lang-python" in link2["href"]:
                src = link2["href"].replace("/recipes", "https://code.activestate.com/recipes").replace("/?in=lang-python", "/download/1/")
                # Strip punctuation from the link text so it is safe to use as a file name.
                tit = link2.get_text().translate(str.maketrans('', '', string.punctuation))
                print(tit.replace(" ", "_"), src)
                Download_active_state_files("c:\\Users\\Dell\\Downloads\\blogs\\", src, 'UTF-8', tit.replace(" ", "_"))
    except Exception as exc:
        print(f"Skipping page {x}: {exc}")

You can Compare the files downloaded with the Web Page version.

Hope you like it. 🙂

Posted in POSTMAN, Python, WebScraping

How-to-Install-Python-with-Anaconda & Connect with Oracle

Posted by Sriram Sanka on October 4, 2022

You can also download Python Installer Executable from https://www.python.org/downloads/windows/

With the help of cx_Oracle, we can connect to Oracle and execute commands.

import pandas as pd
import pandas.io.sql as psql
import cx_Oracle
import os
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"

dsn_tns = cx_Oracle.makedsn('localhost', 1521, 'xe')
ora_conn = cx_Oracle.connect('sriram','sriram',dsn=dsn_tns)
df1 = psql.read_sql('SELECT * FROM dba_users ', con=ora_conn) 
#for v in df1['USERNAME']:
#    print(v)
print("Running :", df1)
ora_conn.close()

You can use getpass to hide the password typed at the command prompt.

import pandas as pd
import pandas.io.sql as psql
import cx_Oracle
import getpass
import os
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"
username = input("Enter User Name: ")
userpwd = getpass.getpass(prompt='Password: ', stream=None) 

dsn_tns = cx_Oracle.makedsn('localhost', 1521, 'xe')
ora_conn = cx_Oracle.connect(username,userpwd,dsn=dsn_tns)
df1 = psql.read_sql('SELECT username,account_status FROM dba_users ', con=ora_conn) 
print("Running :", df1)
ora_conn.close()

We can plot the result after installing matplotlib.

import pandas as pd
import pandas.io.sql as psql
import cx_Oracle
import getpass
import os
import matplotlib.pyplot as plt
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"
username = input("Enter User Name: ")
userpwd = getpass.getpass(prompt='Password: ', stream=None) 

dsn_tns = cx_Oracle.makedsn('localhost', 1521, 'xe')
ora_conn = cx_Oracle.connect(username,userpwd,dsn=dsn_tns)
df1 = psql.read_sql('SELECT count(*) cnt,account_status FROM dba_users group by account_status', con=ora_conn) 

print(df1)
df1.plot(x="ACCOUNT_STATUS",y=["CNT"])
plt.show()
ora_conn.close()

The same data can be drawn as a bar chart with plot.bar:

import pandas as pd
import pandas.io.sql as psql
import cx_Oracle
import getpass
import os
import matplotlib.pyplot as plt

os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"
username = input("Enter User Name: ")
userpwd = getpass.getpass(prompt='Password: ', stream=None) 

dsn_tns = cx_Oracle.makedsn('localhost', 1521, 'xe')
ora_conn = cx_Oracle.connect(username,userpwd,dsn=dsn_tns)
df1 = psql.read_sql('SELECT count(*) cnt,account_status FROM dba_users group by account_status', con=ora_conn) 
print(df1)
df1.plot.bar(x="ACCOUNT_STATUS",y=["CNT"],rot=0)
plt.show()
ora_conn.close()

Hope you like it !!!

Posted in Installation, Linux, Python, Windows | Tagged: Python

Fun with Python – Create Web Traffic using selenium & Python.

Posted by Sriram Sanka on October 4, 2022

In my previous post, I downloaded the content of blogs and stored it in the file system. This increased my Google search stats and blog traffic as well.

As you can see, most views are from Canada and India. With this in mind, I thought of writing a Python program that creates traffic to my blog by reading my posts (so far) using Selenium and a secure VPN. As I am connected to a Canada VPN, you can see the views below, before and after.

QA teams generally do the same for application test automation using Selenium WebDriver, JavaScript, and so on. Here I am using Python. Let's look at the code.

import time

import pandas as pd
import requests
from lxml import etree
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("start-maximized")
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--ignore-certificate-errors-spki-list')
options.add_argument('--ignore-certificate-errors')
options.add_argument("--incognito")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)


def create_traffic(url):
    driver.get(url)
    time.sleep(5)
    height = int(driver.execute_script("return document.documentElement.scrollHeight"))
    driver.execute_script('window.scrollBy(0,10)')
    time.sleep(10)


main_sitemap = 'https://ramoradba.com/sitemap.xml'
xmlDict = []
r = requests.get(main_sitemap)
root = etree.fromstring(r.content)
print("The number of sitemap tags are {0}".format(len(root)))
for sitemap in root:
    children = sitemap.getchildren()
    xmlDict.append({'url': children[0].text})
    with open('links23.txt', 'a') as f:
        f.write(f'\n{children[0].text}')

col_name = ['url']
df_url = pd.read_csv("links23.txt", names=col_name)
for row in df_url.url:
    print(row)
    create_traffic(row)

This part is the main block; it reads my blog post URLs from the links downloaded from the sitemap.

def create_traffic(url):
        website_random_URL = url
        driver.get(url)
        time.sleep(5)
        height = int(driver.execute_script("return document.documentElement.scrollHeight"))
        driver.execute_script('window.scrollBy(0,10)')
        time.sleep(10)

This code opens the URL in the browser and scrolls. The same can be run forever in a while loop that reads random posts from the blog instead of all the posts.

The more often you execute the program, the more traffic you get. Hope you like it.

Let's connect to Kyiv, Ukraine and get the views from there.

Let's execute it and see the progress.

After execution, the views from Ukraine have increased.

Follow me for more interesting posts in future: Twitter – TheRamOraDBA & linkedin-ramoradba

Posted in Python, Selenium, Web-Traffic, WebScraping | Tagged: Python, Scrapping, Selenium, Web-Traffic

Hacking – FATDBA.COM ¯\_(ツ)_/¯ `ॐ`

Posted by Sriram Sanka on October 4, 2022

#Python #WebScraping

Just kidding!!! It's not hacking; this is known as web scraping, using the powerful Python.

What is web scraping?

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database.

You just need a browser and a small, simple piece of Python code to get the content from the web. First let's look at the parts of the code and verify them.

Step 1 : Install & Load the Python Modules

import os
import random
import string
import urllib.parse
import urllib.request

import pandas as pd
import requests
from lxml import etree

Step 2: Define function to get the Name of the Site/Blog to Make it as a Folder.

def get_host(url,delim):
    parsed_url = urllib.parse.urlparse(url)
    return(parsed_url.netloc.replace(delim, "_"))

Step 3: Define a Function to Get the Blog/Page Title

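def findTitle(url,delim):
    webpage = urllib.request.urlopen(url).read()
    # Decode the bytes first; str(webpage) would keep the b'...' wrapper and escape sequences.
    html = webpage.decode('utf-8', errors='replace')
    title = html.split('<title>')[1].split('</title>')[0]
    # Strip punctuation before inserting underscores, because "_" is itself punctuation.
    return title.translate(str.maketrans('', '', string.punctuation)).replace(delim, "_")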
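The page title ends up as a file name, so the order of the two cleaning steps matters: the underscore is part of string.punctuation, and stripping punctuation after inserting underscores removes them again. As a small sketch of the intended cleaning, with safe_filename as a made-up helper name, punctuation is removed first and the remaining words are then joined with underscores.

```python
import string


def safe_filename(title, delim=" "):
    """Strip punctuation, then join the remaining words with underscores."""
    cleaned = title.translate(str.maketrans("", "", string.punctuation))
    # Skipping empty parts avoids double underscores where punctuation stood alone.
    return "_".join(part for part in cleaned.split(delim) if part)


print(safe_filename("Hello, World: Part 2!"))
print(safe_filename("What is web scraping? | A blog"))
```

Since string.punctuation covers the characters Windows forbids in file names, such as colon, slash, and question mark, the result can be passed to open() directly.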

Step 4: Define a Function to Generate a Random String of a Given Length

def unq_str(length):
    # Random uppercase letters and digits; not guaranteed to be unique.
    res = ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))
    return res

Step 5: Write the Main Block to Download the Content from the Site/Blog

def Download_blog(path,url,enc):
    try:
        response = urllib.request.urlopen(url)
        webContent = response.read().decode(enc)
        target_dir = os.path.join(path, get_host(url, "."))
        os.makedirs(target_dir, exist_ok=True)
        # The with block closes the file even if the write fails.
        with open(os.path.join(target_dir, findTitle(url, " ") + '.html'), 'w', encoding=enc) as f:
            f.write(webContent)
    except Exception:
        # Append failed URLs so that earlier entries are kept.
        with open(os.path.join(path, get_host(url, ".") + 'Download_Error.log'), 'a', encoding=enc) as f1:
            f1.write(url + '\n')

Step 6: Define another function to save the blog post URLs into a file and invoke the main block to get the blog content.

def write_post_url_to_file(blog,path):        
    main_sitemap = blog+'/sitemap.xml'
    r = requests.get(main_sitemap)
    root = etree.fromstring(r.content)
    for sitemap in root:
        children = sitemap.getchildren()
        with open(str(path+'\\'+get_host(blog,".")) +'_blog_links.txt', 'a') as f:
            f.write( f'\n{children[0].text}')
    col_name = ['url']
    df_url = pd.read_csv(str(path+'\\'+get_host(blog,".")) +'_blog_links.txt', names=col_name)
    for row in df_url.url:
        print(row)
        Download_blog(path,row,'UTF-8')
        
write_post_url_to_file("https://fatdba.com","c:\\Users\\Dell\\Downloads\\blogs\\")

This will create a file with the links and a folder named after the blog to store the content of all the posts.

Sample Output as follows

BOOM !!!
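Step 6 takes the first child of every sitemap entry and assumes it is the URL. A slightly more defensive sketch, using only the standard library and assuming the sitemap follows the sitemaps.org 0.9 schema, looks up the loc elements by name instead. The function name sitemap_locations and the sample document are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", NS) if loc.text]


# Hypothetical sitemap; note that <loc> is not the first child of the first entry.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><lastmod>2022-10-04</lastmod><loc>https://example.com/post-1/</loc></url>
  <url><loc>https://example.com/post-2/</loc></url>
</urlset>"""

for url in sitemap_locations(sample):
    print(url)
```

The returned list can be looped over directly, which also removes the need for the intermediate links file.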

For more interesting posts you can follow me @ Twitter – TheRamOraDBA & linkedin-ramoradba

Posted in download_blogs, Linux, Python, WebScraping, Windows | Tagged: blog, HTML, Python, web-scraping

Python Basics – Part 1

Posted by Sriram Sanka on September 17, 2022

Language Introduction

Python is a dynamic, interpreted (bytecode-compiled) language. There are no type declarations of variables, parameters, functions, or methods in source code. This makes the code short and flexible, and you lose the compile-time type checking of the source code. Python tracks the types of all values at runtime and flags code that does not make sense as it runs.

https://www.edureka.co/blog/introduction-to-python/

In the sections below I have attached a couple of reference documents and practice notes. To open them, rename the file extension from txt to "ipynb"; the files can then be opened with Jupyter, Anaconda, etc.

installingpython.1 Download

String Split

Description

Split the string input_str = ‘Kumar_Ravi_003’ to the person’s second name, first name and unique customer code. In this example, second_name= ‘Kumar’, first_name= ‘Ravi’, customer_code = ‘003’.

input_str = input('data')
first_name = input_str[6:10]
second_name = input_str[0:5]
customer_code = input_str[-3:]
print(first_name)
print(second_name)
print(customer_code)
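The slices above depend on the exact lengths of 'Kumar' and 'Ravi'. As an alternative sketch, assuming the three parts are always separated by underscores, str.split handles names of any length. The function name split_customer is made up for this example.

```python
def split_customer(input_str):
    """Split 'SecondName_FirstName_Code' into its three parts."""
    second_name, first_name, customer_code = input_str.split("_")
    return first_name, second_name, customer_code


first_name, second_name, customer_code = split_customer("Kumar_Ravi_003")
print(first_name)
print(second_name)
print(customer_code)
```

An input with more or fewer than three parts raises a ValueError at the unpacking step, which makes malformed data visible early.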

string -lstrip()

input_str = input('Enter Input : ')
final_str = input_str.lstrip()
print(final_str)
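To make the effect of lstrip() visible, the short sketch below compares it with rstrip() and strip() on a fixed string and prints the results with repr so the remaining spaces show up. The sample strings are arbitrary.

```python
raw = "   hello world  "

left = raw.lstrip()    # removes leading whitespace only
right = raw.rstrip()   # removes trailing whitespace only
both = raw.strip()     # removes whitespace on both ends

print(repr(left))
print(repr(right))
print(repr(both))

# With an argument, lstrip removes any of the given characters from the left.
trimmed = "xxabcxx".lstrip("x")
print(trimmed)
```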

introduction-datatypes Download

List is a collection which is ordered and changeable. Allows duplicate members.

Tuple is a collection which is ordered and unchangeable. Allows duplicate members.

Set is a collection which is unordered and unindexed. Its items cannot be changed in place, but you can add and remove items. No duplicate members.

Dictionary is a collection which is ordered (insertion order is preserved as of Python 3.7) and changeable. No duplicate keys.
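The four descriptions above can be tried out directly. This is a small sketch with arbitrary sample values that shows duplicates, in-place changes, and key uniqueness for each type.

```python
# list: ordered and changeable, duplicates allowed
fruits = ["apple", "banana", "apple"]
fruits[1] = "cherry"

# tuple: ordered, duplicates allowed, but items cannot be reassigned
point = ("apple", "banana", "apple")
try:
    point[1] = "cherry"
    tuple_changed = True
except TypeError:
    tuple_changed = False

# set: duplicates collapse; items can be added or removed, not edited in place
unique = {"apple", "banana", "apple"}
unique.add("cherry")

# dict: keys are unique; assigning to an existing key overwrites its value
stock = {"apple": 3, "banana": 5}
stock["apple"] = 4
stock["cherry"] = 1

print(fruits)
print(tuple_changed)
print(sorted(unique))
print(list(stock.items()))
```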

List to String

Description

Convert a list [‘Pythons syntax is easy to learn’, ‘Pythons syntax is very clear’] to a string using ‘&’. The sample output of this string will be:

Pythons syntax is easy to learn & Pythons syntax is very clear

Note that there is a space on both sides of ‘&’ (as usual in English sentences).

l = []
l.append('Pythons syntax is easy to learn')
# No leading space here: the " & " separator already supplies one on each side.
l.append('Pythons syntax is very clear')
print('This is the List ', l)
input_str = l
string_1 = " & ".join(input_str)
print('This is Combined String ', string_1)

listcomprehensionsinpython Download

lists Download

tuples Download

dictionaries Download

sets Download

reference-workbook-2-data-structures Download

conditional-ifelse Download

looping-for-while Download

comprehensions Download

functions Download

map-reduce-filter Download

References

https://python-course.eu/advanced-python/lambda-filter-reduce-map.php

https://book.pythontips.com/en/latest/map_filter.html

https://python.swaroopch.com/functions.html

https://anh.cs.luc.edu/python/hands-on/3.1/handsonHtml/functions.html

https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/

https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html

https://docs.python.org/3/tutorial/controlflow.html

https://docs.python.org/3/reference/compound_stmts.html

https://docs.python.org/3/tutorial/datastructures.html

https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/

https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html

https://python.swaroopch.com/

https://docs.python-guide.org/intro/learning/

https://www.simplilearn.com/tutorials/python-tutorial

https://developers.google.com/edu/python/lists

https://developers.google.com/edu/python/introduction

Posted in Anaconda, Python | Tagged: Anaconda, Basics, Python

Install Python Modules using PIP & Upgrading Pip Version

Posted by Sriram Sanka on June 19, 2022

You can install Python modules by running the pip command as follows:

python -m pip install matplotlib

The following command upgrades pip itself to the latest version:

python.exe -m pip install --upgrade pip

Posted in Python | Tagged: Matplotlib, PIP, Python


Sriram Sanka – My Experiences with Databases & More

Oracle-MySQL-SQL SERVER-Python-Azure-AWS-Oracle Cloud-GCP etc
