5 個用於自動執行 SEO 任務的 Python 腳本

已發表: 2023-04-12

Python 是一種功能強大的編程語言，在過去幾年中在 SEO 行業中廣受歡迎。

憑藉其相對簡單的語法、高效的性能以及豐富的庫和框架，Python 徹底改變了許多 SEO 的工作方式。

Python 提供了一個多功能的工具集，可以幫助使優化過程更快、更準確和更有效。

本文探討了五個 Python 腳本，以幫助提高您的 SEO 效果。

自動化重定向映射。
批量編寫元描述。
使用 N-gram 分析關鍵字。
將關鍵字分組到主題集群中。
將關鍵字列表與預定義主題列表相匹配。

開始使用 Python 的最簡單方法

如果您想涉足 Python 編程，Google Colab 值得考慮。

它是一個免費的、基於 Web 的平台，為編寫和運行 Python 代碼提供了一個方便的平台，無需複雜的本地設置。

從本質上講，它允許您在瀏覽器中訪問 Jupyter Notebooks，並提供大量用於數據科學和機器學習的預安裝庫。

此外，它建立在 Google 雲端硬盤之上，因此您可以輕鬆保存您的工作並與他人共享。

要開始，請按照下列步驟操作：

啟用文件上傳

打開 Google Colab 後，您首先需要啟用創建臨時文件存儲庫的功能。就像單擊文件夾圖標一樣簡單。

這使您可以上傳臨時文件，然後下載任何結果文件。

上傳源數據

我們的許多 Python 腳本都需要源文件才能運行。要上傳文件，只需單擊上傳按鈕。

完成設置後，您可以開始測試以下 Python 腳本。

腳本 1：自動化重定向映射

為大型站點創建重定向映射可能非常耗時。尋找使流程自動化的方法可以幫助我們節省時間並專注於其他任務。

這個腳本是如何工作的

此腳本側重於分析 Web 內容以找到緊密匹配的文章。

首先，它導入兩個TXT文件的URLs：一個是重定向網站的（source_urls.txt），另一個是吸收重定向網站的站點（target_urls.txt）。
然後，我們使用 Python 庫 Beautiful Soup 創建一個網絡爬蟲來獲取頁面上的主體內容。此腳本忽略頁眉和頁腳內容。
在爬取所有頁面的內容後，它使用 Python 庫 Polyfuzz 以相似度百分比匹配 URL 之間的內容。
最後，它將結果打印在 CSV 文件中，包括相似度百分比。

從這裡，您可以手動查看任何相似性百分比較低的 URL 以找到下一個最接近的匹配項。

獲取腳本

#import libraries from bs4 import BeautifulSoup, SoupStrainer from polyfuzz import PolyFuzz import concurrent.futures import csv import pandas as pd import requests #import urls with open("source_urls.txt", "r") as file: url_list_a = [line.strip() for line in file] with open("target_urls.txt", "r") as file: url_list_b = [line.strip() for line in file] #create a content scraper via bs4 def get_content(url_argument): page_source = requests.get(url_argument).text strainer = SoupStrainer('p') soup = BeautifulSoup(page_source, 'lxml', parse_only=strainer) paragraph_list = [element.text for element in soup.find_all(strainer)] content = " ".join(paragraph_list) return content #scrape the urls for content with concurrent.futures.ThreadPoolExecutor() as executor: content_list_a = list(executor.map(get_content, url_list_a)) content_list_b = list(executor.map(get_content, url_list_b)) content_dictionary = dict(zip(url_list_b, content_list_b)) #get content similarities via polyfuzz model = PolyFuzz("TF-IDF") model.match(content_list_a, content_list_b) data = model.get_matches() #map similarity data back to urls def get_key(argument): for key, value in content_dictionary.items(): if argument == value: return key return key with concurrent.futures.ThreadPoolExecutor() as executor: result = list(executor.map(get_key, data["To"])) #create a dataframe for the final results to_zip = list(zip(url_list_a, result, data["Similarity"])) df = pd.DataFrame(to_zip) df.columns = ["From URL", "To URL", "% Identical"] #export to a spreadsheet with open("redirect_map.csv", "w", newline="") as file: columns = ["From URL", "To URL", "% Identical"] writer = csv.writer(file) writer.writerow(columns) for row in to_zip: writer.writerow(row)

腳本 2：批量編寫元描述

雖然元描述不是直接的排名因素，但它們可以幫助我們提高有機點擊率。將元描述留空會增加 Google 創建自己的元描述的機會。

如果您的 SEO 審核顯示大量 URL 缺少元描述，則可能很難抽出時間手工編寫所有這些內容，尤其是對於電子商務網站。

該腳本旨在通過為您自動執行該過程來幫助您節省時間。

腳本是如何工作的

首先，該腳本從 TXT 文件 (urls.txt) 導入 URL 列表。
然後，它解析 URL 上的所有內容。
解析內容後，它會創建旨在少於 155 個字符的元描述。
它將結果導出到 CSV 文件中。

獲取腳本

!pip install sumy from sumy.parsers.html import HtmlParser from sumy.nlp.tokenizers import Tokenizer from sumy.nlp.stemmers import Stemmer from sumy.utils import get_stop_words from sumy.summarizers.lsa import LsaSummarizer import csv #1) imports a list of URLs from a txt file with open('urls.txt') as f: urls = [line.strip() for line in f] results = [] # 2) analyzes the content on each URL for url in urls: parser = HtmlParser.from_url(url, Tokenizer("english")) stemmer = Stemmer("english") summarizer = LsaSummarizer(stemmer) summarizer.stop_words = get_stop_words("english") description = summarizer(parser.document, 3) description = " ".join([sentence._text for sentence in description]) if len(description) > 155: description = description[:152] + '...' results.append({ 'url': url, 'description': description }) # 4) exports the results to a csv file with open('results.csv', 'w', newline='') as f: writer = csv.DictWriter(f, fieldnames=['url','description']) writer.writeheader() writer.writerows(results)

腳本 3：使用 N-gram 分析關鍵字

N-gram 不是一個新概念，但對 SEO 仍然有用。它們可以幫助我們理解大量關鍵詞數據的主題。

這個腳本是如何工作的

此腳本將結果輸出到一個 TXT 文件中，該文件將關鍵字分解為一元組、二元組和三元組。

首先，它會導入包含所有關鍵字的 TXT 文件 (keyword.txt)。
然後它使用一個名為 Counter 的 Python 庫來分析和提取 N-gram。
然後它將結果導出到一個新的 TXT 文件中。

獲取此腳本

#Import necessary libraries import re from collections import Counter #Open the text file and read its contents into a list of words with open('keywords.txt', 'r') as f: words = f.read().split() #Use a regular expression to remove any non-alphabetic characters from the words words = [re.sub(r'[^a-zA-Z]', '', word) for word in words] #Initialize empty dictionaries for storing the unigrams, bigrams, and trigrams unigrams = {} bigrams = {} trigrams = {} #Iterate through the list of words and count the number of occurrences of each unigram, bigram, and trigram for i in range(len(words)): # Unigrams if words[i] in unigrams: unigrams[words[i]] += 1 else: unigrams[words[i]] = 1 # Bigrams if i < len(words)-1: bigram = words[i] + ' ' + words[i+1] if bigram in bigrams: bigrams[bigram] += 1 else: bigrams[bigram] = 1 # Trigrams if i < len(words)-2: trigram = words[i] + ' ' + words[i+1] + ' ' + words[i+2] if trigram in trigrams: trigrams[trigram] += 1 else: trigrams[trigram] = 1 # Sort the dictionaries by the number of occurrences sorted_unigrams = sorted(unigrams.items(), key=lambda x: x[1], reverse=True) sorted_bigrams = sorted(bigrams.items(), key=lambda x: x[1], reverse=True) sorted_trigrams = sorted(trigrams.items(), key=lambda x: x[1], reverse=True) # Write the results to a text file with open('results.txt', 'w') as f: f.write("Most common unigrams:\n") for unigram, count in sorted_unigrams[:10]: f.write(unigram + ": " + str(count) + "\n") f.write("\nMost common bigrams:\n") for bigram, count in sorted_bigrams[:10]: f.write(bigram + ": " + str(count) + "\n") f.write("\nMost common trigrams:\n") for trigram, count in sorted_trigrams[:10]: f.write(trigram + ": " + str(count) + "\n")

腳本 4：將關鍵字分組到主題集群中

對於新的 SEO 項目，關鍵字研究始終處於早期階段。有時我們在一個數據集中處理數千個關鍵字，這使得分組具有挑戰性。

Python 允許我們自動將關鍵字聚類到相似的組中，以識別趨勢趨勢並完成我們的關鍵字映射。

這個腳本是如何工作的

此腳本首先導入關鍵字的 TXT 文件 (keywords.txt)。
然後腳本使用 TfidfVectorizer 和 AffinityPropagation 分析關鍵字。
然後它為每個主題集群分配一個數值。
然後將結果導出到 csv 文件中。

獲取此腳本

import csv import numpy as np from sklearn.cluster import AffinityPropagation from sklearn.feature_extraction.text import TfidfVectorizer # Read keywords from text file with open("keywords.txt", "r") as f: keywords = f.read().splitlines() # Create a Tf-idf representation of the keywords vectorizer = TfidfVectorizer() X = vectorizer.fit_transform(keywords) # Perform Affinity Propagation clustering af = AffinityPropagation().fit(X) cluster_centers_indices = af.cluster_centers_indices_ labels = af.labels_ # Get the number of clusters found n_clusters = len(cluster_centers_indices) # Write the clusters to a csv file with open("clusters.csv", "w", newline="") as f: writer = csv.writer(f) writer.writerow(["Cluster", "Keyword"]) for i in range(n_clusters): cluster_keywords = [keywords[j] for j in range(len(labels)) if labels[j] == i] if cluster_keywords: for keyword in cluster_keywords: writer.writerow([i, keyword]) else: writer.writerow([i, ""])

腳本 5：將關鍵字列表與預定義主題列表匹配

這類似於前面的腳本，不同之處在於它允許您將關鍵字列表與一組預定義的主題相匹配。

這對於大量關鍵字非常有用，因為它以 1,000 個為一組處理它們以防止系統崩潰。

這個腳本是如何工作的

此腳本導入關鍵字列表 (keywords.txt) 和主題列表 (topics.txt)。
然後它分析主題和關鍵字列表並將它們與最接近的匹配項進行匹配。如果找不到匹配項，則會將其歸類為其他。
然後將結果導出到 CSV 文件中。

獲取此腳本

import pandas as pd import spacy from spacy.lang.en.stop_words import STOP_WORDS # Load the Spacy English language model nlp = spacy.load("en_core_web_sm") # Define the batch size for keyword analysis BATCH_SIZE = 1000 # Load the keywords and topics files as Pandas dataframes keywords_df = pd.read_csv("keywords.txt", header=None, names=["keyword"]) topics_df = pd.read_csv("topics.txt", header=None, names=["topic"]) # Define a function to categorize a keyword based on the closest related topic def categorize_keyword(keyword): # Tokenize the keyword tokens = nlp(keyword.lower()) # Remove stop words and punctuation tokens = [token.text for token in tokens if not token.is_stop and not token.is_punct] # Find the topic that has the most token overlaps with the keyword max_overlap = 0 best_topic = "Other" for topic in topics_df["topic"]: topic_tokens = nlp(topic.lower()) topic_tokens = [token.text for token in topic_tokens if not token.is_stop and not token.is_punct] overlap = len(set(tokens).intersection(set(topic_tokens))) if overlap > max_overlap: max_overlap = overlap best_topic = topic return best_topic # Define a function to process a batch of keywords and return the results as a dataframe def process_keyword_batch(keyword_batch): results = [] for keyword in keyword_batch: category = categorize_keyword(keyword) results.append({"keyword": keyword, "category": category}) return pd.DataFrame(results) # Initialize an empty dataframe to hold the results results_df = pd.DataFrame(columns=["keyword", "category"]) # Process the keywords in batches for i in range(0, len(keywords_df), BATCH_SIZE): keyword_batch = keywords_df.iloc[i:i+BATCH_SIZE]["keyword"].tolist() batch_results_df = process_keyword_batch(keyword_batch) results_df = pd.concat([results_df, batch_results_df]) # Export the results to a CSV file results_df.to_csv("results.csv", index=False)

使用 Python 進行 SEO

對於 SEO 專業人員來說，Python 是一種非常強大且用途廣泛的工具。

無論您是初學者還是經驗豐富的從業者，我在本文中分享的免費腳本都為探索 Python 在 SEO 中的可能性提供了一個很好的起點。

憑藉其直觀的語法和大量的庫，Python 可以幫助您自動執行繁瑣的任務、分析複雜的數據並獲得對網站性能的新見解。那麼為什麼不試一試呢？

祝你好運，編碼愉快！

本文中表達的觀點是客座作者的觀點，不一定是 Search Engine Land。 此處列出了工作人員作者。

將 Search Engine Land 添加到您的 Google 新聞提要中。