Might be a simple trick, but I wanted to share something I use on a regular basis: extracting the unique URLs from a column of a DataFrame and exporting them!
I will use pandas, so start by importing the library: import pandas as pd
I am labelling this as Python SEO because I use it most of the time to extract 404s, create redirects, avoid redirect loops on files, ...
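To give you an idea of the 404 use case, here is a minimal sketch. The file name crawl.csv and the columns Address and Status Code are assumptions based on a typical crawl export, so adjust them to your own file:
import pandas as pd
# Assumed crawl export with 'Address' and 'Status Code' columns
crawl = pd.read_csv('crawl.csv')
# Keep only the unique URLs that returned a 404
not_found = crawl[crawl['Status Code'] == 404]['Address'].unique()
print(f"{len(not_found)} URLs returning a 404")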
Load your DataFrame from the CSV. In this case I am loading the data from a CSV file: df = pd.read_csv('pages.csv')
Using .unique() will save you some time:
unique_urls = df['Source url'].unique()
where the column name is Source url.
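One thing to watch: if the column contains empty cells, .unique() will also return NaN, and writing that to a file later will fail. A small sketch of how I would guard against it, using the same column name as above:
# Drop empty cells first so the text file only contains real URLs
unique_urls = df['Source url'].dropna().unique()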
Exporting the unique URLs to a text file
I didn't find a simpler solution ...
output_filepath = 'unique_urls.txt'
with open(output_filepath, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')
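If you want to skip the open() block entirely, pandas can also write the list for you. A sketch using the same variable names as above:
# Wrap the array in a Series and let pandas write one URL per line
pd.Series(unique_urls).to_csv(output_filepath, index=False, header=False)
Note that to_csv will quote any URL containing a comma, so the open() loop above stays my default.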
Bonus point: tell you how many URLs were exported.
Using a print function: print(f"{len(unique_urls)} unique URLs have been written to {output_filepath}")
Finally, the whole script!
import pandas as pd
# Load the DataFrame from a CSV file
df = pd.read_csv('pages.csv')
# Extract unique URLs from the 'Source url' column
unique_urls = df['Source url'].unique()
# Path for the output text file
output_filepath = 'unique_urls.txt'
# Exporting the unique URLs to a text file, each URL on a new line
with open(output_filepath, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')

print(f"{len(unique_urls)} unique URLs have been written to {output_filepath}")
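If you run this on different exports regularly, you can wrap it into a small function. This is just a sketch, and the defaults mirror the script above:
import pandas as pd

def export_unique_urls(csv_path, column='Source url', output_filepath='unique_urls.txt'):
    """Extract the unique values of a column and write them to a text file."""
    df = pd.read_csv(csv_path)
    # Drop empty cells so only real URLs end up in the file
    unique_urls = df[column].dropna().unique()
    with open(output_filepath, 'w') as file:
        for url in unique_urls:
            file.write(url + '\n')
    print(f"{len(unique_urls)} unique URLs have been written to {output_filepath}")
    return unique_urls

# Example call, same file as above
export_unique_urls('pages.csv')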