It might be a simple trick, but I wanted to share something I use on a regular basis: extracting the unique URLs from a dataframe column and exporting them to a file!
I will use pandas, so start by importing the library: import pandas as pd
I am labelling this as Python SEO because I use it most of the time to extract 404s, create redirects, avoid redirect loops on files, ...
Load your dataframe from the CSV
In this case I am using a CSV file and loading the data: df = pd.read_csv('pages.csv')
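If your export is large, you may not even need the whole file. A minimal sketch, assuming the same pages.csv file and Source url column, that reads only that one column:

import pandas as pd

# usecols restricts the read to the 'Source url' column, which speeds up loading big exports
df = pd.read_csv('pages.csv', usecols=['Source url'])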
Using .unique()
This method will save you some time: unique_urls = df['Source url'].unique()
where the column name here is Source url.
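.unique() returns a NumPy array of the values in their order of first appearance, so nothing gets sorted behind your back. A tiny illustration with made-up paths:

import pandas as pd

s = pd.Series(['/a', '/b', '/a', '/c'])
print(s.unique())  # ['/a' '/b' '/c'] - duplicates dropped, original order kept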
Exporting the unique URLs to a text file
I didn't find a simpler solution ...
output_filepath = 'unique_urls.txt'
with open(output_filepath, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')
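For reference, a shorter way to do the same write, assuming the unique_urls and output_filepath defined above and no empty cells in the column:

from pathlib import Path

# join the URLs with newlines and write them out in one call
Path(output_filepath).write_text('\n'.join(unique_urls) + '\n')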
Bonus point: report how many URLs were exported
Using a print function with len(): print(f"{len(unique_urls)} Unique URLs have been written to {output_filepath}")
Finally, the whole script!
import pandas as pd

# Load the DataFrame from a CSV file
df = pd.read_csv('pages.csv')

# Extract unique URLs from the 'Source url' column
unique_urls = df['Source url'].unique()

# Path for the output text file
output_filepath = 'unique_urls.txt'

# Exporting the unique URLs to a text file, each URL on a new line
with open(output_filepath, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')

print(f"{len(unique_urls)} Unique URLs have been written to {output_filepath}")
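One variation I sometimes need: if the CSV has blank cells in the column (crawl exports can have them), .unique() will keep them as NaN. A hedged sketch, with the same assumed file and column names, that drops them first:

import pandas as pd

df = pd.read_csv('pages.csv')

# dropna() removes empty cells so only real URLs end up in the output file
unique_urls = df['Source url'].dropna().unique()

output_filepath = 'unique_urls.txt'
with open(output_filepath, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')

print(f"{len(unique_urls)} Unique URLs have been written to {output_filepath}")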