
Unique URLs on a dataframe column in Python

Written by Arthur Camberlein

It might be a simple trick, but I wanted to share something I use on a regular basis: getting the unique URLs from a dataframe column and exporting them!

I will use pandas, so start by importing the library: import pandas as pd

I am labelling this as Python SEO because I mostly use it to extract 404s, create redirects, avoid redirect loops on files, ...

Load your dataframe from the CSV

In this case I am using a CSV file and loading the data with: df = pd.read_csv('pages.csv')
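If the CSV has many columns and only the URL column matters, a minimal sketch like the one below loads just that column. The inline CSV text and the Status code column are made up so the example runs on its own; with a real file you would pass 'pages.csv' instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical stand-in for pages.csv so the example runs on its own
csv_text = """Source url,Status code
https://example.com/a,404
https://example.com/b,200
https://example.com/a,404
"""

# usecols restricts loading to the single column we need
df = pd.read_csv(io.StringIO(csv_text), usecols=['Source url'])

print(df.shape)
```

On large crawl exports this can save some memory, since the other columns are never loaded.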

unique_urls = df['Source url'].unique()

Using .unique() will save you some time: unique_urls = df['Source url'].unique(), where Source url is the column name. It returns each value once, in order of first appearance.
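One thing worth knowing: .unique() keeps missing values, so an empty cell in the column ends up in the result and would later break the url + '\n' concatenation. A small sketch, assuming a made-up dataframe with one empty cell, shows how .dropna() avoids that:

```python
import pandas as pd

# Made-up data: one duplicate and one empty cell
df = pd.DataFrame({
    'Source url': [
        'https://example.com/b',
        'https://example.com/a',
        'https://example.com/b',
        None,
    ]
})

# The missing value counts as one of the "unique" entries here
unique_with_missing = df['Source url'].unique()

# Dropping missing values first leaves only real URLs,
# still in order of first appearance
unique_urls = df['Source url'].dropna().unique()

print(len(unique_with_missing), len(unique_urls))
```

If your export never has empty cells, the plain .unique() call is enough.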

Exporting the unique URLs to a text file

I didn't find a simpler solution ...

# Path for the output text file
output_filepath = 'unique_urls.txt'

with open(output_filepath, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')
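As a possible alternative to the loop, pandas can write the column itself. This is only a sketch, not a replacement for the approach above: it uses drop_duplicates() and to_csv() on made-up data and a temporary folder. One caveat: to_csv() applies CSV quoting, so a URL containing a comma would be wrapped in double quotes in the output.

```python
import os
import tempfile
import pandas as pd

# Made-up data with one duplicate
df = pd.DataFrame({
    'Source url': [
        'https://example.com/a',
        'https://example.com/b',
        'https://example.com/a',
    ]
})

# Temporary location so the example does not touch your working folder
output_filepath = os.path.join(tempfile.mkdtemp(), 'unique_urls.txt')

# drop_duplicates() keeps the first occurrence of each URL;
# index=False and header=False leave one bare URL per line
df['Source url'].drop_duplicates().to_csv(output_filepath, index=False, header=False)

# Read the file back to check what was written
with open(output_filepath) as file:
    written = file.read().splitlines()

print(written)
```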

Bonus point: print how many URLs were exported

Using a print function with len(): print(f"{len(unique_urls)} unique URLs have been written to {output_filepath}")
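If you also want to know how many duplicates were skipped, a short sketch along these lines compares the row count with the unique count. The data is made up, and .nunique() is used here instead of len(); note that .nunique() ignores empty cells by default.

```python
import pandas as pd

# Made-up data with one duplicate
df = pd.DataFrame({
    'Source url': [
        'https://example.com/a',
        'https://example.com/b',
        'https://example.com/a',
    ]
})

total_rows = len(df)                       # every row, duplicates included
unique_count = df['Source url'].nunique()  # distinct non-empty values
duplicates_skipped = total_rows - unique_count

print(f"{unique_count} unique URLs out of {total_rows} rows, {duplicates_skipped} duplicates skipped")
```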

Finally, the whole script!

import pandas as pd

# Load the DataFrame from a CSV file
df = pd.read_csv('pages.csv')

# Extract unique URLs from the 'Source url' column
unique_urls = df['Source url'].unique()

# Path for the output text file
output_filepath = 'unique_urls.txt'

# Exporting the unique URLs to a text file, each URL on a new line
with open(output_filepath, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')

print(f"{len(unique_urls)} Unique URLs have been written to {output_filepath}")

Blog post tagged in: Python, Python SEO
