Buzzword or reality? Data is all around us and around your corner!

Are .parquet files better than .csv files?

Whether Parquet files are better than CSV files depends on the use case and its requirements, and also on who will receive the file. When may .csv files be better than .parquet files? CSV files are a simple and widely used file format for storing tabular data. They are...

Difference between python --version and python -v

These two commands have completely different purposes. `python --version` shows the Python interpreter's version number (example output: Python 3.9.7). This is the standard way to check which version of Python you're running, and it's equivalent to `python -V` (capital V). `python -v` activates verbose...
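To see the difference for yourself, here is a small sketch that runs the current interpreter with each flag. The helper name `run_python` is my own, not something from the article, and the exact number of lines printed by `-v` will differ from one machine to another.

```python
import subprocess
import sys


def run_python(*flags):
    """Run the current interpreter with the given flags; return (stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, *flags],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout, result.stderr


# --version and -V print the same short version string
version_long, _ = run_python("--version")
version_short, _ = run_python("-V")

# -v traces every import on stderr while running the (empty) program
_, verbose_log = run_python("-v", "-c", "pass")

print(version_long.strip())
print(version_long == version_short)
print(len(verbose_log.splitlines()), "lines of import tracing")
```

The verbose trace goes to stderr, which is why it is captured separately from the version string.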

How to change tag style in bulk? For an experiment

Not so long ago, I ran into a problem while trying to test a colleague's idea: how can we find out which links users are more willing to click? TL;DR: if my article is too long, these are the steps: define the scope, extract or get the HTML content, split your group...
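As a rough illustration of the "change the style in bulk" step, here is a minimal sketch that adds an inline style to every opening tag of a given type. The function name `add_inline_style`, the sample HTML and the chosen style are all hypothetical, and a regular expression is only a shortcut here: for real pages an HTML parser would be the safer choice.

```python
import re


def add_inline_style(html, tag, style):
    """Add a style attribute to every opening <tag> that does not have one yet."""
    pattern = re.compile(
        rf"<{re.escape(tag)}\b(?![^>]*\bstyle\s*=)",
        re.IGNORECASE,
    )
    # A function replacement avoids any backslash handling in the style string
    return pattern.sub(lambda match: f'{match.group(0)} style="{style}"', html)


# Hypothetical sample content
page = '<p><a href="/a">A</a> <a href="/b" style="color:red">B</a></p>'
styled = add_inline_style(page, "a", "text-decoration: underline")
print(styled)
```

Links that already carry a `style` attribute are left untouched, so the sketch can be run twice without stacking attributes.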

How to read a .parquet file in R?

To read a .parquet file with R, you can use the arrow package, which reads and writes several data formats, including Parquet. Here is an example: `library(arrow)`, then `data <- read_parquet("example.parquet")` to read the Parquet file into a data.frame, then `print(data)` to display the data.frame...

Remove X characters from a cell in Excel

Sometimes, when you're reading data in a spreadsheet like Microsoft Excel, it's simpler to put a formula directly into the spreadsheet than to switch to R, Python or any other tool.

How to save a .parquet file in Python?

To save a .parquet file with Python, you can use the pandas library, which provides a convenient way to read and write data in a variety of formats, including Parquet. Let me share an example with you: import pandas as pd # create a sample DataFrame data = {'name': ['Alice',...

What is a .parquet file?

I was recently introduced to .parquet files. Let me share with you what I learned! Parquet files are popular for storing and processing large datasets in big data environments. Introduction: a few words about .parquet. Parquet is a columnar storage file format that is commonly used...
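To give a feel for what "columnar" means, here is a toy sketch in plain Python. It is not how Parquet is implemented on disk; it only contrasts a row-by-row layout with a column-by-column one, using made-up data and a helper name (`to_columns`) of my own.

```python
# Row-oriented layout: one record after another, as in a CSV file
rows = [
    {"name": "Alice", "age": 30, "city": "Paris"},
    {"name": "Bob", "age": 25, "city": "Lyon"},
    {"name": "Chloe", "age": 35, "city": "Lille"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-oriented layout: all values of one column sit together
columns = to_columns(rows)

# Computing on one column only touches that column's values
average_age = sum(columns["age"]) / len(columns["age"])
print(columns["age"])
print(average_age)
```

With the column layout, the average age is computed without reading any name or city, which is the intuition behind why columnar formats suit analytical queries.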

When to use `.parquet` and when to use `.csv`?

Following my article about parquet files, I thought I could give you some examples of when you might want to prefer .csv files and when you might want to use .parquet. When could we use CSV files? Storing small to medium-sized datasets that can be easily opened and read in...

How to read a `.parquet` file in Python?

To read a .parquet file with Python, the pandas library is your friend. In fact, pandas provides a convenient way to read and write data in a variety of formats (you might be familiar with CSV or XLS[X] files), including Parquet.

How to import Python libraries

This import works with any version of Python (meaning Python 2 or Python 3). How to import a library: to import a library, use `import` followed by the name of your library. So you could do this to import libraries one by one:...
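For reference, here are the most common import forms side by side. The modules are picked from the standard library purely as examples, so nothing needs to be installed.

```python
# Import a whole module, then reach its functions through the module name
import math

# Import a module under a shorter alias
import json as js

# Import a single name from a module
from datetime import date

print(math.sqrt(16))
print(js.dumps({"ok": True}))
print(date(2024, 1, 31).isoformat())
```

All three forms load the same kind of object; they only change the name you type afterwards.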

How to upload files in Google Colab

If you like notebooks (such as Google Colab or Jupyter Notebook), you will like this trick for uploading one or more files with a function. A function that gives you a handy way to import files: you will find the function in question below, and after that you will only have to click on...
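The article's own function is cut off in this excerpt, so what follows is only a sketch of how such a helper could look. The name `upload_files` is hypothetical, and the code only works inside Google Colab, where the `google.colab` package is available.

```python
def upload_files():
    """Open Colab's file picker and return the uploaded files.

    The result is a dict that maps each file name to its content in bytes.
    """
    # Imported here because the package only exists inside Google Colab
    from google.colab import files

    uploaded = files.upload()
    for name, content in uploaded.items():
        print(f"Uploaded {name} ({len(content)} bytes)")
    return uploaded
```

In a Colab cell, calling `upload_files()` shows a file picker button, and the function returns once the selected files have been uploaded.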

List and display all the columns of a DataFrame in Python

Prerequisites: once again we are going to use the (no less famous) pandas library for Python. To install it, you have the options below: for Python, `pip install pandas`; for Python 3, `pip3 install pandas`. You will only need to install a single...
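Once pandas is installed, listing and displaying the columns can be sketched as follows. The DataFrame is made up for the example, and this is one possible approach rather than necessarily the one the full article uses.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Alice"], "age": [30], "city": ["Paris"]})

# Get the column names as a plain Python list
column_names = df.columns.tolist()
print(column_names)

# Print the frame without pandas hiding columns when it is wide
with pd.option_context("display.max_columns", None):
    print(df)
```

`option_context` only changes the display setting inside the `with` block, so the rest of your notebook keeps the default behaviour.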

Diff date usage in Python

import difflib import pandas as pd from datetime import date date = date.today() today = date.strftime("%Y-%m-%d") document = today + "-diff" document_txt = "data/" + document + ".txt" document_csv = "data/" + document + ".csv" with open('robots-live.txt') as robots_live, open('robots-staging2.txt') as robots_staging: diff = difflib.unified_diff( robots_live.readlines(), robots_staging.readlines(), fromfile='robots-live.txt', tofile='robots-staging.txt', )...
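The same idea can be sketched as a small self-contained function. To keep it runnable anywhere, it works on in-memory lines instead of the robots files, and the function name `dated_diff` and the sample rules are hypothetical. It also keeps today's date in a variable called `day`, so that the imported `date` class is not overwritten.

```python
import difflib
from datetime import date


def dated_diff(live_lines, staging_lines, day=None):
    """Return a date-stamped output path and the unified diff as text."""
    day = day or date.today()
    output_path = "data/" + day.strftime("%Y-%m-%d") + "-diff.txt"
    diff = difflib.unified_diff(
        live_lines,
        staging_lines,
        fromfile="robots-live.txt",
        tofile="robots-staging.txt",
    )
    return output_path, "".join(diff)


# Hypothetical robots.txt contents
live = ["User-agent: *\n", "Disallow: /admin\n"]
staging = ["User-agent: *\n", "Disallow: /\n"]

output_path, diff_text = dated_diff(live, staging, date(2024, 1, 31))
print(output_path)
print(diff_text)
```

Lines starting with `-` exist only in the live file and lines starting with `+` only in the staging file, which makes the differences easy to scan.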

Skipping the last column in R with `read.csv`

Skipping the last column in R with `read.csv` has to be done in two steps: `données <- read.csv("test12.csv")`, `données`, then `données[,-ncol(données)]`. This is for when you do not know the number of columns. If you know it explicitly, use the code below instead: `df <- read.csv("test12.csv")[,-3]`. This solution comes from an article...