close
close

Dataframes explained: A modern in-memory data science format

Dataframes explained: A modern in-memory data science format


import pandas as pd
data = {
    "Title": ("Blade Runner", "2001: a space odyssey", "Alien"),
    "Year": (1982, 1968, 1979),
    "MPA Rating": ("R","G","R")
}
df = pd.DataFrame(data)

Applications that use data frames

As I mentioned earlier, most data science libraries or frameworks support some type of dataframe-like structure. R language He is generally credited with popularizing the concept of a data frame (although it existed in other forms before that). SparkOne of the first widely popular platforms for processing data at scale, it has its own data framework system. pandas Data library for Python and its speed-optimized cousin polesboth offer data frames. And analytical database DuckDB It combines the conveniences of data frames with the power of a full-blown database system.

It is worth noting that the application in question may support data frame data formats specific to that application. For example, Pandas provides data types for: sparse data structures in a data frame. In contrast, Spark does not have an explicit sparse data type, so any sparse formatted data requires an additional conversion step to be used in the Spark dataframe.

For this purpose, there is no definitive version of a dataframe, although some libraries with dataframes are more popular. They are one concept It is carried out by many different applications. Each implementation of a dataframe is free to do different things under the hood, and some dataframe implementations differ in end-user details as well.