It depends on your data and on the library or functions that you're using to analyze your data.
The main difference between matrices and data frames in R is the data types they can hold. A matrix requires all of its elements to be of the same type, whereas a data frame can have columns of different types. In other words, a matrix must contain only numeric values, or only character strings, and so on. Data frames are more like spreadsheets: you can have numeric data in column A, factors in column B, characters in column C, and so on.
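As a quick illustration of the type rules (the object names m and df here are just placeholders):

```r
# A matrix coerces everything to one type: mixing numbers and
# strings silently converts the numbers to character strings.
m <- matrix(c(1, 2, "a", "b"), nrow = 2)
typeof(m)        # "character" -- 1 and 2 became "1" and "2"

# A data frame keeps each column's type separately.
df <- data.frame(x = c(1, 2), y = c("a", "b"))
sapply(df, class)  # x is numeric, y is character
```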
Another consideration is computational efficiency and compatibility. Under the hood, a matrix is essentially a vector with a dimension attribute, so operations on matrices are typically faster than the equivalent operations on data frames. If you're analyzing a massive data set with millions of rows, or repeating the same computationally intensive operation many times in a loop, a matrix may be the better choice, but it also depends on what tools you're using. The tidyverse, a collection of packages that includes ggplot2, dplyr, and other useful data science tools, is built around data frames. Functions from some other packages, on the other hand, require their inputs to be matrices.
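If you end up needing both forms, converting between them is straightforward, with the caveat that a data frame converts cleanly to a matrix only when its columns are type-compatible. A sketch:

```r
df <- data.frame(a = 1:3, b = c(4.5, 5.5, 6.5))

# data frame -> matrix: both columns are numeric, so this
# produces a numeric matrix (mixed types would coerce to character)
m <- as.matrix(df)
is.matrix(m)     # TRUE

# matrix -> data frame
df2 <- as.data.frame(m)
is.data.frame(df2)  # TRUE
```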
Ultimately, in many situations, you can use either. In this example, I'm simulating n=100 observations from a regression model with an intercept of 2 and a slope of -3.
Here, I'm storing the simulated data into a dataframe df and into the matrices Y and X.
I'm using the dataframe with the function lm to estimate the regression coefficients.
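Since the chunks themselves aren't reproduced here, this is a sketch of what they might look like; the seed and the error standard deviation of 1 are my assumptions, not part of the original example:

```r
set.seed(1)                # assumption: any seed will do
n <- 100
x <- rnorm(n)
y <- 2 - 3 * x + rnorm(n)  # intercept 2, slope -3, sd-1 noise (sd assumed)

# Store the simulated data as a data frame ...
df <- data.frame(x = x, y = y)

# ... and as matrices: X gets a column of 1s for the intercept
Y <- matrix(y, ncol = 1)
X <- cbind(1, x)

# Estimate the regression coefficients with lm()
fit <- lm(y ~ x, data = df)
coef(fit)                  # close to (2, -3)
```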
From linear algebra, we can also estimate the regression coefficients using matrices and the following formula:
β̂ = (XᵀX)⁻¹XᵀY
I'm using this formula in this code chunk. Note that "%*%" is matrix multiplication, which is not the same as "*", R's regular element-wise multiplication.
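A sketch of that chunk, with Y and X re-created from the simulation step so the example is self-contained (seed and error s.d. are assumptions):

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 - 3 * x + rnorm(n)
Y <- matrix(y, ncol = 1)
X <- cbind(1, x)           # first column of 1s for the intercept

# beta-hat = (X'X)^{-1} X'Y; solve() computes the matrix inverse
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y
beta_hat                   # matches coef(lm(y ~ x))
```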
If you run both approaches in R, you should get the same estimates. The matrix approach can be faster but requires some linear algebra knowledge, and for a simple example like this one with 100 observations, you won't notice any difference in speed anyway.