
William L. answered 05/18/21
Data Scientist and Evaluation Specialist
Before I answer your question, I'm going to start by warning that using SOCIOECONOMIC status (SES) as your column for matching people runs a high risk of incorrect matches as the size of your data increases. The reason for this is that no factor excludes two individuals from reporting the same SOCIOECONOMIC score.
That said, if you're intend on combining based on this term, what you need is an INNER JOIN. This is coded differently depending on which language you are using, but the basic function creates a new dataset by taking your first dataset and looking for where it has matches in your specified id column (SES) in the second dataset. An inner join will only include rows which match in both of your datasets. If you want to make sure that you don't lose rows that have no match between your datasets use a FULL JOIN. I won't go into FULL JOINs here as it sounds like you're using matched data.
Here are some examples of how to code this in different languages.
SQL:
SELECT *
FROM <insert the name of your first dataset here>
INNER JOIN <insert the name of your second dataset here>
ON <dataset1>.<matching column> = <dataset2>.<matching column>;
Example: SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Col1 = Table2.Col2;
R: I recommend using dplyr because it's a streamlined way to manipulate data
library(dplyr)
<new dataset name> <- <name of first dataset> %>%
inner_join(<name of second dataset>, by = c("<column from first dataset>" = "<column from second
dataset>"))
Example: Table 3 <- Table 1 %>% inner_join( Table 2, by = c("Col1" = "Col2"))
Python:
import pandas as pd
pd.merge(<first dataset name>, <second dataset name>, left_on='<first datasets matching column>',
right_on='<second dataset's matching column>')
Example: pd.merge(Table1, Table2, left_on='Col1', right_on='Col2')