Filter a file when a column matches values in another file
Suppose I have two files, File A:
a,abcdef
b,bcdefa
c,cdefab
a,defabc
b,efabcd
c,fabcde
And File B:
a
b
The output I'm looking for is:
a,abcdef
b,bcdefa
a,defabc
b,efabcd
So, basically, I want to select the rows from File A whose first column matches any value in File B, using standard Unix commands. Something like awk '{ if (file_b contains $1) print $1, $2 }', but more efficient.
File A is expected to exceed 20 million rows, and File B 1 million. It must run in O(n), so the contains step should probably rely on a hash table.
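To make the hash-table idea concrete, here is a minimal sketch of the lookup in awk, not a definitive answer. It assumes File A is comma-separated and File B holds one key per line, as in the sample above; the file names file_a.csv and file_b.txt and the variable names are hypothetical. awk arrays are associative, and common implementations such as gawk and mawk back them with hash tables, so each membership test is expected to take constant time on average.

```shell
#!/bin/sh
# Hypothetical file names in a scratch directory; substitute the real paths.
workdir=$(mktemp -d)
file_a="$workdir/file_a.csv"
file_b="$workdir/file_b.txt"

# Recreate the sample data from the question.
printf '%s\n' 'a,abcdef' 'b,bcdefa' 'c,cdefab' 'a,defabc' 'b,efabcd' 'c,fabcde' > "$file_a"
printf '%s\n' 'a' 'b' > "$file_b"

# First file (NR == FNR holds only while reading it): store every line of
# File B as an index of the associative array "keys".
# Second file: print a row of File A when its first field is a stored key.
result=$(awk -F, 'NR == FNR { keys[$0]; next } $1 in keys' "$file_b" "$file_a")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Under these assumptions the work is one pass over each file, with the roughly 1 million keys of File B held in memory. Note that the NR == FNR idiom assumes File B is not empty; with an empty File B, the first block would run on File A instead.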