2023-12-02 02:18:00 +01:00
# [glob](https://docs.python.org/3/library/glob.html): Finding files in a directory
{:.no_toc}
* TOC
{:toc}
## Goal
We want to process many files in a directory. What is an easy way to get the filenames in that directory?
Questions to [David Rotermund](mailto:davrot@uni-bremen.de)
## Creating test files
```python
from pathlib import Path
Path("Testfile_1.mat").touch()
Path("Testfile_2.mat").touch()
Path("Testfile_10.mat").touch()
Path("Testfile_3.mat").touch()
```
## Using glob in a for-loop
```python
import glob

# glob.glob returns every path that matches the pattern
for filename in glob.glob("*.mat"):
    print(filename)
```
```python
Testfile_1.mat
Testfile_2.mat
Testfile_10.mat
Testfile_3.mat
```
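Besides `*`, glob patterns also understand `?`, which matches exactly one character. The following is a minimal sketch of the difference. It works in a temporary directory so that it does not touch your own files, and the extra file `notes.txt` is only an assumption added for illustration. The order in which glob returns the names depends on the file system, so the results are sorted here just to get a reproducible printout.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the same test files as above, plus one file that is not a .mat file
    for number in (1, 2, 10, 3):
        Path(tmp_dir, f"Testfile_{number}.mat").touch()
    Path(tmp_dir, "notes.txt").touch()

    # "?" matches exactly one character, so Testfile_10.mat is not matched
    single_digit = glob.glob(os.path.join(tmp_dir, "Testfile_?.mat"))
    # "*" matches any run of characters
    all_mat = glob.glob(os.path.join(tmp_dir, "*.mat"))

# Keep only the filename part and sort for a reproducible printout
single_digit_names = sorted(os.path.basename(p) for p in single_digit)
all_mat_names = sorted(os.path.basename(p) for p in all_mat)

print(single_digit_names)
print(all_mat_names)
```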
## Using glob to create a list
```python
import glob
# Avoid the name "list": it would shadow the built-in list type
filenames = glob.glob("*.mat")
print(filenames)
```
```python
['Testfile_1.mat', 'Testfile_2.mat', 'Testfile_10.mat', 'Testfile_3.mat']
```
### Sorting the filenames
```python
import glob
# Avoid the name "list": it would shadow the built-in list type
filenames = sorted(glob.glob("*.mat"))
print(filenames)
```
```python
['Testfile_1.mat', 'Testfile_10.mat', 'Testfile_2.mat', 'Testfile_3.mat']
```
Hmmm... This result is not helpful: `Testfile_10.mat` is placed before `Testfile_2.mat`.
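
Why does this happen? Python compares strings character by character, so the number inside the filename is never read as a number. A small sketch with two of the test filenames (the variable names are only for illustration):

```python
names = ["Testfile_2.mat", "Testfile_10.mat"]

# Both names start with "Testfile_". The first differing characters are
# "1" and "2", and "1" < "2", so the name with the 10 counts as smaller.
ten_before_two = "Testfile_10.mat" < "Testfile_2.mat"
print(ten_before_two)

# sorted() uses the same comparison
sorted_names = sorted(names)
print(sorted_names)
```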
### Sorting the filenames with [natsort](https://pypi.org/project/natsort/)
```shell
pip install natsort
```
```python
import glob
from natsort import natsorted
# Avoid the name "list": it would shadow the built-in list type
filenames = natsorted(glob.glob("*.mat"))
print(filenames)
```
```python
['Testfile_1.mat', 'Testfile_2.mat', 'Testfile_3.mat', 'Testfile_10.mat']
```
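
If installing an extra package is not an option, a similar ordering can be sketched with the standard library alone. This is not how natsort works internally and it covers far fewer cases. The helper `natural_key` is a hypothetical name made up for this example: it cuts a filename into digit and non-digit pieces and turns the digit pieces into integers, so that 10 is compared with 2 as a number.

```python
import re


def natural_key(name):
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "Testfile_10.mat" -> ["Testfile_", "10", ".mat"]
    parts = re.split(r"(\d+)", name)
    # Digit runs become integers, everything else stays a (lowercased) string
    return [int(part) if part.isdigit() else part.lower() for part in parts]


names = ["Testfile_1.mat", "Testfile_10.mat", "Testfile_2.mat", "Testfile_3.mat"]
naturally_sorted = sorted(names, key=natural_key)
print(naturally_sorted)
```

For the test files this prints the names in the order 1, 2, 3, 10. For anything more complicated (signs, decimal numbers, locale rules), natsort is probably the better choice.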
## rsplit
Maybe you don't want the file extensions. In that case, you can use [rsplit](https://docs.python.org/3/library/stdtypes.html#str.rsplit) on each filename string.
```python
import glob
from natsort import natsorted
for filename in natsorted(glob.glob("*.mat")):
    # Split once at the last "." and keep the part before it
    print(filename.rsplit(".", 1)[0])
```
```python
Testfile_1
Testfile_2
Testfile_3
Testfile_10
```
Alternatively, without a for-loop but using [map](https://docs.python.org/3/library/functions.html#map), [list](https://docs.python.org/3/library/functions.html#func-list) and [lambda functions](https://docs.python.org/3/reference/expressions.html#lambda):
```python
import glob
from natsort import natsorted
filenames = natsorted(glob.glob("*.mat"))
filenames = list(map(lambda s: s.rsplit(".", 1)[0], filenames))
print(filenames)
```
```python
['Testfile_1', 'Testfile_2', 'Testfile_3', 'Testfile_10']
```
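
Another possible way to drop the extension is the `stem` attribute of `pathlib.Path`, combined with a list comprehension instead of map and lambda. This is only a sketch of an alternative, not a replacement for the rsplit approach. The filenames are written out by hand here so that the example runs without the test files, and `.hidden` is an assumed example name used to show one case where the two approaches differ.

```python
from pathlib import Path

filenames = ["Testfile_1.mat", "Testfile_2.mat", "Testfile_3.mat", "Testfile_10.mat"]

# Path(...).stem is the final path component without its last suffix
stems = [Path(name).stem for name in filenames]
print(stems)

# A name that starts with a dot and has no other dot: stem keeps the
# whole name, while rsplit leaves an empty string in front of the dot
hidden_stem = Path(".hidden").stem
hidden_rsplit = ".hidden".rsplit(".", 1)[0]
print(repr(hidden_stem), repr(hidden_rsplit))
```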